  1. Dec 2025
    1. **Summary**

      1) This isn’t nostalgia; it’s a structural change in childhood space

      The essay argues that across history and cultures, kids have naturally carved out autonomous zones (streets, empty lots, forests, corners of towns) where they own time and space away from adults. That’s not a random pattern — it’s deeply human behavior. The Browser

      The disappearance of these spaces isn’t just kids playing less. It’s a loss of a psychological environment where children make sense of the world on their own terms.

      Insight: It reframes the problem from “kids spend more time inside” to “children are being structurally excluded from public life,” not by kids’ choices, but by how adult society is organized.

      2) The cause is more built environment + social patterns than screens

      The author pushes back against the common idea that the internet is the big culprit. Instead, he points to car-dependent suburbs, families spread far apart, and modern work patterns (parents not at home, schedules tightly managed), making free interaction physically harder. aman.bh

      Insight: Technology is a symptom of isolation, not the root cause. The real bottlenecks are:

      towns designed without gathering places

      kids physically separated from peers

      reliance on cars over walking/biking

      3) Modern “play” is not truly play

      There’s a distinction made between:

      Structured activities (sports practice, classes with adults)

      Unstructured peer play (kids deciding what to do, how to do it, together)

      The latter is what’s disappearing. Organized activities fill time, but don’t create the same kind of autonomy and peer culture that spontaneous play does. aman.bh

      Insight: If all your child’s social interactions are planned by adults, the dynamic changes — it becomes supervision, not co-participation.

      4) Internet/online spaces are a child-managed arena

      One reason kids gravitate online is because it’s one of the only unsupervised social spaces left. They aren’t free in the physical world, so they find agency where adults are less present (forums, chats, games). The Browser

      New angle: The internet isn’t the cause of isolation — it’s a response to it. Kids go where they can control interactions without adult oversight.

      5) The core issue isn’t “kids vs screens” — it’s where childhood autonomy can exist

      This reframes the whole debate from blaming technologies to asking:

      Where in the modern city can children act independently?

      And the answer the essay hints at is: almost nowhere — so kids create their own spaces, even if imperfect.

      Insight: Autonomy isn’t earned by limiting devices. It’s earned by restoring real-world environments where children can make choice, risk, negotiation, and friendship happen without adult orchestration.

      6) Play functions as a designed culture, not an activity

      When the author says he “wishes children had forests,” he’s pointing to a deeper truth: what matters isn’t a physical object (a forest) but the freedom to explore, innovate, and improvise with peers.

      Insight: Play loses value when it’s designed by adults for kids (e.g., programs, classes) and gains value when it’s designed by kids for themselves.

      7) This problem isn’t just a “kids issue” — it’s a community design failure

      The commentary makes it clear that the conditions limiting play — distance, traffic fears, suburban sprawl — are not random. They’re outcomes of how cities and societies organize:

      roads instead of paths

      fences instead of common spaces

      schedules instead of unstructured time

      Insight: If you want kids to have autonomy, you have to change the adult world — it’s not something kids can generate on their own.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from the primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.

      The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single-neuron as well as population-level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb (Chae, Banerjee et al., 2022). Future experiments are needed to probe the circuit mechanisms that generate this important difference between the two primary olfactory cortices as well as their potential causal roles in odor identification.

      The authors were also interested in how familiarity vs. novelty affects neuronal representation across all these brain regions. One weakness of this study is that neuronal responses were not measured during the process of habituation. Neuronal responses were measured after four days of daily exposure to a few odors (familiar) and then some other novel odors were introduced. This creates a confound because the novel vs. familiar stimuli are different odorants and that itself can lead to drastic differences in evoked neural responses. Although the authors try to rule out this confound by doing a clever decoding and Euclidean distance analysis, an alternate more straightforward strategy would have been to measure neuronal activity for each odorant during the process of habituation.

      Reviewer #2 (Public review):

      This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.

      The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1 & SUB) are sparser and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.

      Strengths:

      The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.

      Weaknesses:

      (1) The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 with a much higher firing rate, but the opposite is true in Figure 2A. So, what are we to make of Figure 2C that shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec. This seems nearly meaningless. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)

      We appreciate the reviewer’s concerns regarding clarity and methodology. It is less clear why all neurons in a given brain area should have similar firing rates. Anatomically defined brain areas typically comprise multiple cell types, which can have diverse baseline firing rates. Since we computed absolute firing rate differences per neuron (i.e., novel vs. familiar odor responses within the same neuron), baseline differences across neurons do not have a major impact.

      The suggestion to use Z-scores instead of absolute firing rate differences is well taken. However, Z-scoring assumes that the underlying data are normally distributed, which is not the case in our dataset. Specifically, when analyzing odor-evoked firing rates on a per-neuron basis, only 4% of neurons exhibit a normal distribution. In cases of skewed distributions, Z-scoring can distort the data by exaggerating small variations, leading to misleading conclusions. While we acknowledge that different analysis methods exist, we believe that our chosen approach best reflects the properties of the dataset and avoids potential misinterpretations introduced by inappropriate normalization techniques.

      (2) There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Figure 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Figure 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.

      We performed additional analyses to address the reviewer’s feedback (Figures 2C-E and lines 118-132) and added more single-neuron data (Figures 1, S3 and S4).

      (3) The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.

      We appreciate the reviewer’s concern regarding the classification of brain regions as “primary sensory” versus “multisensory.” Of note, the cited studies (Poo et al., 2021; Federman et al., 2024; Kehl et al., 2024) focus on posterior PCx (pPCx), while our recordings were conducted in a very anterior section of anterior PCx. The aPCx and pPCx have distinct patterns of connectivity, both anatomically and functionally. To the best of our knowledge, there is no evidence for multimodal responses in aPCx, whereas there is for LEC, CA1 and SUB. Furthermore, our distinction is not based on a connectivity argument, as the reviewer suggests, but on differences in the α-Poisson ratio (Figure 1E and F).

      To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript.

      (4) Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Figure 2C is not sufficient.

      Regarding z-scores, please see our response to (1). We further added a figure showing the responses of all neurons to novel stimuli, using ROC analysis instead of z-scoring, as described previously (e.g., Cohen et al., Nature 2012). This figure was added to the supplementary material for completeness (Figure S2E).

      For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean?

      This means that, on average, the population of neurons exhibits higher firing rates in response to novel odors than to familiar ones.

      Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?

      We thank the reviewer for this valuable feedback and have extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).

      (5) Lines 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.

      We appreciate the reviewer’s request for clarification. Throughout the brain areas we studied, odorant identity and experience can be decoded. However, the way information is represented differs between regions. We acknowledge that “mixed” representation is a misleading term and removed it from the manuscript.

      In AON and aPCx, neurons significantly respond to both novel and familiar odors. However, the magnitude of their responses to novel and familiar odors is sufficiently distinct to allow for decoding of odor experience (i.e., whether an odor is novel or familiar). Moreover, novelty engages more neurons in encoding the stimulus (Figure 2D). In neural space, the position of an odor’s representation in AON and aPCx shifts depending on whether it is novel or familiar, meaning that experience modifies the neural representation of odor identity. This suggests that in these regions the two representations are intertwined.

      In contrast, some neurons in LEC, CA1, and SUB exhibit responses to novel odors, but few neurons respond to familiar odors at all. This suggests a more selective encoding of novelty.

      (6) Lines 132-140 - As presented in the text and the figure, this section is poorly written and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at the chance level. More importantly, they did the wrong analysis here. The better and, I think, the only way to do this analysis correctly is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).

      We appreciate the feedback and thank the reviewer for the recommendation to implement cross-condition generalization performance (CCGP) as used in Bernardi et al., 2020. We acknowledge that the term "shuffled" may have caused confusion, as it typically refers to control analyses producing chance-level outcomes. In our case, "shuffling" refers to shuffling the identity of novel and familiar odors to assess how much the decoder relies on odor identity when distinguishing novelty. This test provided insight into whether novelty-related structure exists within neural activity beyond random grouping, but it does not directly assess generalization.

      As suggested, we used CCGP to measure how well novelty-related representations generalize across different odors. Our findings show that in AON and aPCx, novelty-related information is indeed highly generalizable, supporting the idea that these regions encode novelty in a less odor-selective manner (Figure 2K).
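For readers unfamiliar with CCGP, the idea can be sketched as follows: a novelty classifier is trained on trials from some odors and scored only on trials from odors it never saw. This is a minimal illustration, not the authors' pipeline. The data are synthetic, the odor names, `simulate`, `ccgp`, and `novelty_gain` are hypothetical, and a nearest-centroid classifier stands in for the SVM decoder mentioned in the reviews so that the example needs only the Python standard library.

```python
import random


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ccgp(trials, train_odors, test_odors):
    """Cross-condition generalization of a novelty classifier.

    trials maps odor name -> (label, list of population vectors).
    The classifier is fitted on train_odors only; the returned value is
    its accuracy on trials of the held-out test_odors.
    """
    by_label = {}
    for odor in train_odors:
        label, vecs = trials[odor]
        by_label.setdefault(label, []).extend(vecs)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}

    correct = total = 0
    for odor in test_odors:
        label, vecs = trials[odor]
        for v in vecs:
            pred = min(centroids, key=lambda c: sq_dist(v, centroids[c]))
            correct += pred == label
            total += 1
    return correct / total


def simulate(n_neurons=40, n_trials=10, novelty_gain=2.0, seed=0):
    """Synthetic population responses (an assumption, not real data).

    Each odor has its own random identity pattern; novel odors get an
    extra shift of novelty_gain that is shared across odors.
    """
    rng = random.Random(seed)
    labels = {"A": "novel", "B": "novel", "C": "novel",
              "D": "familiar", "E": "familiar", "F": "familiar"}
    trials = {}
    for odor, label in labels.items():
        identity = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
        shift = novelty_gain if label == "novel" else 0.0
        vecs = [[identity[i] + shift + rng.gauss(0.0, 0.5)
                 for i in range(n_neurons)]
                for _ in range(n_trials)]
        trials[odor] = (label, vecs)
    return trials


if __name__ == "__main__":
    data = simulate()
    # Train on odors A, B (novel) and D, E (familiar); test on unseen C and F.
    print(ccgp(data, ["A", "B", "D", "E"], ["C", "F"]))
```

A high score on the held-out odors indicates that the novelty signal is shared across odors rather than tied to individual odor identities. In practice one would average over all possible train/test splits of the odors.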

      Reviewer #3 (Public review):

      In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1), and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.

      The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.

      Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, limitations in the interpretability of odor experience of the behavioral paradigm, and limitations in experimental design and analysis, restrict the conclusions that can be drawn from this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some suggestions, in no particular order, to further improve the manuscript:

      (1) The example neuronal responses for CA1 and SUB in Figure 1 are not very inspiring. To my eyes, the odor period response is not that different from the baseline period. In general, a thorough characterization of firing rate properties during the odor period between the different brain regions would be informative.

      We thank the reviewer for this valuable feedback. We have replaced the example neurons from CA1 and SUB in Figure 1C. We further extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).

      (2) For the summary in Figure 1, why not show neuronal responses as z-scored firing rates as opposed to auROC?

      We chose to use auROC instead of z-scored firing rates due to the non-normality of the dataset, which can distort results when using z-scores. Specifically, z-scoring can exaggerate small deviations in neurons with low responsiveness, potentially leading to misleading conclusions. auROC provides a more robust measure of response change that is less sensitive to these distortions because it does not assume any specific distribution. This approach has been used previously (e.g. Cohen et al. 2012, Nature).
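As an illustration of why auROC makes no distributional assumption, the measure can be computed directly from ranks: it is the probability that a randomly chosen stimulus-period spike count exceeds a randomly chosen baseline spike count, with ties counted as one half. The sketch below is a minimal standard-library version with hypothetical spike counts; the function name `auroc` and the example values are assumptions, not taken from the manuscript.

```python
def auroc(baseline, evoked):
    """Area under the ROC curve separating evoked from baseline counts.

    Equals P(evoked > baseline) + 0.5 * P(evoked == baseline) over all
    trial pairs. 0.5 means no discriminability, values near 1 indicate
    excitation, values near 0 indicate suppression.
    """
    if not baseline or not evoked:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for e in evoked:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(baseline) * len(evoked))


if __name__ == "__main__":
    # Hypothetical spike counts per trial (10 trials each).
    baseline_counts = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]
    excited_counts = [5, 7, 4, 6, 9, 5, 8, 4, 6, 7]
    suppressed_counts = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

    print("excited:", auroc(baseline_counts, excited_counts))
    print("suppressed:", auroc(baseline_counts, suppressed_counts))
```

Because only the ordering of trial values matters, the result is unchanged by any monotonic transformation of the spike counts, which is why skewed distributions do not distort it.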

      (3) To study novelty, the authors presented odorants that were not used during four days of habituation. But this design makes it hard to dissociate odor identity from novelty. Why not track the response of the same odorants during the habituation process itself?

      We respectfully disagree with the argument that using different stimuli as novel and familiar constitutes a confound in our analysis. In our study, we used multiple different, structurally dissimilar single molecule chemicals which were randomly assigned to novel and familiar categories in each animal. If individual stimuli did cause “drastic differences in evoked neural responses”, these would be evenly distributed between novel and familiar stimuli. It is therefore extremely unlikely that the clear differences we observed between novel and familiar conditions and between brain areas can be attributed to the contribution of individual stimuli, in particular given that our analyses were performed at the population level. In fact, we observed that responses between novel and familiar conditions were qualitatively very similar in the short time window after odor onset (Figure 1G and H).

      Importantly, the goal of this study was to investigate the impact of long-term habituation over more than 4 days, rather than short-term habituation during one behavioral session. However, tracking the activity of large numbers of neurons across multiple days presents a significant technical challenge, due to the difficulty of identifying stable single-unit recordings over extended periods of time with sufficient certainty. Tools that facilitate tracking have recently been developed (e.g., Yuan et al., eLife 2024) and it will be interesting to apply them to our dataset in the future.

      (4) Since novel odors lead to greater sniffing and sniffing strongly influences firing rates in the olfactory system, the authors decided to focus on a 400 ms window with similar sniffing rates for both novel vs. familiar odors. Although I understand the rationale for this choice, I worry that this is too restrictive, and it may not capture the full extent of the phenomenology.

      Could the authors model the effect of sniffing on firing rates of individual neurons from the data, and then check whether the odor response for novel context can be fully explained just by increased sniffing or not?

      It is an interesting suggestion to extend the window of analysis and observe how responses evolve with sniffing (and other behavioral reactions). To address this, we added an additional figure to the supplementary material, showing the mean responses of all neurons to novel stimuli during the entire odor presentation window (Fig. S1B).

      As suggested, we further fitted a Generalized Linear Model (GLM) to the entire 2 s odor stimulation period, incorporating sniffing and novelty as independent variables. As expected, sniffing had a dominant impact on firing rate in all brain areas. A smaller proportion of neurons was modulated by novelty or by the novelty × breathing interaction, suggesting the entrainment of neural activity by sniffing during the response to novel odors. These results support our decision to focus the analysis on the early 400 ms window in order to dissociate the effects of novelty and behavioral responses. Taken together, our results suggest that odorant responses are modulated by novelty early during odorant processing, whereas at later stages sniffing becomes the predominant factor driving firing (Figure S2C-D).

      (5) The authors conclude that aPCx has a subset of neurons dedicated to familiar odors based on the distribution of SVM weights in Figure 3D. To me, this is the weakest conclusion of the paper because although significant, the effect size is paltry; the central tendencies are hardly different for the two conditions in aPCx. Could the authors show the PSTHs of some of these neurons to make this point more convincing?

      We appreciate the reviewer’s concern regarding the effect size. To strengthen our conclusion, we now include PSTHs of representative neurons in the bottom 10% and top 10% of the neuronal population based on the SVM analysis (Figures S3 and S4). We hope this provides more clarity and support for the interpretation that there is a subset of neurons in aPCx that show greater sensitivity to familiar odors, despite the relatively modest central tendency differences.

      In the revised manuscript, we discuss the effect size more explicitly in the text to provide context for its significance (lines 193 - 195).

      Reviewer #2 (Recommendations for the authors):

      (1) The authors only talk about "responsive" neurons. Does this include neurons whose activity increases significantly (activated) and neurons whose activity decreases (suppressed)?

      Yes, the term "responsive" refers to neurons whose activity either increases significantly (excited) or decreases (inhibited) in response to the odor stimuli. We performed additional analyses to characterize responses separately for the different groups (Figure 2C-E and lines 118-132).

      (2) Line 54 - The Schoonover paper doesn't show that cells lose their responses to odors, but rather that the population of cells that respond to odors changes with time. That is, population responses don't become more sparse

      The fact that “the population of cells that respond to odors changes with time” implies that some neurons lose their responsiveness (e.g. unit 2 in Figure 1 of Schoonover et al., 2021), while others become responsive (e.g. unit 1 in Figure 1 of Schoonover et al., 2021). Frequent responses reduce drift rate (Figure 4 of Schoonover et al., 2021), thus fewer neurons lose or gain responsiveness. We have revised the manuscript to clarify this.

      (3) Line 104 - "Recurrent" is incorrectly used here. I think the authors mean "repeated" or something more like that.

      Thank you for pointing this out. We replaced "recurrent" with "repeated".

      (4) Figure 3D - What is the scale bar here?

      We apologize for the accidental omission. The scale bar has been added to Figure 3D in the revised version of the manuscript.

      (5) Line 377 - They say they lowered their electrodes to "200 um/s per second." This must be incorrect. Is this just a typo, or is it really 200 um/s, because that's really fast?

      Thank you for pointing this out. It was 20 to 60 µm/s; the change has been made in the manuscript.

      (6) Line 431: The authors say they used auROC to calculate changes in firing rates (which I think is only shown in Figure 1D). Note that auROC measures the discriminability of two distributions, not the strength or change in the strength of response.

      Indeed we used auROC to measure the discriminability of firing between baseline and during stimulus response. We have corrected the wording in the methods.

      (7) Figure 1B: The anatomical locations of the five areas they recorded from are straightforward, and this figure is not hugely helpful. However, the reader would benefit tremendously by including an experimental schematic. As is, we needed to scour the text and methods sections to understand exactly what they did when.

      We thank the reviewer for this suggestion. We included an experimental schematic in the supplementary material.

      (8) Figure 1F (left): This plot is much less useful without showing a pre-odor window, even if only times after the odor onset were used for calculating alpha.

      We appreciate this concern; however, the goal of Figure 1F is to illustrate the meaning of the alpha value itself. We chose not to include a pre-odor window comparison to avoid confusing the reader.

      (9) Figure 2A: What are the bar plots above the raster plots? Are these firing rates? Are the bars overlaid or stacked? Where is the y-axis scale bar?

      The bar plots above the raster plots represent a histogram of the spike count/trials over time, with a bin width of 50 ms. These bars are overlaid on the raster plot. We will include a y-axis scale bar in the revised figure to clarify the presentation.

      (10) Figure 4G: This makes no sense. First, the Y axis is supposed to measure standard deviation, but the axis label is spikes/s. Second, if responses in the AON are much less reliable than responses in "deeper" areas, why is odor decoding in AON so much better than in the other areas?

      We acknowledge the error in the axis label, and we will correct it to indicate the correct units. AON has a larger response variability but also larger response magnitudes, which can explain the higher decoding accuracy.

      (11) From the model and text, one predicts that the lifetime sparseness increases along the pathway. The authors should use this metric as well/instead of "odor selectivity" because of problems with arbitrary thresholding.

      We acknowledge that lifetime sparseness, often computed using lifetime kurtosis, can be an informative measure of selectivity. However, we believe it has limitations that make it less suitable for our analysis. One key issue is that lifetime sparseness does not account for the stability of responses across multiple presentations of the same stimulus. In contrast, our odor selectivity measure incorporates trial-to-trial variability by considering responses over 10 trials and assessing significance using a Wilcoxon test compared to baseline. While the choice of a p-value threshold (e.g., 0.05) is somewhat arbitrary, it is a widely accepted statistical convention. Additionally, lifetime sparseness does not account for excitatory and inhibitory responses. For example, if a neuron X is strongly inhibited by odor A, strongly excited by odor B, and unresponsive to odors C and D, lifetime sparseness would classify it as highly selective for odor B, without capturing its inhibitory selectivity for odor A. The lifetime sparseness will be higher than if X was simply unresponsive for A.

      Our odor selectivity measure addresses this by considering both excitation and inhibition as potential responses. Thus, while lifetime sparseness could provide a useful complementary perspective in another type of dataset, it does not fully capture the dynamics of odor selectivity here.

      Author response image 1. Lifetime kurtosis distribution per region.
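The point about inhibition in the response above can be made concrete with a small numerical sketch. It uses one common definition of lifetime sparseness (the Vinje-Gallant normalization of the Treves-Rolls measure), which is an assumption here: the response mentions lifetime kurtosis, and other definitions may behave differently. The firing rates, the 5 spikes/s baseline, and the function name `lifetime_sparseness` are hypothetical.

```python
def lifetime_sparseness(rates):
    """Lifetime sparseness of one neuron across stimuli, in [0, 1].

    rates holds the mean firing rate (spikes/s, non-negative) for each
    stimulus. 0 means equal responses to all stimuli; 1 means a response
    to a single stimulus only.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two stimuli")
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0  # silent neuron: treat as non-selective
    mean = sum(rates) / n
    return (1.0 - (mean * mean) / mean_sq) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Hypothetical neuron X, assumed baseline of 5 spikes/s; odors A, B, C, D.
    inhibited_by_a = [0.0, 20.0, 5.0, 5.0]      # A suppresses, B excites
    unresponsive_to_a = [5.0, 20.0, 5.0, 5.0]   # only B excites

    print(lifetime_sparseness(inhibited_by_a))
    print(lifetime_sparseness(unresponsive_to_a))
```

Under this definition the neuron that is inhibited by odor A scores as sparser than the one that ignores odor A, even though it responds to more odors. A single sparseness value therefore cannot distinguish inhibitory from excitatory selectivity.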

      Reviewer #3 (Recommendations for the authors):

      Main points:

      (1) The authors use a non-associative learning paradigm - repeated odor exposure - to test how experience modulates odor responses along the olfactory-hippocampal pathway. While repeated odor exposure clearly modulates odor-evoked neural activity, the relevance of this modulation and its differential effect across different brain areas are difficult to assess in the absence of any behavioral read-outs.

      Our experimental paradigm involves a robust, reliable behavioral readout of non-associative learning. Novel olfactory stimuli evoke a well-characterized orienting reaction, which includes a multitude of physiological reactions, including exploratory sniffing, facial movements and pupil dilation (Modirshanechi et al., Trends Neuroscience 2023). In our study, we focused on exploratory sniffing.

      Compared to associative learning, non-associative learning might have received less attention. However, it is critically important because it forms the foundation for how organisms adapt to their environment through experience without forming associations. This is highlighted by the fact that non-instrumental stimuli can be remembered in large numbers (Standing, 1973) and with remarkable detail (Brady et al., 2008). While non-associative learning can thus create a vast, implicit memory of stimuli in the environment, it is unclear how stimulus representations reflect this memory. Our study contributes to answering this question. We describe the impact of experience on olfactory sensory representations and reveal a transformation of representations from olfactory cortical to hippocampal structures. Our findings also indicate that sensory responses to familiar stimuli persist within sensory cortical and hippocampal regions, even after spontaneous orienting behaviors have habituated. Further studies involving experimental manipulation techniques are needed to elucidate the causal mechanisms underlying the formation of stimulus memory during non-associative learning.

      (2) The authors discuss the olfactory-hippocampal pathway as a transition from primary sensory (AON, aPCx) to associative areas (LEC, CA1, SUB). While this is reasonable, given the known circuit connectivity, other interpretations are possible. For example, AON, aPCx, and LEC receive direct inputs from the olfactory bulb ('primary cortex'), while CA1 and SUB do not; AON receives direct top-down inputs from CA1 ('associative cortex'), while aPCx does not. In fact, the data presented in this manuscript does not appear to support a consistent, smooth transformation from sensory to associative, as implied by the authors (e.g. Figure 4A, F, and G).

      Thank you for this insightful comment. Indeed, there are complexities in the circuitry, and the relationships between different areas are not linear. We believe that AON and aPCx are distinctly different from LEC, CA1 and SUB, as the latter areas have been shown to integrate multimodal sensory information. To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript. We also removed the term “gradual” to describe the transition of neural representations from olfactory cortical to hippocampal areas.

      (3) The analysis of odor-evoked responses is focused on a 400 ms window to exclude differences in sniffing behavior. This window spans 200 ms before and after the first inhalation after odor onset. Inhalation onset initiates neural odor responses - why do the authors include neural data before inhalation onset?

      The reason to include a brief time window prior to odor onset is to account for what are often called “partial” sniffs. In our experimental setup, odor delivery is not triggered by the animal’s inhalation. Therefore, it can happen that an animal has just begun to inhale when the stimulus is delivered. In this case, the animal is exposed to odorant molecules prior to the first complete inhalation after odor onset. We acknowledge that this limits the temporal resolution of our measurements, but it does not affect the comparison of sensory representations between different brain areas.
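
      To make the window definition concrete, here is a minimal sketch of how spikes could be selected within 200 ms before and after the first inhalation following odor onset. The function name, the example times, and the half-open window boundaries are assumptions for illustration and are not taken from the analysis code of the study.

```python
from bisect import bisect_left


def spikes_in_window(spike_times, inhalation_onsets, odor_onset, half_width=0.2):
    """Return spikes within +/- half_width s of the first inhalation after odor onset.

    All times are in seconds, and both time lists must be sorted ascending.
    The window is half-open: [inhalation - half_width, inhalation + half_width).
    """
    idx = bisect_left(inhalation_onsets, odor_onset)
    if idx == len(inhalation_onsets):
        raise ValueError("no inhalation at or after odor onset")
    first_inhalation = inhalation_onsets[idx]
    start = first_inhalation - half_width
    stop = first_inhalation + half_width
    lo = bisect_left(spike_times, start)
    hi = bisect_left(spike_times, stop)
    return spike_times[lo:hi]


# Hypothetical trial: an inhalation starts at 0.25 s, just before odor onset at
# 0.30 s (a "partial" sniff); the first complete inhalation starts at 0.55 s.
inhalation_onsets = [0.25, 0.55, 0.90]
spike_times = [0.10, 0.40, 0.52, 0.60, 0.80]
window_spikes = spikes_in_window(spike_times, inhalation_onsets, odor_onset=0.30)
print(window_spikes)  # prints [0.4, 0.52, 0.6]
```

      In this invented trial, the spikes at 0.40 s and 0.52 s precede the first complete inhalation but follow odor onset, which is the situation the pre-inhalation half of the window is meant to capture.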

      It would also be interesting to explore the effect of sniffing behavior (see point 2) on odor-evoked neural activity.

      Thank you for your comment. We performed an additional analysis, including a GLM, to address this question (Figure S2C-D).

      Minor points:

      (4) Figure 2A represents raster plots for 2 neurons per area - it is unclear how to distinguish between the 2 neurons in the plots.

      Figure 2A shows one example neuron per brain area. Each neuron has two raster plots, which show responses to either a novel (orange) or a familiar (blue) stimulus. We have revised the figure caption for clarity.

      (5) Overall, axes should be kept consistent and labeled in more detail. For example, Figure 2H and I are difficult to compare, given that the y-axis changes and that decoding accuracies are difficult to estimate without additional marks on the y-axis.

      The axes are indeed different, because the chance level of decoding accuracy differs between those two panels. Decoding novel versus familiar odors has a chance level of 0.5, while the chance level for decoding odor identity is 0.1 (there are 10 odors to decode the identity from).
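
      The two chance levels follow from the number of classes in each decoding problem. The short sketch below spells out that arithmetic, assuming balanced classes (an equal number of trials per class), which is not stated explicitly here.

```python
def chance_accuracy(n_classes):
    """Chance-level accuracy for a balanced n-way classification."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return 1.0 / n_classes


novelty_chance = chance_accuracy(2)    # novel vs. familiar
identity_chance = chance_accuracy(10)  # one of 10 odors
print(novelty_chance, identity_chance)  # prints 0.5 0.1
```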

      (6) Some parts of the discussion seem only loosely related to the data presented in this manuscript. For example, the statement that 'AON rather than aPCx should be considered as the primary sensory cortex in olfaction' seems out of context. Similarly, it would be helpful to provide data on the stability of subpopulations of neurons tuned to familiar odors, rather than simply speculate that they could be stable. The authors could summarize more speculative statements in an 'Ideas and Speculation' subsection.

      Thank you for your comment. We appreciate your perspective on our hypotheses. We have revised the discussion accordingly. Specifically, we removed the discussion of stable subpopulations, since we have not performed longitudinal tracking in this study.

      (7) The authors should try to reference relevant published work more comprehensively.

      Thank you for your comment. We attempted to include relevant published work without exceeding the limit for references but might have overlooked important contributions. We apologize to our colleagues whose relevant work might not have been cited.

    1. Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      Weaknesses:

      The authors have done a nice job addressing the previous weaknesses. The remaining weaknesses / limitations are appropriately discussed in the manuscript. E.g., the use of only 4 unique stimuli in the experiment. The findings are intriguing and relevant to saccadic remapping and foveal feedback, but somewhat limited in terms of the ability to draw theoretical distinctions between these related phenomena.

      Specifics:

      The revised manuscript is much improved in terms of framing and discussion of the prior literature, and the theoretical claims are now stated with appropriate nuance.

      I have two remaining minor suggestions/comments, which the authors may optionally respond to:

      (1) In the parametric modulation analysis, the authors' additional analyses nicely address my concern and strengthen the claim. However, the description in the revised manuscript (Pg 7 Ln 190-191) is minimal, and it may be difficult for readers to grasp what the control analysis is about and how it rules out alternative explanations of the IPS findings. The authors may wish to elaborate on the description in the text.

      (2) Out of curiosity (not badgering): The authors argued that the findings of Harrison et al. (2013) and Szinte et al. (2015) can be explained by feature integration between the currently attended location and its future, post-saccadic location. Couldn't the same argument apply in the current paradigm, where attention at the saccade target gets remapped to the pre-saccadic fovea (see also Rolfs et al., 2011 Fig 5), thus leading to the observed feature remapping?

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The main contributions of this paper are: (1) a replication of the surprising prior finding that information about peripherally-presented stimuli can be decoded from foveal V1 (Williams et al 2008), (2) a new demonstration of cross-decoding between stimuli presented in the periphery and stimuli presented at the fovea, (3) a demonstration that the information present in the fovea is based on shape not semantic category, and (4) a demonstration that the strength of foveal information about peripheral targets is correlated with the univariate response in the same block in IPS.

      Strengths:

      The design and methods appear sound, and finding (2) above is new, and importantly constrains our understanding of this surprising phenomenon. The basic effect investigated here is so surprising that even though it has been replicated several times since it was first reported in 2008, it is useful to replicate it again.

      We thank the reviewer for their summary. While we agree with many points, we would like to respectfully push back on the notion that this work is a replication of Williams et al. (2008). What our findings share with those of Williams is a report of surprising decoding at the fovea without foveal stimulation. Beyond this similarity, we treat these as related but clearly separate findings, for the following reasons:

      (1) Foveal feedback, as shown by Williams et al. (2008) and others during fixation, was only observed during a shape discrimination task, specific to the presented stimulus. Control experiments without such a task (or a color-related task) did not show effects of foveal feedback. In contrast, in the present study, the participants’ task was merely to perform saccades towards stimuli, independently of target features. We thus show that foveal feedback can occur independently of a task related to stimulus features. This dissociation demonstrates that our study must be tapping into something different than reported by Williams.

      (2) In a related study, Kroell and Rolfs (2022, 2025) demonstrated a connection between foveal feedback and saccade preparation, including the temporal details of the onset of this effect before saccade execution, highlighting the close link of this effect to saccade preparation. Here we used a very similar behavioral task to capture this saccade-related effect in neural recordings and investigate how early it occurs and what its nature is. Thus, there is a clear motivation for this study in the context of eye movement preparation that is separate from the previous work by Williams.

      (3) Lastly, decoding in the experimental task was positively associated with activity in FEF and IPS, areas that have been reliably linked to saccade preparation. We have now also performed an additional analysis (see our response to Specific point 2 of Reviewer 2) showing that decoding in the control condition did not show the same association, further supporting the link of foveal feedback to saccade preparation. 

      Despite our emphasis on these critical differences between studies, covert peripheral attention, as required by the task in Williams et al., and saccade preparation in natural vision, as in our study, are tightly coupled processes. Indeed, the task in Williams et al. would, during natural vision, likely involve an eye movement to the peripheral target. While speculative, a parsimonious and ecologically valid explanation is that both our study and earlier studies involve eye movement preparation, although execution is suppressed in studies enforcing fixation (e.g., Williams et al., 2008). We now discuss this idea of a shared underlying mechanism more extensively in the revised manuscript (pg 8 ln 228-240).

      Weaknesses:

      (1) The paper, including in the title ("Feedback of peripheral saccade targets to early foveal cortex") seems to assume that the feedback to foveal cortex occurs in conjunction with saccade preparation. However, participants in the original Williams et al (2008) paper never made saccades to the peripheral stimuli. So, saccade preparation is not necessary for this effect to occur. Some acknowledgement and discussion of this prior evidence against the interpretation of the effect as due to saccade preparation would be useful. (e.g., one might argue that saccade preparation is automatic when attending to peripheral stimuli.)

      We agree that the effects Williams et al. showed were not sufficiently discussed in the first version of this manuscript. To more clearly engage with these findings we now introduce saccade related foveal feedback (foveal prediction) and foveal feedback during fixation separately in the introduction (pg 2 ln 46-59).

      We further added another section in the discussion called “Foveal feedback during saccade preparation” in which we discuss how our findings are related to Williams et al. and how they differ (pg 8 ln 211-240). 

      As described in our previous response, we believe that our findings go beyond those described by Williams et al. (2008) and others in significant ways. However, during natural vision, the paradigm used by Williams et al. (2008) would likely be solved using an eye movement. Thus, while participants in Williams et al. (2008) did not execute saccades, it appears plausible that they prepared saccades. Given that covert peripheral attention and saccade preparation are tightly coupled processes (Kowler et al., 1995, Vis Res; Deubel & Schneider, 1996, Vis Res; Montagnini & Castet, 2007, J Vis; Rolfs & Carrasco, 2012, J Neurosci; Rolfs et al., 2011, Nat Neurosci), their results are parsimoniously explained by saccade preparation (but not execution) to a behaviorally relevant target.

      (2) The most important new finding from this paper is the cross-decodability between stimuli presented in the fovea and stimuli presented in the periphery. This finding should be related to the prior behavioral finding (Yu & Shim, 2016) that when a foveal foil stimulus identical to a peripheral target is presented 150 ms after the onset of the peripheral target, visual discrimination of the peripheral target is improved, and this congruency effect occurred even though participants did not consciously perceive the foveal stimulus (Yu, Q., & Shim, W. M. (2016). Modulating foveal representation can influence visual discrimination in the periphery. Journal of Vision, 16(3), 15).

      We thank the reviewer for highlighting this highly relevant reference. In the revised version of the manuscript, we now put more emphasis on the finding of cross-decodability (pg 2 ln 60-61). We now also discuss Yu and Shim’s finding, which supports our conclusion that foveal feedback and direct stimulus presentation share representational formats in early visual areas (pg 9 ln 277-279).

      (3) The prior literature should be laid out more clearly. For example, most readers will not realize that the basic effect of decodability of peripherally-presented stimuli in the fovea was first reported in 2008, and that that original paper already showed that the effect cannot arise from spillover effects from peripheral retinotopic cortex because it was not present in a retinotopic location between the cortical locus corresponding to the peripheral target and the fovea. (For example, this claim on lines 56-57 is not correct: "it remains unknown 1) whether information is fed back all the way to early visual areas".) What is needed is a clear presentation of the prior findings in one place in the introduction to the paper, followed by an articulation and motivation of the new questions addressed in this paper. If I were writing the paper, I would focus on the cross-decodability between foveal and peripheral stimuli, as I think that is the most revealing finding.

      We agree that the structure of the introduction did not sufficiently place our work in the context of prior literature. We have now expanded upon our Introduction section to discuss past studies of saccade- and fixation-related foveal feedback (pg 2 ln 49-59), laying out how this effect has been studied previously. We also removed the claim that "it remains unknown 1) whether information is fed back all the way to early visual areas", where our intention was to specifically focus on foveal prediction. We realize that this was not clear and hence removed this section. Instead, we now place a stronger focus on the cross-decodability finding (pg 2 ln 60-61).

      Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is predictively fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      We thank the reviewer for the positive assessment of our study.

      Weaknesses:

      The conclusions feel a bit over-reaching; some strong theoretical claims are not fully supported, and the framing of prior literature is currently too narrow. A critical weakness lies in the inability to test a distinction between these findings (claiming to demonstrate that "feedback during saccade preparation must underlie this effect") and foveal feedback previously found during passive fixation (Williams et al., 2008). Discussions (and perhaps control analysis/experiments) about how these findings are specific to the saccade target and the temporal constraints on these effects are lacking. The relationship between the concepts of foveal prediction, foveal feedback, and predictive remapping needs more thorough treatment. The choice to use only 4 stimuli is justified in the manuscript, but remains an important limitation. The IPS results are intriguing but could be strengthened by additional control analysis. Finally, the manuscript claims the study was pre-registered ("detailing the hypotheses, methodology, and planned analyses prior to data collection"), but on the OSF link provided, there is just a brief summary paragraph, and the website says "there have been no completed registrations of this project".

      We thank the reviewer for these helpful considerations. We agree that some of the claims were not sufficiently supported by the evidence, and in the revised manuscript, we added nuance to those claims (pg 8 ln 211-240). Furthermore, we now address more directly the distinction between foveal feedback during fixation and foveal feedback (foveal prediction) during saccade preparation. In particular, we now describe the literature about these two effects separately in the introduction (pg 2 ln 46-59), and we have added a new section in the discussion (“Foveal feedback during saccade preparation”) that more thoroughly explains why a passive fixation condition would have been unlikely to produce the same results we find (pg 8 ln 211-227). We also adapted the section about “Saccadic remapping or foveal prediction”, clearly delineating foveal prediction from feature remapping and predictive updating of attention pointers. As recommended by the reviewer, we conducted the parametric modulation analyses on the control condition, strengthening the claim that our findings are saccade-related. These results were added as Supplementary Figure 2 and are discussed in (pg 7 ln 190-191) and (pg 8 ln 224-227). 

      Lastly, we would like to apologize for a mistake we made with the pre-registration. We realized that the pre-registration had indeed not been submitted. We have now submitted it without changing the pre-registration itself, as can be seen from its recent activity log (screenshot attached at the end). After consulting an open science expert at the University of Leipzig, we added a note about this mistake to the methods section of the revised manuscript (pg 10 ln 326-332). We could remove the reference to this pre-registration altogether, but would leave this to the discretion of the editor.

      Specifics:

      (1) In the eccentricity-dependent decoding results (Figure 2B), are there any statistical tests to support the results being a U-shaped curve? The dip isn't especially pronounced. Is 4 degrees lower than the further ones? Are there alternative methods of quantifying this (e.g., fitting it to a linear and quadratic function)?

      We statistically tested the U-shaped relationship using a weighted quadratic regression, which showed significant positive curvature for decoding between fovea and periphery in all early visual areas (V1: t(27) = 3.98, p = 0.008; V2: t(27) = 3.03, p = 0.02; V3: t(27) = 2.776, p = 0.025; one-sided). We now report these results in the revised manuscript (pg 5 ln 137-138).
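
      For readers unfamiliar with this kind of test, the sketch below shows one way a weighted quadratic fit can expose positive curvature. It is an illustration under stated assumptions, not the analysis reported above: the eccentricities, accuracies, and weights are hypothetical, the choice of weights is not specified here, and the group-level one-sided t-test on the curvature term is not shown.

```python
def weighted_quadratic_fit(x, y, w):
    """Fit y ~ b0 + b1*x + b2*x**2 by weighted least squares; return (b0, b1, b2)."""
    # Build the normal equations (X^T W X) b = X^T W y for the basis [1, x, x^2].
    a = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for xi, yi, wi in zip(x, y, w):
        basis = (1.0, xi, xi * xi)
        for r in range(3):
            rhs[r] += wi * basis[r] * yi
            for c in range(3):
                a[r][c] += wi * basis[r] * basis[c]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            rhs[r] -= factor * rhs[col]
            for c in range(col, 3):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    coef = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (rhs[r] - tail) / a[r][r]
    return tuple(coef)


# Hypothetical data: decoding accuracy dips at intermediate eccentricity.
eccentricity_deg = [0.0, 2.0, 4.0, 6.0, 8.0]
accuracy = [0.60, 0.54, 0.52, 0.54, 0.60]
weights = [120.0, 200.0, 260.0, 300.0, 330.0]  # e.g. voxel counts (assumed)

b0, b1, b2 = weighted_quadratic_fit(eccentricity_deg, accuracy, weights)
print("curvature term b2 > 0:", b2 > 0)
```

      A positive `b2` indicates a U-shaped (convex) profile. In a group analysis, one such curvature term per participant could then be compared against zero.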

      (2) In the parametric modulation analysis, the evidence for IPS being the only region showing stronger fovea vs peripheral beta values was weak, especially given the exploratory nature of this analysis. The raw beta value can reflect other things, such as global brain fluctuations or signal-to-noise ratio. I would also want to see the results of the same analysis performed on the control condition decoding results.

      We appreciate the reviewer’s suggestion and repeated the same parametric modulation analysis on the control condition to assess the influence of potential confounds on the overall beta values (Supplementary Figure 2). The results show a negative association between foveal decoding and FEF and IPS (likely because eye movements in the control condition lead to less foveal presentation of the stimulus) and a positive association with LO. Peripheral decoding was not associated with significant changes in any of the ROIs, indicating that global brain fluctuations alone are not responsible for the effects reported in the experimental condition. The results of this analysis thus show a specific positive association of IPS activity with the experimental condition, not the control condition, which is in line with the idea that the foveal feedback effect reported in this study may be related to saccade preparation.

      (3) Many of the claims feel overstated. There is an emphasis throughout the manuscript (including claims in the abstract) that these findings demonstrate foveal prediction, specifically that "image-specific feedback during saccade preparation must underlie this effect." To my understanding, one of the key aspects of the foveal prediction phenomenon that ties it closely to trans-saccadic stability is its specificity to the saccade target but not to other objects in the environment. However, it is not clear to what degree the observed findings are specific to saccade preparation and the peripheral saccade target. Should the observers be asked to make a saccade to another fixation location, or simply maintain passive fixation, will foveal retinotopic cortex similarly contain the object's identity information? Without these control conditions, the results are consistent with foveal prediction, but do not definitively demonstrate that as the cause, so claims need to be toned down.

      We fully agree with the reviewer and toned down claims about foveal prediction. We engage with the questions raised by the reviewer more thoroughly in the new discussion section “Foveal feedback during saccade preparation”.

      In addition, we agree that a condition in which subjects make a saccade towards a different location would have been a great addition. We considered it but did not include it because of concerns about statistical power. While including such a condition exceeds the scope of the current study, we included this limitation in the Discussion section (pg 10 ln 316) and hope that future studies will address this question.

      (4) Another critical aspect is the temporal locus of the feedback signal. In the paradigm, the authors ensured that the saccade target object was never foveated via the gaze-contingent procedure and a conservative data exclusion criterion, thus enabling the test of feedback signals to foveal retinotopic cortex. However, due to the temporal sluggishness of fMRI BOLD signals, it is unclear when the feedback signal arrives at the foveal retinotopic cortex. In other words, it is possible that the feedback signal arrives after the eyes land at the saccade target location. This possibility is also bolstered by Chambers et al. (2013)'s TMS study, where they found that TMS to the foveal cortex at 350-400 ms SOA interrupts the peripheral discrimination task. The authors should qualify their claims of the results occurring "during saccade preparation" (e.g., pg 1 ln 22) throughout the manuscript, and discuss the importance of temporal dynamics of the effect in supporting stability across saccades.

      We fully agree that the sluggishness of the fMRI signal presents an important challenge in investigating foveal feedback. We have now included this limitation in the discussion (pg 10 ln 306-318). We also clarify that our argument connects to previous studies investigating the temporal dynamics of foveal feedback using similar tasks (pg 10 ln 313-316). Specifically, in their psychophysical work, Kroell and Rolfs (2022, 2025) showed that foveal feedback occurs before saccade execution, with a peak around 80 ms before the eye movement.

      (5) Relatedly, the claims that results in this paradigm reflect "activity exclusively related to predictive feedback" and "must originate from predictive rather than direct visual processes" (e.g., lines 60-65 and throughout) need to be toned down. The experimental design nicely rules out direct visual foveal stimulation, but predictive feedback is not the only alternative to that. The activation could also reflect mental imagery, visual working memory, attention, etc. Importantly, the experiment uses a block design, where the same exact image is presented multiple times over the block, and the activation is taken for the block as a whole. Thus, while at no point was the image presented at the fovea, there could still be more going on than temporally-specific and saccade-specific predictive feedback.

      We agree that those claims could have misled the reader. Our intention was to state that the activation originates from feedback rather than direct foveal stimulation because of the nature of the design. We have now clarified these statements (pg 2 ln 65) and also included a discussion of other effects including imagery and working memory in the limitations section (pg 10 ln 306-313).

      (6) The authors should avoid using the terms foveal feedback and foveal prediction interchangeably. To me, foveal feedback refers to the findings of Williams et al. (2008), where participants maintained passive fixation and discriminated objects in the periphery (see also Fan et al., 2016), whereas foveal prediction refers to the neural mechanism hypothesized by Kroell & Rolfs (2022), which occurs before a saccade to the target object and contains task-irrelevant feature information.

      We agree, and we have now adopted a clearer distinction between these terms, referring to foveal prediction only when discussing the distinct predictive nature of the effect discovered by Kroell and Rolfs (2022). Otherwise we referred to this effect as foveal feedback.

      (7) More broadly, the treatment of how foveal prediction relates to saccadic remapping is overly simplistic. The authors seem to be taking the perspective that remapping is an attentional phenomenon marked by remapping of only attentional/spatial pointers, but this is not the classic or widely accepted definition of remapping. Within the field of saccadic remapping, it is an ongoing debate whether (/how/where/when) information about stimulus content is remapped alongside spatial location (and also whether the attentional pointer concept is even neurophysiologically viable). This relationship between saccadic remapping and foveal prediction needs clarification and deeper treatment, in both the introduction and discussion.

      We thank the reviewer for their remarks. We reformulated the discussion section on “Saccadic remapping or foveal prediction” to include the nuances about spatial and feature remapping laid out in the reviewer’s comment (pg 8-9 ln 241-269). We also put a stronger focus on the special role the fovea seems to be playing regarding the feedback of visual features (pg 8-9 ln 265-269).

      (8) As part of this enhanced discussion, the findings should be better integrated with prior studies. E.g., there is some evidence for predictive remapping inducing integration of non-spatial features (some by the authors themselves; Harrison et al., 2013; Szinte et al., 2015). How do these findings relate to the observed results? Can the results simply be a special case of non-spatial feature integration between the currently attended and remapped location (fovea)? How are the results different from neurophysiological evidence for facilitation of the saccade target object's feature across the visual field (Burrows et al., 2014)? How might the results be reconciled with a prior fMRI study that failed to find decoding of stimulus content in remapped responses (Lescroart et al, 2016)? Might this reflect a difference between peripheral-to-peripheral vs peripheral-to-foveal remapping? A recent study by Chiu & Golomb (2025) provided supporting evidence for peripheral-to-fovea remapping (but not peripheral-to-peripheral remapping) of object-location binding (though in the post-saccadic time window), and suggested foveal prediction as the underlying mechanism.

      We thank the reviewer for raising these intriguing questions. We now address them in the revised discussion. We argue that the findings by Harrison et al., 2013 and Szinte et al., 2015 of presaccadic integration of features across two peripheral locations can be explained by presaccadic updating of spatial attention pointers rather than remapping of feature information (pg 8 ln 248-253). The lack of evidence for periphery-to-periphery remapping (Lescroart et al., 2016) and the recent study by Chiu & Golomb (2025) showing object-location binding from periphery to fovea nicely align with our characterization of foveal processing as unique in predicting feature information of upcoming stimuli (pg 8-9 ln 265-269). Finally, we argue that the global (i.e., space-invariant) selection of task-irrelevant saccadic target features (Burrows et al., 2014) is well-established at the neural level, but does not suffice to explain the spatially specific nature of foveal prediction (pg 8 ln 220-224). We now include these studies in the revised discussion section.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors used fMRI to determine whether peripherally viewed objects could be decoded from the foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from the foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.

      Strengths:

      I think this is another nice demonstration that peripheral information can be decoded from / is processed in the foveal cortex - the methods seem appropriate, and the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.

      We thank the reviewer for this positive evaluation of our work. As discussed in our response to Reviewer 1, we now elaborate on the differences between previous work showing decoding of peripheral information from foveal cortex and the effect shown here. While there are important similarities between these findings, foveal prediction in our study occurs in a saccade condition and in the absence of a task that is specific to stimulus features.

      Weaknesses:

      There are a couple of reasons why I think the main theoretical conclusions drawn from the study might not be supported, and why a more thorough investigation might be needed to draw these conclusions.

      (1) The authors used a blocked design, with each object being shown repeatedly in the same block. This meant that the stimulus was entirely predictable on each block, which weakens the authors' claims about this being a predictive mechanism that facilitates object recognition - if the stimulus is 100% predictable, there is no aspect of recognition or discrimination actually being tested. I think to strengthen these claims, an experiment would need to have unpredictable stimuli, and potentially combine behavioural reports with decoding to see whether this mechanism can be linked to facilitating object recognition across saccades.

      We appreciate the reviewer’s point and would like to highlight that it was not our intention to claim a behavioral effect on object recognition. We believe that an ambiguous formulation in the original abstract may have been interpreted this way, and we thus removed this reference. We also speculated in our Discussion that a potential reason for foveal prediction could be a headstart in peripheral object recognition; in the revised manuscript, we more clearly highlight that this is a potential future direction only.

      (2)  Given that foveal feedback has been found in previous studies that don't incorporate saccades, how is this a mechanism that might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli? I don't think this paper addresses this point, which would seem to be crucial to differentiate the results from those of previous studies.

      We fully agree that this point had not been sufficiently addressed in the previous version of the manuscript. As described in our responses to similar comments from reviewers 1 and 2, we included an additional section in the Discussion (“Foveal feedback during saccade preparation”) to more clearly delineate the present study from previous findings of foveal feedback. Previous studies (Williams et al., 2008) only found foveal feedback during narrow discrimination tasks related to spatial features of the target stimulus, not during color-discrimination or fixation-only tasks, concluding that the observed effect must be related to the discrimination behavior. In contrast, we found foveal feedback (as evidenced by decoding of target features) during a saccade condition that was independent of the target features, suggesting a different role of foveal feedback than hypothesized by Williams et al. (2008).

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      (A) Minor comments:

      (1)  The task should be clarified earlier in the manuscript.

      We now characterise the task in the abstract and have clarified its description in the third paragraph, right after introducing the main literature.

      (2) Is there actually only 0.5 seconds between saccades? This feels very short/rushed.

      The inter-trial-interval was 0.5 seconds, though effectively it varied because the target only appeared once participants fixated on the fixation dot. Note that this pacing is slower than the rate of saccades in natural vision (about 3 to 4 saccades per second). Participants did not report this paradigm as rushed.

      (3) Typo on pg2 ln64 (whooe).

      Fixed.

      (4)  Can the authors also show individual data points for Figures 3 and 4?

      We added individual data points for Figures 4 and S2.

      (5) The MNI coordinates on Figure 4A seem to be incorrect.

      We took out those coordinates.

      (6) Pg4 ln126 and pg6 ln194, why cite Williams et al. (2008)?

      We included this reference here to acknowledge that Williams et al. raised the same issues. We added a “cf.” before this reference to clarify this.

      (7) Pg7 ln207 Fabius et al. (2020) showed slow post-saccadic feature remapping, rather than predictive remapping of spatial attention.

      We have corrected this mistake.

      (8) The OSF link is valid, but I couldn't find a pre-registration.

      The issue with the OSF link has been resolved. The pre-registration had been set up but not published. We have now published it without changing the original pre-registration (see the screenshot attached).

      (9) I couldn't access the OpenNeuro repository.

      The issue with the OpenNeuro link has been resolved.

      (B) Additional references you may wish to include:

      (1) Burrows, B. E., Zirnsak, M., Akhlaghpour, H., Wang, M., & Moore, T.  (2014). Global selection of saccadic target features by neurons in area v4. Journal of Neuroscience.

      (2) Chambers, C. D., Allen, C. P., Maizey, L., & Williams, M. A. (2013). Is delayed foveal feedback critical for extra-foveal perception?. Cortex.

      (3) Chiu, T. Y., & Golomb, J. D. (2025). The influence of saccade target status on the reference frame of object-location binding. Journal of Experimental Psychology. General.

      (4) Harrison, W. J., Retell, J. D., Remington, R. W., & Mattingley, J. B. (2013). Visual crowding at a distance during predictive remapping. Current Biology.

      (5) Lescroart, M. D., Kanwisher, N., & Golomb, J. D. (2016). No evidence for automatic remapping of stimulus features or location found with fMRI. Frontiers in Systems Neuroscience.

      (6) Moran, C., Johnson, P. A., Hogendoorn, H., & Landau, A. N. (2025). The representation of stimulus features during stable fixation and active vision. Journal of Neuroscience.

      (7) Szinte, M., Jonikaitis, D., Rolfs, M., Cavanagh, P., & Deubel, H. (2016). Presaccadic motion integration between current and future retinotopic locations of attended objects. Journal of Neurophysiology.

      We thank the reviewer for pointing out these references. We have included them in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I just have a few minor points where I think some clarifications could be made.

      (1) Line 64 - "whooe" should be "whoose" I think.

      Fixed.

      (2) Around line 53 - you might consider citing this review on foveal feedback - https://doi.org/10.1167/jov.20.12.2

      We included the reference (pg 2 ln 55).

      (3) Line 129 - you mention a u-shaped relationship for decoding - I wasn't quite sure of the significance/relevance of this relationship - it would be helpful to expand on this / clarify what this means.

      We have expanded this section and added statistical tests of the u-shaped relationship in decoding using a weighted quadratic regression. We found significant positive curvature in all early visual areas between fovea and periphery (V1: t(27) = 3.98, p = 0.008; V2: t(27) = 3.03, p = 0.02; V3: t(27) = 2.776, p = 0.025). These findings support a u-shaped relationship. We now report these results in the revised manuscript (pg 5 ln 137-138).
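
      The curvature test described above can be illustrated with a minimal sketch. It assumes that decoding accuracy is regressed on eccentricity with a quadratic term, and that per-participant curvature coefficients are then tested against zero with a one-sample t statistic. The function names, the eccentricity bins, the accuracy values, and the equal weights are all hypothetical; the authors' actual weighting scheme and test procedure are not specified in this response.

```python
import numpy as np


def quadratic_curvature(eccentricity, accuracy, weights=None):
    """Fit accuracy = a*ecc**2 + b*ecc + c by (weighted) least squares; return a."""
    ecc = np.asarray(eccentricity, dtype=float)
    acc = np.asarray(accuracy, dtype=float)
    # np.polyfit returns coefficients ordered from the highest degree down.
    a, _b, _c = np.polyfit(ecc, acc, deg=2, w=weights)
    return a


def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from zero."""
    v = np.asarray(values, dtype=float)
    return v.mean() / (v.std(ddof=1) / np.sqrt(v.size))


# Hypothetical decoding accuracies from fovea (bin 0) to periphery (bin 4).
ecc = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
acc = np.array([0.58, 0.53, 0.51, 0.54, 0.60])
weights = np.ones_like(ecc)  # assumption: equal weights, for illustration only

curvature = quadratic_curvature(ecc, acc, weights)
print(f"curvature = {curvature:.4f}")  # a positive value indicates a u-shape
```

      In a group analysis, `one_sample_t` would be applied to the curvature coefficients of all participants, giving n - 1 degrees of freedom.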

      (4) Figure 1 - it would be helpful to indicate how long the target was viewed in the "stim on" panels - I assume it was for the saccade latency, but it would be good to include those values in the main text.

      We included that detail in the text (pg 3 ln 96-97).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      (1) Related to comment 3, related to the spatial communication section, either provide a clearer worked example or adjust the framing to avoid implying a more developed capability than is shown.

      We appreciate the reviewer’s feedback regarding the framing of the spatial communication section. We have removed this section from the revised version.

      (2) Related to comment 4 about resolution, consider including explicit numerical estimates of spatial resolution (e.g., median patch diameter in micrometers) for at least one dataset to help users understand practical mapping granularity.

      We appreciate the suggestion. We have added explicit numerical estimates of spatial resolution to clarify our mappings. Specifically, we now (i) define “patch” precisely and (ii) report the median patch diameter (in µm) for representative datasets:

      10x Visium (mouse cortex): spot diameter = 55 µm; center-to-center spacing = 100 µm.

      Slide-seqV2 (mouse brain): bead diameter ≈ 10 µm. When we optionally coarse-grain to 5×5 bead tiles for robustness, the effective patch diameter is ~50 µm

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines whether changes in pupil size index prediction-error-related updating during associative learning, formalised as information gain via Kullback-Leibler (KL) divergence. Across two independent tasks, pupil responses scaled with KL divergence shortly after feedback, with the timing and direction of the response varying by task. Overall, the work supports the view that pupil size reflects information-theoretic processes in a context-dependent manner.

      Strengths:

      This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gain during learning. The robust methodology, including two independent datasets with distinct task structures, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the timing and direction of prediction-error-related responses, offering new insights into the temporal dynamics of model updating. The use of an ideal-learner framework to quantify prediction errors, surprise, and uncertainty provides a principled account of the computational processes underlying pupil responses. The work also highlights the critical role of task context in shaping the direction and magnitude of these effects, revealing the adaptability of predictive processing mechanisms. Importantly, the conclusions are supported by rigorous control analyses and preprocessing sanity checks, as well as convergent results from frequentist and Bayesian linear mixed-effects modelling approaches.

      Weaknesses:

      Some aspects of directionality remain context-dependent, and on current evidence cannot be attributed specifically to whether average uncertainty increases or decreases across trials. Differences between the two tasks (e.g., sensory modality and learning regime) limit direct comparisons of effect direction and make mechanistic attribution cautious. In addition, subjective factors such as confidence were not measured and could influence both prediction-error signals and pupil responses. Importantly, the authors explicitly acknowledge these limitations, and the manuscript clearly frames them as areas for future work rather than settled conclusions.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate whether pupil dilation reflects information gain during associative learning, formalised as Kullback-Leibler divergence within an ideal observer framework. They examine pupil responses in a late time window after feedback and compare these to information-theoretic estimates (information gain, surprise, and entropy) derived from two different tasks with contrasting uncertainty dynamics.

      Strength:

      The exploration of task-evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.

      Weakness:

      However, the interpretability of the findings remains constrained by the fundamental differences between the two tasks (stimulus modality, feedback type, and learning structure), which confound the claimed context-dependent effects. The later time-window pupil effects, although intriguing, are small in magnitude and may reflect residual noise or task-specific arousal fluctuations rather than distinct information-processing signals. Thus, while the study offers valuable methodological insight and contributes to ongoing debates about the role of the pupil in cognitive inference, its conclusions about the functional significance of late pupil responses should be treated with caution.

      Reviewer #3 (Public review):

      Summary:

      Thank you for inviting me to review this manuscript entitled "Pupil dilation offers a time-window on prediction error" by Colizoli and colleagues. The study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). The conclusion of this work is that (post-feedback) pupil dilation in response to information gain is context dependent.

      Strengths:

      Use of an established Bayesian model to compute KL divergence and entropy.

      Pupillometry data preprocessing and multiple robustness checks.

      Weaknesses:

      Operationalization of prediction errors based on frequency, accuracy, and their interaction:

      The authors rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point, I would argue that this approach provides a simple approximation of the prediction error, but that a model-based approach would be more appropriate.

      Model validation:

      My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.

      Lack of a clear conclusion:

      The authors conclude that this study shows for the first time that (post-feedback) pupil dilation in response to information gain is context dependent. However, the study does not offer a unifying explanation for such context dependence. The discussion is quite detailed with respect to task-specific effects, but fails to provide an overarching perspective on the context-dependent nature of pupil signatures of information gain. This seems to be partly due to the strong differences between the experimental tasks.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I highly appreciate the care and detail in the authors' response and thank them for the effort invested in revising the manuscript. They addressed the core concerns to a high standard, and the manuscript has substantially improved in methodological rigour (through additional controls/sanity checks and complementary mixed-effects analyses) and in clarity of interpretation (by explicitly acknowledging context-dependence and tempering stronger claims). The present version reads clearly and is much strengthened overall. I only have a few minor points below:

      Minor suggestions:

      Abstract:

      In the abstract, KL is introduced as an abbreviation, but at first occurrence it should be written out as "Kullback-Leibler (KL)" for readers not familiar with it.

      We thank the reviewer for catching this error. It has been corrected in the version of record.

      Methods:

      I appreciate the additional Bayesian LME analysis. I only had a few things that I thought were missing from knowing the parameters: 1) what was the target acceptance rate (default of .95?), 2) which family was used to model the response distribution: (default) "gaussian" or robust "student-t"? Depending on the data, a Student-t would be preferred, but since the authors checked the fit and the results corroborate the correlation analysis, using the default would also be fine! Just add the information for completeness.

      Thank you for bringing this to our attention. We have now noted that default parameters were used in all cases unless otherwise mentioned. 

      Thank you once again for your time and consideration.

      Reviewer #2 (Recommendations for the authors):

      Thanks to the authors' effort on revision. I am happy with this new version of the manuscript.

      Thank you once again for your time and consideration.

      Reviewer #3 (Recommendations for the authors):

      (1) Regarding comments #3 and #6 (first round) on model validation and posterior predictive checks, the authors replied that since their model is not a "generative" one, they can't perform posterior predictive checks. Crucially, in eq. 2, the authors present the p{tilde}^j_k variable denoting the learned probability of event k on trial j. I don't see why this can't be exploited for simulations. In my opinion, one could (and should) generate predictions based on this variable. The simplest implementation would translate the probability into a categorical choice (w/o fitting any free parameter). Based on this, they could assess whether the model and data are comparable.

      We thank the reviewer for this clarification. The reviewer suggests using the probability distributions at each trial to predict which event should be chosen on each trial. More specifically, the event(s) with the highest probability on trial j could be used to generate a prediction for the choice of the participant on trial j. We agree that this would indeed be an interesting analysis. However, the response options of each task are limited to two alternatives. In the cue-target task, four events are modeled (representing all possible cue-target conditions) while the participants’ response options are only “left” and “right”. Similarly, in the letter-color task, 36 events are modeled while the participants’ response options are “match” and “no-match”. In other words, we do not know which of the modeled events (four or 36, depending on the task) the participant would have indicated on each trial. As an approximation to this fine-grained analysis, we investigated the relationship between the information-theoretic variables separately for error and correct trials. Our rationale was that we would have more insight into how the model fits depended on the participants’ actual behavior as compared with the ideal learner model.

      (2) I recommend providing a plot of the linear mixed model analysis of the pupil data. Currently, results are only presented in the text and tables, but a figure would be much more useful.

      We thank the reviewer for the suggestion to add a plot of the linear mixed model results. We appreciate the value of visualizing model estimates; however, we feel that the current presentation in the text and tables clearly conveys the relevant findings. For this reason, and to avoid further lengthening the manuscript, we prefer to retain the current format.

      (3) I would consider only presenting the linear mixed effects for the pupil data in the main results, and the correlation results in the supplement. It is currently quite long.

      We thank the reviewer for this recommendation. We agree that the results section is detailed; however, we consider the correlation analyses to be integral to the interpretation of the pupil data and therefore prefer to keep them in the main text rather than move them to the supplement.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study seeks to examine the relationship between pupil size and information gain, showing opposite effects dependent upon whether the average uncertainty increases or decreases across trials. Given the broad implications for learning and perception, the findings will be of broad interest to researchers in cognitive neuroscience, decision-making, and computational modelling. Nevertheless, the evidence in support of the particular conclusion is at present incomplete - the conclusions would be strengthened if the authors could both clarify the differences between model-updating and prediction error in their account and clarify the patterns in the data.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates whether pupil dilation reflects prediction error signals during associative learning, defined formally by Kullback-Leibler (KL) divergence, an information-theoretic measure of information gain. Two independent tasks with different entropy dynamics (decreasing and increasing uncertainty) were analyzed: the cue-target 2AFC task and the letter-color 2AFC task. Results revealed that pupil responses scaled with KL divergence shortly after feedback onset, but the direction of this relationship depended on whether uncertainty (entropy) increased or decreased across trials. Furthermore, signed prediction errors (interaction between frequency and accuracy) emerged at different time windows across tasks, suggesting task-specific temporal components of model updating. Overall, the findings highlight that pupil dilation reflects information-theoretic processes in a complex, context-dependent manner.

      Strengths:

      This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gained during learning. The robust methodology, including two independent datasets with distinct entropy dynamics, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the temporal dynamics of prediction error signals, offering new insights into the timing of model updates. The use of an ideal learner model to quantify prediction errors, surprise, and entropy provides a principled framework for understanding the computational processes underlying pupil responses. Furthermore, the study highlights the critical role of task context - specifically increasing versus decreasing entropy - in shaping the directionality and magnitude of these effects, revealing the adaptability of predictive processing mechanisms.

      Weaknesses:

      While this study offers important insights, several limitations remain. The two tasks differ significantly in design (e.g., sensory modality and learning type), complicating direct comparisons and limiting the interpretation of differences in pupil dynamics. Importantly, the apparent context-dependent reversal between pupil constriction and dilation in response to feedback raises concerns about how these opposing effects might confound the observed correlations with KL divergence. 

      We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. As the reviewer points out, the directional relationship between pupil dilation and information gain must be due to other factors, for instance, the sensory modality, learning type, or the reversal between pupil constriction and dilation across the two tasks. Also, we would like to note that ongoing experiments in our lab already contradict our original speculation. In line with the reviewer’s point, we noted these differences in the section on “Limitations and future research” in the Discussion. To better align the manuscript with the above-mentioned points, we have made several changes in the Abstract, Introduction and Discussion, summarized below:

      We have removed the following text from the Abstract and Introduction: “…, specifically related to increasing or decreasing average uncertainty (entropy) across trials.”

      We have edited the following text in the Introduction (changes in italics) (p. 5):

      “We analyzed two independent datasets featuring distinct associative learning paradigms, one characterized by increasing entropy and the other by decreasing entropy as the tasks progressed. By examining these different tasks, we aimed to identify commonalities (if any) in the results across varying contexts. Additionally, the contrasting directions of entropy in the two tasks enabled us to disentangle the correlation between stimulus-pair frequency and information gain in the post-feedback pupil response.”

      We have removed the following text from the Discussion:

      “…and information gain in fact seems to be driven by increased uncertainty.”

      “We speculate that this difference in the direction of scaling between information gain and the pupil response may depend on whether entropy was increasing or decreasing across trials.” 

      “…which could explain the opposite direction of the relationship between pupil dilation and information gain”

      “… and seems to relate to the direction of the entropy as learning progresses (i.e., either increasing or decreasing average uncertainty).” 

      We have edited the following texts in the Discussion (changes in italics):

      “For the first time, we show that the direction of the relationship between post-feedback pupil dilation and information gain (defined as KL divergence) was context dependent.” (p. 29)

      Finally, we have added the following correction to the Discussion (p. 30):

      “Although it is tempting to speculate that the direction of the relationship between pupil dilation and information gain may be due to either increasing or decreasing entropy as the task progressed, we must refrain from this conclusion. We note that the two tasks differ substantially in terms of design with other confounding variables and therefore cannot be directly compared to one another. We expand on these limitations in the section below (see Limitations and future research).”

      Finally, subjective factors such as participants' confidence and internal belief states were not measured, despite their potential influence on prediction errors and pupil responses.

      Thank you for the thoughtful comment. We agree with the reviewer that subjective factors, such as participants' confidence, can be important in understanding prediction errors and pupil responses. As per the reviewer’s point, we have included the following limitation in the Discussion (p. 33): 

      “Finally, while we acknowledge the potential relevance of subjective factors, such as the participants’ overt confidence reports, in understanding prediction errors and pupil responses, the current study focused on the more objective, model-driven measure of information-theoretic variables. This approach aligns with our use of the ideal learner model, which estimates information-theoretic variables while being agnostic about the observer's subjective experience itself. Future research is needed to explore the relationship between information-gain signals in pupil dilation and the observer’s reported experience of or awareness about confidence in their decisions.” 

      Reviewer #2 (Public review):

      Summary:

      The authors proposed that variability in post-feedback pupillary responses during the associative learning tasks can be explained by information gain, which is measured as KL divergence. They analysed pupil responses in a later time window (2.5s-3s after feedback onset) and correlated them with information-theory-based estimates from an ideal learner model (i.e., information gain-KL divergence, surprise-subjective probability, and entropy-average uncertainty) in two different associative decision-making tasks.
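
      For readers unfamiliar with the three information-theoretic estimates named above, the following is a minimal sketch of how they can be computed for a single trial. It assumes a simple count-based learner whose event probabilities are normalized counts, measures surprise against the prior, entropy on the updated distribution, and information gain as the KL divergence from prior to posterior. The function name, the starting pseudo-counts, and these conventions are illustrative assumptions and may differ from the ideal learner model used in the manuscript.

```python
import numpy as np


def ideal_learner_step(counts, event):
    """Observe one event; return (surprise, entropy, info_gain, new_counts) in bits."""
    counts = np.asarray(counts, dtype=float)
    prior = counts / counts.sum()

    new_counts = counts.copy()
    new_counts[event] += 1.0
    posterior = new_counts / new_counts.sum()

    # Surprise: negative log probability of the observed event under the prior.
    surprise = -np.log2(prior[event])
    # Entropy: average uncertainty of the updated distribution.
    entropy = -np.sum(posterior * np.log2(posterior))
    # Information gain: KL divergence of the posterior from the prior.
    info_gain = np.sum(posterior * np.log2(posterior / prior))
    return surprise, entropy, info_gain, new_counts


# Hypothetical example: two equally likely events, then event 0 is observed.
surprise, entropy, info_gain, counts = ideal_learner_step([1.0, 1.0], event=0)
print(surprise, entropy, info_gain)
```

      Starting from nonzero pseudo-counts keeps every probability positive, so the logarithms are always defined.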

      Strength:

      The exploration of task-evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.

      Weakness:

      However, disentangling these later effects from noise needs caution. Noise in pupillometry can arise from variations in stimuli and task engagement, as well as artefacts from earlier pupil dynamics. The increasing variance in the time series of pupillary responses (e.g., as shown in Figure 2D) highlights this concern.

      It's also unclear what this complicated association between information gain and pupil dynamics actually means. The complexity of the two different tasks reported made the interpretation more difficult in the present manuscript.

      We share the reviewer’s concerns. To make this point come across more clearly, we have added the following text to the Introduction (p. 5):

      “The current study was motivated by Zénon’s hypothesis concerning the relationship between pupil dilation and information gain, particularly in light of the varying sources of signal and noise introduced by task context and pupil dynamics. By demonstrating how task context can influence which signals are reflected in pupil dilation, and highlighting the importance of considering their temporal dynamics, we aim to promote a more nuanced and model-driven approach to cognitive research using pupillometry.”

      Reviewer #3 (Public review):

      Summary:

      This study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). In particular, the study defines the prediction error in terms of KL divergence and speculates that changes in pupil size associated with KL divergence depend on entropy. Moreover, the authors examine the temporal characteristics of pupil correlates of prediction errors, which differed considerably across previous studies that employed different experimental paradigms. In my opinion, the study does not achieve these aims due to several methodological and theoretical issues.

      Strengths:

      (1)  Use of an established Bayesian model to compute KL divergence and entropy.

      (2)  Pupillometry data preprocessing, including deconvolution.

      Weaknesses:

      (1) Definition of the prediction error in terms of KL divergence:

      I'm concerned about the authors' theoretical assumption that the prediction error is defined in terms of KL divergence. The authors primarily refer to a review article by Zénon (2019): "Eye pupil signals information gain". It is my understanding that Zénon argues that KL divergence quantifies the update of a belief, not the prediction error: "In short, updates of the brain's internal model, quantified formally as the Kullback-Leibler (KL) divergence between prior and posterior beliefs, would be the common denominator to all these instances of pupillary dilation to cognition." (Zénon, 2019).

      From my perspective, the update differs from the prediction error. Prediction error refers to the difference between outcome and expectation, while update refers to the difference between the prior and the posterior. The prediction error can drive the update, but the update is typically smaller, for example, because the prediction error is weighted by the learning rate to compute the update. My interpretation of Zénon (2019) is that they explicitly argue that KL divergence defines the update in terms of the described difference between prior and posterior, not the prediction error.
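
      The distinction drawn above can be made concrete with a simple delta-rule learner. This is only an illustrative sketch: the function name, the learning rate, and the example values are assumptions chosen for the illustration and are not taken from the manuscript or from Zénon (2019).

```python
def delta_rule_step(expectation: float, outcome: float, learning_rate: float):
    """Return (prediction_error, update, new_expectation) for one trial.

    Hypothetical helper: the prediction error is outcome minus expectation,
    while the update is that error scaled by the learning rate.
    """
    prediction_error = outcome - expectation    # mismatch between outcome and expectation
    update = learning_rate * prediction_error   # actual change in the belief
    return prediction_error, update, expectation + update


# Illustrative values only.
pe, upd, new_exp = delta_rule_step(expectation=0.5, outcome=1.0, learning_rate=0.2)
```

      With a learning rate below 1, the update is smaller in magnitude than the prediction error that drives it, which is the reviewer's point.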

      The authors also cite a few other papers, including Friston (2010), where I also could not find a definition of the prediction error in terms of KL divergence. For example [KL divergence:] "A non-commutative measure of the non-negative difference between two probability distributions." Similarly, Friston (2010) states: Bayesian Surprise - "A measure of salience based on the Kullback-Leibler divergence between the recognition density (which encodes posterior beliefs) and the prior density. It measures the information that can be recognized in the data." Finally, also in O'Reilly (2013), KL divergence is used to define the update of the internal model, not the prediction error.

      The authors seem to mix up this common definition of the model update in terms of KL divergence and their definition of prediction error along the same lines. For example, on page 4: "KL divergence is a measure of the difference between two probability distributions. In the context of predictive processing, KL divergence can be used to quantify the mismatch between the probability distributions corresponding to the brain's expectations about incoming sensory input and the actual sensory input received, in other words, the prediction error (Friston, 2010; Spratling, 2017)."

      Similarly (page 23): "In the current study, we investigated whether the pupil's response to decision outcome (i.e., feedback) in the context of associative learning reflects a prediction error as defined by KL divergence."

      This is problematic because the results might actually have limited implications for the authors' main perspective (i.e., that the pupil encodes prediction errors) and could be better interpreted in terms of model updating. In my opinion, there are two potential ways to deal with this issue:

      (a) Cite work that unambiguously supports the perspective that it is reasonable to define the prediction error in terms of KL divergence and that this has a link to pupillometry. In this case, it would be necessary to clearly explain the definition of the prediction error in terms of KL divergence and dissociate it from the definition in terms of model updating.

      (b) If there is no prior work supporting the authors' current perspective on the prediction error, it might be necessary to revise the entire paper substantially and focus on the definition in terms of model updating.

      We thank the reviewer for pointing out these inconsistencies in the manuscript and appreciate their suggestions for improvement. We take approach (a) recommended by the reviewer, and provide our reasoning as to why prediction error signals in pupil dilation are expected to correlate with information gain (defined as the KL divergence between posterior and prior belief distributions). This can be found in a new section in the introduction, copied here for convenience (p. 3-4):

      “We reasoned that the link between prediction error signals and information gain in pupil dilation is through precision-weighting. Precision refers to the amount of uncertainty (inverse variance) of both the prior belief and sensory input in the prediction error signals [6,64–67]. More precise prediction errors receive more weighting, and therefore, have greater influence on model updating processes. The precision-weighting of prediction error signals may provide a mechanism for distinguishing between known and unknown sources of uncertainty, related to the inherent stochastic nature of a signal versus insufficient information on the part of the observer, respectively [65,67,68]. In Bayesian frameworks, information gain is fundamentally linked to prediction error, modulated by precision [65,66,69–75]. In non-hierarchical Bayesian models, information gain can be derived as a function of prediction errors and the precision of the prior and likelihood distributions, a relationship that can be approximately linear [70]. In hierarchical Bayesian inference, the update in beliefs (posterior mean changes) at each level is proportional to the precision-weighted prediction error; this update encodes the information gained from new observations [65,66,69,71,72]. Neuromodulatory arousal systems are well-situated to act as precision-weighting mechanisms in line with predictive processing frameworks [76,77]. Empirical evidence suggests that neuromodulatory systems broadcast precision-weighted prediction errors to cortical regions [11,59,66,78]. Therefore, the hypothesis that feedback-locked pupil dilation reflects a prediction error signal is similarly in line with Zénon’s main claim that pupil dilation generally reflects information gain, through precision-weighting of the prediction error. We expected a prediction error signal in pupil dilation to be proportional to the information gain.”

      We have referenced previous work that has linked prediction error and information gain directly (p. 4): “The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72].”

      We have taken the following steps to remedy this error of equating “prediction error” directly with the information gain.

      First, we have replaced “KL divergence” with “information gain” whenever possible throughout the manuscript for greater clarity. 

      Second, we have edited the section in the introduction defining information gain substantially (p. 4): 

      “Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80]. Itti and Baldi (2005) [81] termed the KL divergence between posterior and prior belief distributions as “Bayesian surprise” and showed a link to the allocation of attention. The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72]. According to Zénon’s hypothesis, if pupil dilation reflects information gain during the observation of an outcome event, such as feedback on decision accuracy, then pupil size will be expected to increase in proportion to how much novel sensory evidence is used to update current beliefs [29,63].”
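
      For reference, the quantity described in the quoted passage can be written in its standard textbook form. The notation below (p_post, p_prior, index i over outcome categories, base-2 logarithm) is an assumption made for illustration; the manuscript's own formulas are in its Methods section 2.4 and may use different symbols or a different logarithm base.

```latex
\begin{equation}
D_{\mathrm{KL}}\!\left(p_{\text{post}} \,\|\, p_{\text{prior}}\right)
  = \sum_{i} p_{\text{post}}(i)\,\log_2 \frac{p_{\text{post}}(i)}{p_{\text{prior}}(i)} \;\ge\; 0
\end{equation}
```

      The divergence is zero only when the two distributions are identical, and it is not symmetric in its arguments, so the direction (posterior relative to prior) has to be stated explicitly.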

      Finally, we have made several minor textual edits to the Abstract and main text wherever possible to further clarify the proposed relationship between prediction errors and information gain.

      (2) Operationalization of prediction errors based on frequency, accuracy, and their interaction:

      The authors also rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point here, I would argue that this approach offers a simple approximation to the prediction error, but it is possible that factors like difficulty and effort can influence the pupil signal at the same time, which the current approach does not take into account. I recommend computing prediction errors (defined in terms of the difference between outcome and expectation) based on a simple reinforcement-learning model and analyzing the data using a pupillometry regression model in which nuisance regressors are controlled, and results are corrected for multiple comparisons.

      We agree with the reviewer’s suggestion that alternatively modeling the data in a reinforcement learning paradigm would be fruitful. We adopted the ideal learner model as we were primarily focused on Information Theory, stemming from our aim to test Zénon’s hypothesis that information gain drives pupil dilation. However, we agree with the reviewer that it is worthwhile to pursue different modeling approaches in future work. We have now included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times (explained in more detail below in our response to your point #4). Results, including correction for multiple comparisons, were reported for all pupil time course data as detailed in Methods section 2.5.

      (3) The link between model-based (KL divergence) and model-agnostic (frequency- and accuracy-based) prediction errors:

      I was expecting a validation analysis showing that KL divergence and model-agnostic prediction errors are correlated (in the behavioral data). This would be useful to validate the theoretical assumptions empirically.

      The model limitations and the operationalization of prediction error in terms of post-feedback processing do not seem to allow for a comparison of information gain and model-agnostic prediction errors in the behavioral data for the following reasons. First, the simple ideal learner model used here is not a generative model, and therefore, cannot replicate or simulate the participants’ responses (see also our response to your point #6 “model validation” below). Second, the behavioral dependent variables obtained are accuracy and reaction times, which both occur before feedback presentation. While accuracy and reaction times can serve as a marker of the participant’s (statistical) confidence/uncertainty following the decision interval, these behavioral measures cannot provide access to post-feedback information processing. The pupil dilation is of interest to us because the peripheral arousal system is able to provide a marker of post-feedback processing. Through the analysis presented in Figure 3, we indeed aimed to make the comparison of the model-based information gain to the model-agnostic prediction errors via the proxy variable of post-feedback pupil dilation instead of behavioral variables. To bridge the gap between the “behaviorally agnostic” model parameters and the actual performance of the participants, we examined the relationship between the model-based information gain and the post-feedback pupil dilation separately for error and correct trials as shown in Figure 3D-F & Figure 3J-L. We hope this addresses the reviewer’s concern and apologize in case we did not understand the reviewer’s suggestion here.

      (4) Model-based analyses of pupil data:

      I'm concerned about the authors' model-based analyses of the pupil data. The current approach is to simply compute a correlation for each model term separately (i.e., KL divergence, surprise, entropy). While the authors do show low correlations between these terms, single correlational analyses do not allow them to control for additional variables like outcome valence, prediction error (defined in terms of the difference between outcome and expectation), and additional nuisance variables like reaction time, as well as x and y coordinates of gaze.

      Moreover, including entropy and KL divergence in the same regression model could, at least within each task, provide some insights into whether the pupil response to KL divergence depends on entropy. This could be achieved by including an interaction term between KL divergence and entropy in the model.

      In line with the reviewer’s suggestions, we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times. We compared the performance of two models on the post-feedback pupil dilation in each time window of interest: Model 1 had no interaction between information gain and entropy and Model 2 included an interaction term as suggested. We did not include the x- and y-coordinates of gaze in the linear mixed model analysis, as there are multiple values of these coordinates per trial. Furthermore, regressing out the x- and y-coordinates of gaze can potentially remove signal of interest in the pupil dilation data in addition to the gaze-related confounds and we did not measure absolute pupil size (Mathôt, Melmi & Castet, 2015; Hayes & Petrov, 2015). We present more sanity checks on the pre-processing pipeline as recommended by Reviewer 1.

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results. In sum, we found that including an interaction term for information gain and entropy did not lead to better model fits, and sometimes led to significantly worse fits. Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.
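
      One common way to compare two nested models such as Model 1 and Model 2 is a likelihood-ratio test together with an information criterion. The sketch below is an assumption about how such a comparison could be done, not a description of the authors' procedure; the function name, the parameter counts, and the log-likelihood values are hypothetical placeholders. It handles only the case of a single additional parameter (the interaction term).

```python
import math


def compare_nested_models(llf_reduced, llf_full, k_reduced, k_full):
    """Compare two nested models fitted by maximum likelihood.

    llf_* are maximized log-likelihoods, k_* the numbers of estimated
    parameters. Only a one-parameter difference is supported here.
    """
    if k_full - k_reduced != 1:
        raise ValueError("this sketch only handles exactly one extra parameter")
    lr_stat = 2.0 * (llf_full - llf_reduced)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))
    aic_reduced = 2 * k_reduced - 2 * llf_reduced
    aic_full = 2 * k_full - 2 * llf_full
    return {
        "lr_stat": lr_stat,
        "p_value": p_value,
        "delta_aic": aic_full - aic_reduced,  # negative favours the full model
    }


# Made-up log-likelihoods, for illustration only.
result = compare_nested_models(llf_reduced=-1000.0, llf_full=-999.5,
                               k_reduced=6, k_full=7)
```

      In this made-up case the interaction term improves the log-likelihood too little to justify the extra parameter, so the AIC difference is positive.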

      (5) Major differences between experimental tasks:

      More generally, I'm not convinced that the authors' conclusion that the pupil response to KL divergence depends on entropy is sufficiently supported by the current design. The two tasks differ on different levels (stimuli, contingencies, when learning takes place), not just in terms of entropy. In my opinion, it would be necessary to rely on a common task with two conditions that differ primarily in terms of entropy while controlling for other potentially confounding factors. I'm afraid that seemingly minor task details can dramatically change pupil responses. The positive/negative difference in the correlation with KL divergence that the authors interpret to be driven by entropy may depend on another potentially confounding factor currently not controlled.

      We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. We note that Reviewer #1 had a similar concern. Our response to Reviewer #1 addresses this concern of Reviewer #3 as well. To better align the manuscript with the above-mentioned points, we have made several changes that are detailed in our response to Reviewer #1’s public review (above).

      (6) Model validation:

      My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.

      Based on our understanding, posterior predictive checks are used to assess the goodness of fit between generated (or simulated) data and observed data. Given that the “simple” ideal learner model employed in the current study is not a generative model, a posterior predictive check would not apply here (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2013). The ideal learner model is unable to simulate or replicate the participants’ responses and behaviors such as accuracy and reaction times; it simply computes the probability of seeing each stimulus type at each trial based on the prior distribution and the exact trial order of the stimuli presented to each participant. The model’s probabilities are computed directly from a Dirichlet distribution of values that represent the number of occurrences of each stimulus-pair type for each task. The information-theoretic variables are then directly computed from these probabilities using standard formulas. The exact formulas used in the ideal learner model can be found in section 2.4.
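
      The kind of count-based ideal learner described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, probabilities are taken as normalized Dirichlet counts, logarithms are base 2, entropy is computed on the updated distribution, and information gain is the KL divergence of the updated distribution from the previous one. The exact formulas in the manuscript's section 2.4 may differ on any of these points.

```python
import math


def ideal_learner(trial_types, prior_counts):
    """Trial-wise surprise, entropy and information gain from running counts.

    trial_types: sequence of integer stimulus-type indices, in presented order.
    prior_counts: positive pseudo-counts per stimulus type (e.g. all ones
    for a flat prior).
    """
    counts = list(prior_counts)
    results = []
    for observed in trial_types:
        total = sum(counts)
        prior = [c / total for c in counts]
        surprise = -math.log2(prior[observed])   # Shannon surprise of the outcome
        counts[observed] += 1                     # update counts with the new observation
        total = sum(counts)
        posterior = [c / total for c in counts]
        entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
        info_gain = sum(q * math.log2(q / p)
                        for q, p in zip(posterior, prior) if q > 0)
        results.append({"surprise": surprise,
                        "entropy": entropy,
                        "info_gain": info_gain})
    return results


# Two stimulus types, flat prior, a short made-up trial sequence.
trace = ideal_learner([0, 0, 1], prior_counts=[1, 1])
```

      Note that nothing in this computation depends on the participant's choices or reaction times, which is why such a model cannot simulate behavior for a posterior predictive check.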

      We have now included a complementary linear mixed model analysis which also provides insight into the amount of explained variance of these information-theoretic predictors on the post-feedback pupil response, while also including the pre-feedback baseline pupil and reaction time differences (see section 3.3, Tables 3 & 4). The R<sup>2</sup> values ranged from 0.16 – 0.50 across all conditions tested.

      (7) Discussion:

      The authors interpret the directional effect of the pupil response w.r.t. KL divergence in terms of differences in entropy. However, I did not find a normative/computational explanation supporting this interpretation. Why should the pupil (or the central arousal system) respond differently to KL divergence depending on differences in entropy?

      The current suggestion (page 24) that might go in this direction is that pupil responses are driven by uncertainty (entropy) rather than learning (quoting O'Reilly et al. (2013)). However, this might be inconsistent with the authors' overarching perspective based on Zénon (2019) stating that pupil responses reflect updating, which seems to imply learning, in my opinion. To go beyond the suggestion that the relationship between KL divergence and pupil size "needs more context" than previously assumed, I would recommend a deeper discussion of the computational underpinnings of the result.

      Since we have removed the original speculative conclusion from the manuscript, we will refrain from discussing the computational underpinnings of a potential mechanism. As mentioned above, we note that we have preliminary data from our own lab that contradict our original hypothesis about the relationship between entropy and information gain on the post-feedback pupil response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Apart from the points raised in the public review above, I'd like to use the opportunity here to provide a more detailed review of potential issues, questions, and queries I have:

      (1) Constriction vs. Dilation Effects:

      The study observes a context-dependent relationship between KL divergence and pupil responses, where pupil dilation and constriction appear to exhibit opposing effects. However, this phenomenon raises a critical concern: Could the initial pupil constriction to visual stimuli (e.g., in the cue-target task) confound correlations with KL divergence? This potential confound warrants further clarification or control analyses to ensure that the observed effects genuinely reflect prediction error signals and are not merely a result of low-level stimulus-driven responses.

      We agree with the reviewer’s concern and have added the following information to the limitations section in the Discussion (changes in italics below; p. 32-33).

      “First, the two associative learning paradigms differed in many ways and were not directly comparable. For instance, the shape of the mean pupil response function differed across the two tasks in accordance with a visual or auditory feedback stimulus (compare Supplementary Figure 3A with Supplementary Figure 3D), and it is unclear whether these overall response differences contributed to any differences obtained between task conditions within each task. We are unable to rule out whether so-called “low level” effects such as the initial constriction to visual stimuli in the cue-target 2AFC task as compared with the dilation in response to auditory stimuli in the letter-color 2AFC task could confound correlations with information gain. Future work should strive to disentangle how the specific aspects of the associative learning paradigms relate to prediction errors in pupil dilation by systematically manipulating design elements within each task.”

      Here, I also was curious about Supplementary Figure 1, showing 'no difference' between the two tones (indicating 'error' or 'correct'). Was this the case for FDR-corrected or uncorrected cluster statistics? Especially since the main results also showed sig. differences only for uncorrected cluster statistics (Figure 2), but were n.s. for FDR corrected. I.e. can we be sure to rule out a confound of the tones here after all?

      As per the reviewer’s suggestion, we verified that there were also no significant clusters after feedback onset before applying the correction for multiple comparisons. We have added this information to Supplementary section 1.2 as follows:

      “Results showed that the auditory tone dilated pupils on average (Supplementary Figure 1C). Crucially, however, the two tones did not differ from one another in either of the time windows of interest (Supplementary Figure 1D; no significant time points after feedback onset were obtained either before or after correcting for multiple comparisons using cluster-based permutation methods; see Section 2.5).”

      Supplementary Figure 1 is showing effects cluster-corrected for multiple comparisons using cluster-based permutation tests from the MNE software package in Python (see Methods section 2.5). We have clarified that the cluster-correction was based on permutation testing in the figure legend. 

      (2) Participant-Specific Priors:

      The ideal learner models do not account for individualised priors, assuming homogeneous learning behaviour across participants. Could incorporating participant-specific priors better reflect variability in how individuals update their beliefs during associative learning?

      We have clarified in the Methods (see section 2.4) that the ideal learner models did account for participant-specific stimuli including participant-specific priors in the letter-color 2AFC task. We have added the following texts: 

      “We also note that while the ideal learner model for the cue-target 2AFC task used a uniform (flat) prior distribution for all participants, the model parameters were based on the participant-specific cue-target counterbalancing conditions and randomized trial order.” (p. 13)

      “The prior distributions used for the letter-color 2AFC task were estimated from the randomized letter-color pairs and randomized trial order presentation in the preceding odd-ball task; this resulted in participant-specific prior distributions for the ideal learner model of the letter-color 2AFC task. The model parameters were likewise estimated from the (participant-specific) randomized trial order presented in the letter-color 2AFC task.” (p. 13)

      (3) Trial-by-Trial Variability:

      The analysis does not account for random effects or inter-trial variability using mixed-effects models. Including such models could provide a more robust statistical framework and ensure the observed relationships are not influenced by unaccounted participant- or trial-specific factors.

      We have included a complementary linear mixed model analysis in which “subject” was modeled as a random effect on the post-feedback pupil response in each time window of interest and for each task. Across all trials, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences (see section 3.3, Tables 3 & 4).

      (4) Preprocessing/Analysis choices:

      Before anything else, I'd like to highlight the authors' effort in providing public code (and data) in a very readable and detailed format!

      We appreciate the compliment - thank you for taking the time to look at the data and code provided.

      I found the idea of regressing the effect of Blinks/Saccades on the pupil trace intriguing. However, I miss a complete picture here to understand how well this actually worked, especially since it seems to be performed on already interpolated data. My main points here are:

      (4.1) Why is the deconvolution performed on already interpolated data and not on 'raw' data where there are actually peaks of information to fit?

      To our understanding, at least one critical reason for interpolating the data before proceeding with the deconvolution analysis is that the raw data contain many missing values (i.e., NaNs) due to the presence of blinks. Interpolating over the missing data first ensures that there are valid numerical elements in the linear algebra equations. We refer the reviewer to the methods detailed in Knapen et al. (2016) for more details on this pre-processing method. 
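
      As an illustration of why interpolation comes first, the sketch below fills runs of missing samples by linear interpolation so that every element of the time series is a valid number. It is a simplified stand-in, not the pipeline of Knapen et al. (2016): the function name is hypothetical, and padding around blinks and the handling of saccades are left out.

```python
import math


def interpolate_missing(samples):
    """Linearly interpolate over NaN runs in a pupil time series.

    Gaps at the very start or end are filled with the nearest valid value.
    """
    out = list(samples)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    for i in range(len(out)):
        if not math.isnan(out[i]):
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = out[right]            # leading gap: copy first valid value
        elif right is None:
            out[i] = out[left]             # trailing gap: copy last valid value
        else:
            w = (i - left) / (right - left)
            out[i] = out[left] + w * (out[right] - out[left])
    return out


nan = float("nan")
filled = interpolate_missing([1.0, nan, nan, 4.0, nan])
```

      After this step the series contains no NaNs, so it can enter a least-squares regression such as the deconvolution described above.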

      (4.2) What is the model fit (e.g. R-squared)? If this was a poor fit for the regressors in the first place, can we trust the residuals (i.e. clean pupil trace)? Is it possible to plot the same Pupil trace of Figure 1D with a) the 'raw' pupil time-series, b) after interpolation only (both of course also mean-centered for comparison), on top of the residuals after deconvolution (already presented), so we can be sure that this is not driving the effects in a 'bad' way? I'd just like to make sure that this approach did not lead to artefacts in the residuals rather than removing them.

      We thank the reviewer for this suggestion. In the Supplementary Materials, we have included a new figure (Supplementary Figure 2, copied below for convenience), which illustrates the same conditions as in Figure 1D and Figure 2D, with 1) the raw data, and 2) the interpolated data before the nuisance regression. Both the raw data and interpolated data have been band-pass filtered as was done in the original pre-processing pipeline and converted to percent signal change. These figures can be compared directly to Figure 1D and Figure 2D, for the two tasks, respectively.

      Of note is that the raw data seem to be dominated by responses to blinks (and/or saccades). Crucially, the pattern of results remains overall unchanged between the interpolated-only and fully pre-processed version of the data for both tasks.

      In the Supplementary Materials (see Supplementary section 2), we have added the descriptives of the model fits from the deconvolution method. Model fits (R<sup>2</sup>) for the nuisance regression were generally low: cue-target 2AFC task, M = 0.03, SD = 0.02, range = [0.00, 0.07]; letter-color visual 2AFC, M = 0.08, SD = 0.04, range = [0.02, 0.16].

      Furthermore, a Pearson correlation analysis between the interpolated and fully pre-processed data within the time windows of interest for both tasks indicated high correspondence:

      Cue-target 2AFC task

      Early time window: M = 0.99, SD = 0.01, range = [0.955, 1.000]

      Late time window: M = 0.99, SD = 0.01, range = [0.971, 1.000]

      Letter-color visual 2AFC

      Early time window: M = 0.95, SD = 0.04, range = [0.803, 0.998]

      Late time window: M = 0.97, SD = 0.02, range = [0.908, 0.999]

      In hindsight, including the deconvolution (nuisance regression) method may not have changed the pattern of results much. However, the decision to include this deconvolution method was not data-driven; instead, it was based on the literature establishing the importance of removing variance (up to 5 s) of these blinks and saccades from cognitive effects of interest in pupil dilation (Knapen et al., 2016). 

      (4.3) Since this should also lead to predicted time series for the nuisance-regressors, can we see a similar effect (of what is reported for the pupil dilation) based on the blink/saccade traces of a) their predicted time series based on the deconvolution, which could indicate a problem with the interpretation of the pupil dilation effects, and b) the 'raw' blink/saccade events from the eye-tracker? I understand that this is a very exhaustive analysis so I would actually just be interested here in an averaged time-course / blink&saccade frequency of the same time-window in Figure 1D to complement the PD analysis as a sanity check.

      Also included in the Supplementary Figure 2 is the data averaged as in Figure 1D and Figure 2D for the raw data and nuisance-predictor time courses (please refer to the bottom row of the sub-plots). No pattern was observed in either the raw data or the nuisance predictors as was shown in the residual time courses. 

      (4.4) How many samples were removed from the time series due to blinks/saccades in the first place? 150ms for both events in both directions is quite a long bit of time so I wonder how much 'original' information of the pupil was actually left in the time windows of interest that were used for subsequent interpretations.

      We thank the reviewer for bringing this issue to our attention. The size of the interpolation window was based on previous literature, indicating a range of 100-200 ms as acceptable (Urai et al., 2017; Knapen et al., 2016; Winn et al., 2018). The ratio of interpolated-to-original data (across the entire trial) varied greatly between participants and between trials: cue-target 2AFC task, M = 0.262, SD = 0.242, range = [0,1]; letter-color 2AFC task, M = 0.194, SD = 0.199, range = [0,1]. 

      We have now included a conservative analysis in which only trials with more than half (threshold = 60%) of original data are included in the analyses. Crucially, we still observe the same pattern of effects as when all data are considered across both tasks (compare the second to last row in the Supplementary Figure 2 to Figure 1D and Figure 2D).
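
      A trial-exclusion rule of this kind could be sketched as follows. The function names, the boolean-mask representation, and the use of an inclusive threshold are assumptions for illustration; the response does not specify how the criterion was implemented.

```python
def fraction_original(interpolated_mask):
    """Fraction of samples in one trial that were not interpolated.

    interpolated_mask: sequence of booleans, True where a sample was
    replaced by interpolation.
    """
    n = len(interpolated_mask)
    return (n - sum(interpolated_mask)) / n


def keep_trials(trial_masks, min_original=0.60):
    """Indices of trials whose share of original samples meets the threshold."""
    return [i for i, mask in enumerate(trial_masks)
            if fraction_original(mask) >= min_original]


# Three made-up trials with 80%, 40% and 50% original samples.
masks = [
    [False, False, False, False, True],
    [True, True, True, False, False],
    [True, False, True, False],
]
kept = keep_trials(masks)
```

      Only the first made-up trial passes the 60% criterion here.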

      (4.5) Was the baseline correction performed on the percentage change unit?

      Yes, the baseline correction was performed on the pupil time series after converting to percent signal change. We have added that information to the Methods (section 2.3).
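
      The order of operations described here (conversion to percent signal change first, baseline subtraction second) can be sketched as below. The reference value for the conversion and the length of the baseline window are assumptions chosen for illustration; the manuscript's Methods (section 2.3) define the actual choices.

```python
from statistics import fmean


def percent_signal_change(pupil, reference=None):
    """Express samples as percent change relative to a reference value.

    By default the reference is the mean of the series (an assumption).
    """
    ref = fmean(pupil) if reference is None else reference
    return [100.0 * (v - ref) / ref for v in pupil]


def baseline_correct(epoch_psc, n_baseline):
    """Subtract the mean of the first n_baseline samples from an epoch
    that is already in percent-signal-change units."""
    baseline = fmean(epoch_psc[:n_baseline])
    return [v - baseline for v in epoch_psc]


# Made-up samples: two baseline samples followed by two post-event samples.
psc = percent_signal_change([90.0, 100.0, 110.0, 100.0])
corrected = baseline_correct(psc, n_baseline=2)
```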

      (4.6) What metric was used to define events in the derivative as 'peaks'? I assume some sort of threshold? How was this chosen?

      The threshold was chosen in a data-driven manner and was kept consistent across both tasks. The following details have been added to the Methods:

      “The size of the interpolation window preceding nuisance events was based on previous literature [13,39,99]. After interpolation based on data-markers and/or missing values, remaining blinks and saccades were estimated by testing the first derivative of the pupil dilation time series against a threshold rate of change. The threshold for identifying peaks in the temporal derivative is data-driven, partially based on past work [10,14,33]. The output of each participant’s pre-processing pipeline was checked visually. Once an appropriate threshold was established at the group level, it remained the same for all participants (minimum peak height of 10 units).” (p. 8 & 11).
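
      As a rough illustration of the thresholding step quoted above, the sketch below flags samples where the first difference of a pupil trace forms a local peak of at least 10 units. It is an assumption-laden toy, not the authors' code: the function name, the use of the absolute first difference, and the local-maximum rule are choices made for this example.

```python
from typing import List, Sequence


def derivative_peaks(pupil: Sequence[float], min_height: float = 10.0) -> List[int]:
    """Sample indices where |first difference| is a local peak >= min_height.

    Using the absolute difference (so that both rapid drops and rapid
    recoveries are flagged) is an assumption of this sketch.
    """
    diff = [abs(b - a) for a, b in zip(pupil, pupil[1:])]
    peaks = []
    for i, d in enumerate(diff):
        left = diff[i - 1] if i > 0 else float("-inf")
        right = diff[i + 1] if i + 1 < len(diff) else float("-inf")
        if d >= min_height and d > left and d >= right:
            peaks.append(i + 1)  # index of the sample right after the jump
    return peaks


# Hypothetical trace with an abrupt drop and recovery (e.g., a residual blink).
trace = [100, 101, 102, 60, 61, 62, 63, 105, 104]
events = derivative_peaks(trace)  # flags the samples at indices 3 and 7
```

      In practice, the flagged indices would mark the events around which an interpolation window is placed.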

      (5) Multicollinearity Between Variables:

      Lastly, the authors state on page 13: "Furthermore, it is expected that these explanatory variables will be correlated with one another. For this reason, we did not adopt a multiple regression approach to test the relationship between the information-theoretic variables and pupil response in a single model". However, the very purpose of multiple regression is to account for and disentangle the contributions of correlated predictors, no? I might have missed something here.

      We apologize for the ambiguity of our explanation in the Methods section. We originally sought to assess the overall relationship between the post-feedback response and information gain (primarily), but also surprise and entropy. Our reasoning was that these variables are often investigated in isolation across different experiments (i.e., only investigating Shannon surprise), and we would like to know what the pattern of results would look like when comparing a single information-theoretic variable to the pupil response (one-by-one). We assumed that including additional explanatory variables (that we expected to show some degree of collinearity with each other) in a regression model would affect variance attributed to them as compared with the one-on-one relationships observed with the pupil response (Morrissey & Ruxton 2018). We also acknowledge the value of a multiple regression approach on our data. Based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.  

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise. 

      Reviewer #2 (Recommendations for the authors):

      (1) Given the inherent temporal dependencies in pupil dynamics, characterising later pupil responses as independent of earlier ones in a three-way repeated measures ANOVA may not be appropriate. A more suitable approach might involve incorporating the earlier pupil response as a covariate in the model.

      We thank the reviewer for bringing this issue to our attention. From our understanding, a repeated-measures ANOVA with factor “time window” would be appropriate in the current context for the following reasons. First, autocorrelation (closely tied to sphericity) is generally not considered a problem when only two timepoints are compared from time series data (Field, 2013; Tabachnick & Fidell, 2019). Second, the repeated-measures component of the ANOVA takes the correlated variance between time points into account in the statistical inference. Finally, as a complementary analysis, we present the results testing the interaction between the frequency and accuracy conditions across the full time courses (see Figures 1D and 2D); in these pupil time courses, any difference between the early and late time windows can be judged by the reader visually and qualitatively. 

      (2) Please clarify the correlations between KL divergence, surprise, entropy, and pupil response time series. Specifically, state whether these correlations account for the interrelationships between these information-theoretic measures. Given their strong correlations, partialing out these effects is crucial for accurate interpretation.

      As mentioned above, based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.  

      This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise. 

      (3) The effects observed in the late time windows appear weak (e.g., Figure 2E vs. 2F, and the generally low correlation coefficients in Figure 3). Please elaborate on the reliability and potential implications of these findings.

      We have now included a complementary linear mixed model analysis which also provides insight into the amount of explained variance of these information-theoretic predictors on the post-feedback pupil response, while also including the pre-feedback baseline pupil and reaction time differences (see section 3.3, Tables 3 & 4). The R<sup>2</sup> values ranged from 0.16 – 0.50 across all conditions tested. Including the pre-feedback baseline pupil dilation as a predictor in the linear mixed model analysis consistently led to more explained variance in the post-feedback pupil response, as expected.  

      (4) In Figure 3 (C-J), please clarify how the trial-by-trial correlations were computed (averaged across trials or subjects). Also, specify how the standard error of the mean (SEM) was calculated (using the number of participants or trials).

      The trial-by-trial correlations between the pupil signal and model parameters were computed for each participant, then the coefficients were averaged across participants for statistical inference. We have added several clarifications in the text (see section 2.5 and legends of Figure 3 and Supplementary Figure 4).

      We have added “the standard error of the mean across participants” to all figure labels.
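
      The two-stage procedure described in this response (a correlation per participant, then a group mean with the SEM across participants) can be sketched as below. The function names and the toy data are hypothetical, and averaging raw coefficients without a Fisher z-transform is an assumption of the sketch rather than something stated in the response.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length trial-wise vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 trials")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_summary(
    data: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Mean coefficient and SEM, with the SEM computed across participants."""
    rs: List[float] = [pearson_r(pupil, model) for pupil, model in data]
    sem = stdev(rs) / math.sqrt(len(rs))  # n = number of participants, not trials
    return mean(rs), sem


# Hypothetical data: (pupil response, model variable) per participant.
participants = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),
    ([1, 2, 3, 4], [4, 3, 2, 1]),
    ([1, 2, 3, 4], [1, 3, 2, 4]),
]
mean_r, sem_r = group_summary(participants)
```

      The key design point is that the denominator of the SEM is the number of participants, which matches the figure labels described above.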

      (5) For all time axes (e.g., Figure 2D), please label the ticks at 0, 0.5, 1, 1.5, 2, 2.5, and 3 seconds. Clearly indicate the duration of the feedback on the time axes. This is particularly important for interpreting the pupil dilation responses evoked by auditory feedback.

      We have labeled the x-ticks every 0.5 seconds in all figures and indicated the duration of the auditory feedback in the letter-color decision task as well as the stimuli presented in the control tasks in the Supplementary Materials. 

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction page 3: "In information theory, information gain quantifies the reduction of uncertainty about a random variable given the knowledge of another variable. In other words, information gain measures how much knowing about one variable improves the prediction or understanding of another variable."

      (2) In my opinion, the description of information gain can be clarified. Currently, it is not very concrete and quite abstract. I would recommend explaining it in the context of belief updating.

      We have removed these unclear statements in the Introduction. We now clearly state the following:

      “Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80].” (p. 4)
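
      To make the quoted definition concrete, the sketch below computes the KL divergence between a posterior and a prior for a discrete belief distribution. The function name, the use of bits (log base 2), the direction D_KL(posterior || prior), and the example probabilities are assumptions for illustration; the manuscript's own model may differ in these details.

```python
import math
from typing import Sequence


def kl_divergence(posterior: Sequence[float], prior: Sequence[float]) -> float:
    """D_KL(posterior || prior) in bits for two discrete distributions."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for p, q in zip(posterior, prior):
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing by convention
        if q == 0.0:
            raise ValueError("prior assigns zero probability to a possible outcome")
        total += p * math.log2(p / q)
    return total


# Hypothetical belief update: a flat prior over two outcomes becomes 80/20.
prior = [0.5, 0.5]
posterior = [0.8, 0.2]
gain = kl_divergence(posterior, prior)  # roughly 0.28 bits of information gain
```

      When feedback leaves the belief unchanged, posterior and prior coincide and the information gain is zero.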

      (3) Page 4: The inconsistencies across studies are described in extreme detail. I recommend shortening this part and summarizing the inconsistencies instead of listing all of the findings separately.

      As per the reviewer’s recommendation, we have shortened this part of the introduction to summarize the inconsistencies in a more concise manner as follows: 

      “Previous studies have shown different temporal response dynamics of prediction error signals in pupil dilation following feedback on decision outcome: While some studies suggest that the prediction error signals arise around the peak (~1 s) of the canonical impulse response function of the pupil [11,30,41,61,62,90], other studies have shown evidence that prediction error signals (also) arise considerably later with respect to feedback on choice outcome [10,25,32,41,62]. A relatively slower prediction error signal following feedback presentation may suggest deeper cognitive processing, increased cognitive load from sustained attention or ongoing uncertainty, or that the brain is integrating multiple sources of information before updating its internal model. Taken together, the literature on prediction error signals in pupil dilation following feedback on decision outcome does not converge to produce a consistent temporal signature.” (p. 5)

      We would like to note some additional minor corrections to the preprint:

      We have clarified the direction of the effect in Supplementary Figure 3 with the following: 

      “Participants who showed a larger mean difference between the 80% as compared with the 20% frequency conditions in accuracy also showed smaller differences (a larger mean difference in magnitude in the negative direction) in pupil responses between frequency conditions (see Supplementary Figure 4).”

      The y-axis labels in Supplementary Figure 3 were incorrect and have been corrected to the following: “Pupil responses (80-20%)”.

      We corrected typos, formatting and grammatical mistakes when discovered during the revision process. Some minor changes were made to improve clarity. Of course, we include a version of the manuscript with Tracked Changes as instructed for consideration.

    1. Reviewer #2 (Public review):

      Summary

      This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.

      Strengths

      This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and show that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.

      Weaknesses

      A major limitation of the original manuscript was that the functional relevance of GMCL1 in regulating 53BP1 within an appropriate model system was not clearly demonstrated. In the revised version, the authors attempt to address this point. However, the new experiment is insufficiently controlled, making it difficult to interpret the results. State-of-the-art approaches would typically rely on single-cell tracking to monitor cell fate following release from a moderately prolonged mitosis.

      In contrast, the authors use a population-based assay, but the reported rescue from arrest is minimal. If the assay were functioning robustly, one would expect that nearly all cells depleted of USP28 or 53BP1 should have entered S-phase at a defined time after release. Thus, the very small rescue effect of siTP53BP1 suggests that the current assay is not suitable. It is also likely that release from a 16-hour mitotic arrest induces defects independent of the 53BP1-dependent p53 response.

      Furthermore, the cell-cycle duration of RPE1 cells is less than 20 hours. It is therefore unclear why cells are released for 30 hours before analysis. At this time point, many cells are likely to have progressed into the next cell cycle, making it impossible to draw conclusions regarding the immediate consequences of prolonged mitosis. As a result, the experiment cannot be evaluated due to inadequate controls.

      To strengthen this part of the study, I recommend that the authors first establish an assay that reliably rescues the mitotic-arrest-induced G1 block upon depletion of p53, 53BP1, or USP28. Once this baseline is validated, GMCL1 knockout can then be introduced to quantify its contribution to the response.

      A broader conceptual issue is that the evidence presented does not form a continuous line of reasoning. For example, it is not demonstrated that GMCL1 interacts with or regulates 53BP1 in RPE1 cells, the system in which the limited functional experiments are conducted.

      There are also a number of inconsistencies and issues with data presentation that need to be addressed:

      (1) Figure 2C: p21 levels appear identical between GMCL1 KO and WT rescue. If GMCL1 regulates p53 through 53BP1, p21 should be upregulated in the KO.

      (2) Figure 2A vs. 2C: GMCL1 KO affects chromatin-bound 53BP1 in Figure 2A, yet in Figure 2C it affects 53BP1 levels specifically in G1-phase cells. This discrepancy requires clarification.

      (3) Figure 2C quantification: The three biological repeats show an unusual pattern, with one repeat's data points lying exactly between the other two. It is unclear what the line represents; please clarify.

      (4) Figure nomenclature: Some abbreviations (e.g., FLAG-KI in Fig. 1F, WKE in Fig. 1C-D, ΔMFF in Fig. 1E) are not defined in the figure legends. All abbreviations must be explained.

      (5) Figure 2D: Please indicate how many times the experiment was reproduced. Quantification with statistical testing would strengthen the result. Pull-downs of 53BP1 with calculation of the ubiquitinated/total ratio could also support the conclusion.

      (6) Figures 3A and 3C: The G1 bars share the same color as the error bars, making the graphs difficult to interpret. Please adjust the color scheme.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Pagano and colleagues test the idea that the protein GMCL1 functions as a substrate receptor for a Cullin RING 3 E3 ubiquitin ligase (CUL3) complex. Using a pulldown approach, they identify GMCL1 binding proteins, including the DNA damage scaffolding protein 53BP1. They then focus on the idea that GMCL1 recruits 53BP1 for CUL3-dependent ubiquitination, triggering subsequent proteasomal degradation of ubiquitinated 53BP1.

      In addition to its DNA damage signalling function, in mitosis, 53BP1 is reported to form a stopwatch complex with the deubiquitinating enzyme USP28 and the transcription factor p53 (PMID: 38547292). These 53BP1-stopwatch complexes generated in mitosis are inherited by G1 daughter cells and help promote p53-dependent cell cycle arrest independent from DNA damage (PMID: 38547292). Several studies show that knockout of 53BP1 overcomes G1 cell cycle arrest after mitotic delays caused by anti-mitotic drugs or centrosome ablation (PMID: 27432897, 27432896). In this model, it is crucial that 53BP1 remains stable in mitosis and more stopwatch complex is formed after delayed mitosis.

      Major concerns:

      Pagano and coworkers suggest that 53BP1 levels can sometimes be suppressed in mitosis if the cells overexpress GMCL1. They carry out a bioinformatic analysis of available public data for p53 wild-type cancer cell lines resistant to the anti-mitotic drug paclitaxel and related compounds. Stratifying GMCL1 into low and high expression groups reveals a weak (p = 0.05 or ns) correlation with sensitivity to taxanes. It is unclear on what basis the authors claim paclitaxel-resistant and p53 wild-type cancer cell lines bypass the mitotic surveillance/timer pathway. They have not tested this. Figure 3 is a correlation assembled from public databases but has no experimental tests. Figure 4 looks at proliferation but not cell cycle progression or the length of mitosis. The main conclusions relating to cell cycle progression and specifically the link to mitotic delays are therefore not supported by experimental data. There is no imaging of the cell cycle or cell fate after mitotic delays, or analysis of where the cells arrest in the cell cycle. Most of the cell lines used have been reported to lack a functional mitotic surveillance pathway in the recent work by Meitinger. To support these conclusions, the stability of endogenous 53BP1 under different conditions in cells known to have a functional mitotic surveillance pathway needs to be examined. A key suggestion in the work is that the level of GMCL1 expression correlates with resistance to taxanes. For the mitotic surveillance pathway, the type of drug (nocodazole, taxol, etc) used to induce a delay isn't thought to be relevant, only the length of the delay. Do GMCL1-overexpressing cells show resistance to anti-mitotics in general?

      We thank the reviewer for this insightful comment. We propose that GMCL1 promotes CUL3-dependent ubiquitination of 53BP1 during prolonged mitotic arrest, thereby facilitating its proteasome-dependent degradation. To evaluate the potential clinical relevance of this mechanism, we stratified cancer cell lines based on GMCL1 mRNA expression using publicly available datasets from DepMap (PMID: 39468210). We observed correlations between GMCL1 expression levels and taxane sensitivity that appear to reflect specific cancer type-drug combinations. To experimentally evaluate this correlation and obtain mechanistic insights, we performed knockdown experiments in hTERT-RPE1 cells, which are known to possess an intact mitotic surveillance pathway. Silencing of GMCL1 alone inhibited cell proliferation and induced apoptosis, while co-depletion of either TP53BP1 or USP28 significantly rescued these effects. These results suggest that GMCL1 modulates the stability of 53BP1 and therefore the availability of the 53BP1-USP28-p53 ternary complex in cells with a functional mitotic surveillance pathway (MSP) (new Figure 5I,J), directly linking GMCL1 to the regulation of the MSP complex. Moreover, to further support our mechanism, we assessed the effect of GMCL1 levels on cell cycle progression. Briefly, following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone led to delayed cell cycle progression, but co-depletion of either TP53BP1 or USP28 rescued this phenotype (new Figure 3A and new Supplementary Figure 3A-C). These results are consistent with our proliferation data and suggest that the observed effects of GMCL1 are specific to mitotic exit. Finally, overexpression of GMCL1 accelerates cell cycle progression (as assessed by FACS analyses) upon release from prolonged mitotic arrest (new Figure 3B and new Supplementary Figure 3D-E). 

      Importantly, if GMCL1 specifically degrades 53BP1 during prolonged mitotic arrests, the authors should show what happens during normal cell divisions without any delays or drug treatments. How much 53BP1 is destroyed in mitosis under those conditions? Does 53BP1 destruction depend on the length of mitosis, drug treatment, or does 53BP1 get degraded every mitosis regardless of length? Testing the contribution of key mitotic E3 ligase activities on mitotic 53BP1 stability, such as the anaphase-promoting complex/cyclosome (APC/C) is important in this regard. One previous study reported an analysis of putative APC/C KEN-box degron motifs in 53BP1 and concluded these play a role in 53BP1 stability in anaphase (PMID: 28228263).

      Physiological mitosis under unperturbed conditions is typically brief (approximately 30 minutes), making protein quantification during this window challenging. Despite this, we attempted it by synchronizing cells using RO-3306 and releasing them into drug-free medium to assess GMCL1 dynamics during normal mitosis. Under these conditions, GMCL1 expression was similar to that in asynchronous cells and higher than the levels upon extended mitosis. However, when we attempted to measure the half-life of proteins using cycloheximide, most cells died, likely due to the toxic effect of cycloheximide in cells subjected to co-treatment with RO-3306 or nocodazole. This is the same reason why, in Figure 2C, we assessed 53BP1 in daughter cells rather than mitotic cells. 

      There is no direct test of the proposed mechanism, and it is therefore unclear if 53BP1 is ubiquitinated by a GMCL1-CUL3 ligase in cells, and how efficient this process would be at different cell cycle stages. A key issue is the lack of experimental data explaining why the proposed mechanism would be restricted to mitosis. Indirect effects, such as loss of 53BP1 from the chromatin fraction during M phase upon GMCL1 overexpression, do not necessarily mean that 53BP1 is degraded. PLK1-dependent chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays has been described previously (PMID: 38547292, 37888778). These papers are cited in the text, but the main conclusions of those papers on 53BP1 incorporation into a stopwatch complex during mitotic delays have been ignored. Are the authors sure that 53BP1 is destroyed in mitosis and not simply re-localised between chromatin and non-chromatin fractions? At the very least, these reported findings should be discussed in the text.

      To examine whether GMCL1 promotes 53BP1 ubiquitination in cells, we expressed Trypsin-Resistant Tandem Ubiquitin-Binding Entity (TR-TUBE), a protein that binds polyubiquitin chains. Abundant, endogenous ubiquitinated 53BP1 co-precipitated with TR-TUBE constructs only when wild-type GMCL1, but not the E142K GMCL1 mutant, was expressed (new Figure 2D). The PLK1-dependent incorporation of 53BP1 into the stopwatch complex and the chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays are now discussed in the text. That said, compared to parental cells, 53BP1 levels in the chromatin fraction are high in two different GMCL1 KO clones in M phase-arrested cells (Figure 2A-B). This increase does not correspond to a decrease in the 53BP1 soluble fraction (Figure 2A and new Supplementary Figure 2D), suggesting that decreased 53BP1 is not due to re-localization. The increased half-life of 53BP1 in daughter cells (Figure 2C) also supports this hypothesis. 

      The authors use a variety of cancer cell line models throughout their study, most of which have been reported to lack a functional mitotic surveillance pathway. U2OS and HCT116 cells do not respond normally to mitotic delays, despite being annotated as p53 WT. Other studies have used p53 wild-type hTERT RPE-1 cells to study the mitotic surveillance pathway. If the model is correct, then over-expressing GMCL1 in hTERT-RPE1 cells should suppress cell cycle arrest after mitotic delays, and GMCL1 KO should make the cells more sensitive to delays. These experiments are needed to provide an adequate test of the proposed model.

      We greatly appreciate the reviewer’s suggestion regarding overexpression of GMCL1 in hTERT-RPE1 cells. To address this, we generated stable RPE1 cells expressing V5-tagged GMCL1 and conducted EdU incorporation assays following nocodazole synchronization and release. Overexpression of GMCL1 enhanced cell cycle progression compared to control cells (new Figure 3B and new Supplementary Figure 3D-E) after mitotic arrest, consistent with our model. We, therefore, propose that GMCL1 controls 53BP1 stability to suppress p53-dependent cell cycle arrest.

      We also want to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functioning in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses. 

      To conclude, while the authors propose a potentially interesting model on how GMCL1 overexpression could regulate 53BP1 stability to limit p53-dependent cell cycle arrest, it is unclear what triggers this pathway or when it is relevant. 53BP1 is known to function in DNA damage signalling, and GMCL1 might be relevant in that context. The manuscript contains the initial description of GMCL1-53BP1 interaction but lacks a proper analysis of the function of this interaction and is therefore a preliminary report.

      We hope that the new experiments, along with the clarifications provided in this response letter and revised manuscript, offer the reviewer increased confidence in the robustness and validity of our proposed model.

      Reviewer #2 (Public review):

      This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.

      Strengths:

      This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and showed that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.

      Weaknesses:

      However, the manuscript is significantly weakened by unsubstantiated mechanistic claims, overreliance on a non-functional model system (U2OS), and overinterpretation of correlative data. To support the conclusions of the manuscript, the authors must show that the GMCL1-dependent sensitivity to Taxol depends on the mitotic surveillance pathway.

      To demonstrate that GMCL1-dependent taxane sensitivity is mediated through the mitotic surveillance pathway (MSP), we have now performed experiments using hTERT-RPE1 (RPE1) cells, a widely used, non-transformed cell line known to possess a functional MSP. We compared RPE1 cells with knockdown of GMCL1 alone to those with simultaneous knockdown of GMCL1 and either TP53BP1 or USP28. Upon paclitaxel (Taxol) treatment, cells with GMCL1 knockdown exhibited suppressed proliferation and increased apoptosis. Notably, these phenotypes were rescued by co-depletion of TP53BP1 or USP28 (new Figure 5I,J). These results support the notion that GMCL1 contributes to MSP activity, at least in part, through its regulation of 53BP1.

      To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. Following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone led to a delay in cell cycle progression, but co-depletion of either TP53BP1 or USP28 alleviated this phenotype (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data.

      Reviewer #3 (Public review):

      Summary:

      In this study, Kito et al follow up on previous work that identified Drosophila GCL as a mitotic substrate recognition subunit of a CUL3-RING ubiquitin ligase (CRL3) complex.

      Here they characterize mutants of the human ortholog of GCL, GMCL1, that disrupt the interaction with CUL3 (GMCL1 E142K) and that lack the substrate interaction domain (GMCL1 BBO). Immunoprecipitation followed by mass spectrometry identified 9 proteins that interacted with wild-type FLAG-GMCL1 and GMCL1 EK but not GMCL1 BBO. These proteins included 53BP1, which plays a well-characterized role in double-strand break repair but also functions in a USP28-p53-53BP1 "mitotic stopwatch" complex that arrests the cell cycle after a substantially prolonged mitosis. Consistent with the IP-MS results, FLAG-GMCL1 immunoprecipitated 53BP1. Depletion of GMCL1 during mitotic arrest increased protein levels of 53BP1, and this could be rescued by wild-type GMCL1 but not the E142K mutant or a R433A mutant that failed to immunoprecipitate 53BP1.

      Using a publicly available dataset, the authors identified a relatively small subset of cell lines with high levels of GMCL1 mRNA that were resistant to the taxanes paclitaxel, cabazitaxel, and docetaxel. This type of analysis is confounded by the fact that paclitaxel and other microtubule poisons accumulate to substantially different levels in various cell lines (DOI: 10.1073/pnas.90.20.9552 , DOI: 10.1091/mbc.10.4.947 ), so careful follow-up experiments are required to validate results. The correlation between increased GMCL1 mRNA and taxane resistance was not observed in lung cancer cell lines. The authors propose this was because nearly half of lung cancers harbor p53 mutations, and lung cancer cell lines with wild-type but not mutant p53 showed the correlation between increased GMCL1 mRNA and taxane resistance. However, the other cancer cell types in which they report increased GMCL1 expression correlates with taxane sensitivity also have high rates of p53 mutation. Furthermore, p53 status does not predict taxane response in patients (DOI: 10.1002/1097-0142(20000815)89:4<769::aid-cncr8>3.0.co;2-6 , DOI: 10.1002/(SICI)1097-0142(19960915)78:6<1203::AID-CNCR6>3.0.CO;2-A , PMID: 10955790).

      The authors then depleted GMCL1 and reported that it increased apoptosis in two cell lines with wild-type p53 (MCF7 and U2OS) due to activation of the mitotic stopwatch. This is surprising because the mitotic stopwatch paper they cite (DOI: 10.1126/science.add9528 ) reported that U2OS cells have an inactive stopwatch and that activation of the stopwatch results in cell cycle arrest rather than apoptosis in most cell types, including MCF7. Beyond this, it has recently been shown that the level of taxanes and other microtubule poisons achieved in patient tumors is too low to induce mitotic arrest (DOI: 10.1126/scitranslmed.3007965 , DOI: 10.1126/scitranslmed.abd4811 , DOI: 10.1371/journal.pbio.3002339 ), raising concerns about the relevance of prolonged mitosis to paclitaxel response in cancer. The findings here demonstrating that GMCL1 mediates degradation of 53BP1 during mitotic arrest are solid and of interest to cell biologists, but it is unclear that these findings are relevant to paclitaxel response in patients.

      Strengths:

      This study identified 53BP1 as a target of CRL3GMCL1-mediated degradation during mitotic arrest. AlphaFold3 predictions of the binding interface, followed by mutational analysis, identified mutants of each protein (GMCL1 R433A and 53BP1 IEDI1422-1425AAAA) that disrupted their interaction. Knock-in of a FLAG tag into the C-terminus of GMCL1 in HCT116 cells, followed by FLAG immunoprecipitation, confirmed that endogenous GMCL1 interacts with endogenous CUL3 and 53BP1 during mitotic arrest.

      Weaknesses:

      The clinical relevance of the study is overinterpreted. The authors have not taken relevant data about the clinical mechanism of taxanes into account. Supraphysiologic doses of microtubule poisons cause mitotic arrest and can activate the mitotic stopwatch. However, in physiologic concentrations of clinically useful microtubule poisons, cells proceed through mitosis and divide their chromosomes on mitotic spindles that are at least transiently multipolar. Though these low concentrations may result in a brief mitotic delay, it is substantially shorter than the arrest caused by high concentrations of microtubule poisons, and the one mimicked here by 16 hours of 0.4 mg/mL nocodazole, which is not used clinically and does not induce multipolar spindles. Resistance to mitotic arrest occurs through different mechanisms than resistance to multipolar spindles. No evidence is presented in the current version of the manuscript that GMCL1 affects cellular response to clinically relevant doses of paclitaxel.

      We agree that it would be an overstatement to claim that GMCL1 and p53 regulate paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available cancer cell lines from datasets catalogued in CCLE and DepMap, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised the text accordingly.

      In the experiments shown in former Figure 4A-H (now Figure 5A-H) and in those shown in the new Figure 5I-J, we used 100 nM paclitaxel to test the hypothesis that low GMCL1 levels sensitize cancer cells in a p53-dependent manner. Here, paclitaxel was chosen to mimic the conditions reported in the PRISM dataset (PMID: 32613204), which compiles the proliferation inhibitory activity of 4,518 compounds tested across 578 cancer cell lines. Consistent with our cell cycle findings, the paclitaxel sensitivity caused by GMCL1 depletion was reverted by silencing 53BP1 or USP28 (new Figure 5I-J), again supporting the involvement of the stopwatch complex. We are unsure about how to model the “physiologic concentrations of clinically useful microtubule poisons” in cell-based studies. A recent review notes that “The time above a threshold paclitaxel plasma concentration (0.05 μmol/L) is important for the efficacy and toxicity of the drug” (PMID: 28612269). Two other reviews mention that the clinically relevant concentration of paclitaxel is considered to be plasma levels between 0.05–0.1 μmol/L (approximately 50–100 nM) and that in clinical dosing, typical patient plasma concentrations after paclitaxel infusion range from 80–280 nM, with corresponding intratumoral concentrations between 1.1–9.0 μM, due to drug accumulation in tumor tissue (PMIDs: 24670687 and 29703818). We have now emphasized in the revised text the rationale for using 100 nM paclitaxel in our experiments.
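      The unit arithmetic behind the concentrations quoted in this response can be checked with a minimal sketch. The numeric ranges are taken from the response text; the helper names (`to_nanomolar`, `in_range`) and the unit labels are illustrative assumptions, and the check says nothing about biological relevance, only about how the quoted figures compare once expressed in the same unit.

```python
# Conversion factors to nanomolar (nM). "uM" stands for micromolar (μmol/L).
# Helper and variable names are hypothetical, introduced for illustration only.
TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


def to_nanomolar(value, unit):
    """Convert a molar concentration to nanomolar."""
    return value * TO_NM[unit]


def in_range(value_nm, low_nm, high_nm):
    """Return True if value_nm lies within [low_nm, high_nm], inclusive."""
    return low_nm <= value_nm <= high_nm


# Ranges as quoted in the response, all expressed in nM.
plasma_threshold = (to_nanomolar(0.05, "uM"), to_nanomolar(0.1, "uM"))
plasma_after_infusion = (to_nanomolar(80, "nM"), to_nanomolar(280, "nM"))
intratumoral = (to_nanomolar(1.1, "uM"), to_nanomolar(9.0, "uM"))

experimental_dose = to_nanomolar(100, "nM")

print("within plasma threshold range:", in_range(experimental_dose, *plasma_threshold))
print("within post-infusion plasma range:", in_range(experimental_dose, *plasma_after_infusion))
print("within intratumoral range:", in_range(experimental_dose, *intratumoral))
```

      Expressing every figure in nM makes it easy to see which of the quoted ranges a given experimental dose falls into.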

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General comments on the Figures:

      (1) Western blots lack molecular weight markers on most panels and are often over-exposed and over-contrasted, rendering them hard to interpret.

      We have now included molecular weight markers in all Western blot panels. We have also reprocessed the images to avoid overexposure and excessive contrast, ensuring that the bands are clearly visible and interpretable.

      (2) Input and IP samples do not show percentage loading, so it is hard to interpret relative enrichments.

      In the revised figures, we have indicated what % of the input was loaded.

      (3) The authors change between cell line models for their experiments, and this is not clear in the figures. These are important details for interpreting the data, as many of the cell lines used are not functional for the mitotic surveillance pathway.

      In the revised manuscript, we have clearly indicated the specific cell lines used in each experiment in the figure legends. Additionally, to address concerns regarding the mitotic surveillance pathway, we have included new experiments using hTERT-RPE1 cells, which have been reported to possess a functional mitotic surveillance pathway (MSP) (Figure 4I-J).

      (4) No n-numbers are provided in the figure legends. Are the Western blots provided done once, or are they reproducible? Many of the blots would benefit from quantification and presentation via graphs to test for reproducible changes to 53BP1 levels under the different conditions.

      As now indicated in the methods section, we have conducted each Western blot no less than three times, yielding results that exhibit a high degree of reproducibility. A representative Western blot has been selected for each figure. We did not include densitometric quantification of immunoblots, given that the semi-quantitative nature of this technique would lead to an overinterpretation of our data; unfortunately, this is a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blots. One exception to this norm is protein half-life studies, where blots are quantified to measure the kinetics of decay rates and to compare them internally. Accordingly, the experiments in Figure 2C were quantified.

      (5) Graphs displayed in the supplementary figures are blacked out, and individual data points cannot be visualised. All graphs should have individual data points clearly visible.

      We revised the quantified graphs and replaced them with scatter plots to clearly display individual data points, showing sample distribution.

      Additional experiments with specific comments on Figures:

      (1) Figure 1C-D: the relative amount of 53BP1 co-precipitating with FLAG-tagged GMCL1 WT appears very different between the two experiments. If the idea is that MLN4924 (Cullin neddylation inhibitor) makes the interaction easier to capture, then this should be explained in the text, and ideally shown on the same gel/blot -/+ MLN4924.

      We now present the samples treated with and without MLN4924 on the same gel/blot to allow direct comparison (new Figure 1D) and clarified this point in the text.

      (2) Figure 1E: The figure legend states that GMCL1 was immunoprecipitated, but the Figure looks as though FLAG-tagged 53BP1 was the bait protein being immunoprecipitated? Can the authors clarify?

      We thank the reviewer for pointing out the discrepancy between the figure and the figure legend in Figure 1E. The immunoprecipitation was indeed performed using FLAG-tagged 53BP1, and we have now rectified the figure legend accordingly. 

      (3) Figure 1F: Rather than parental cell lysate, the better control would be to IP FLAG from another FLAG-tagged expressing cell line, to rule out non-specific binding with the FLAG tag at the non-overexpressed level. 

      Figure 1F shows interaction at the endogenous level. The specificity of binding with overexpressed proteins is shown in Figures 1C and 1D.

      The USP28 blot is over-exposed and makes it hard to see any changes in electrophoretic mobility - it looks as though there is a change between the parental and the KI cell line? It is surprising that USP28 would co-IP with GMCL1 (presumably because USP28 is bound to 53BP1) if the function of GMCL1-53BP1 interaction is to promote 53BP1 degradation. Can the authors reconcile this? Crucially, if the authors claim that the 53BP1-GMCL1 interaction is specific to prolonged mitosis, then this experiment should be repeated and performed with asynchronous, normal-length mitosis, and prolonged mitosis conditions. This is vital for supporting the claim that this interaction only occurs during prolonged mitoses and does not occur in every mitosis regardless of length.

      This is a good point. Unfortunately, many of the protein-protein interactions occur post lysis. Therefore, we could not observe differences in asynchronous vs. mitotic cells.

      (4) Figure S1F: Label on blot should be CUL3 not CUI3.

      We thank the reviewer for pointing this out and we have corrected the typo.

      (5) Figure 2A: The authors suggest an increase in chromatin-bound 53BP1 in GMCL1 KO U2OS cells, specifically in M phase. Again, is this time in mitosis dependent, or would this be evident in every mitosis, regardless of length? Such an experiment would benefit from repetition and quantification to test whether the observed effect is reproducibly consistent. If the authors' model is correct, simply treating U2OS WT mitotic cells with MG132 during the mitotic arrest and performing the same fractionation should bring 53BP1 levels up to that seen in GMCL1 KO cells under the same conditions.

      The reviewer’s suggestion to assess 53BP1 accumulation in wild-type U2OS cells treated with MG132 during mitotic arrest is indeed highly relevant. However, treatment with MG132 during prolonged mitosis consistently led to significant cell death, making it technically challenging to evaluate 53BP1 levels under these conditions.

      (6) Figure 2B: The authors restore GMCL1 expression in the KO U2OS cells using WT and 2 distinct mutant cDNAs. However, the expression of these constructs is not equivalent, and thus their effects cannot be directly compared. It is also surprising that GMCL1 is much higher in M phase samples in this experiment (shouldn't it be destroyed?), when no such behaviour has been observed in the other figures.

      There is no evidence in our study or others that GMCL1 should be destroyed in M phase. We show that the R433A mutant is expressed at a level very similar to the WT protein, yet it doesn’t promote the degradation of 53BP1. It is true that the E142K mutant is expressed at the lowest level in mitotic cells, whereas it is the most highly expressed in asynchronous cells. For some reason, this mutant has an inverse behavior compared to the WT, limiting the interpretation of this result. We now mention this in the text.

      (7) Figure 2C: The CHX experiment would benefit from inclusion of a control protein known to have a short half-life (e.g. c-myc, p53). Is GMCL1 known to have a relatively short half-life? It looks as though GMCL1 disappears after 1 h CHX treatment (although hard to definitively tell in the absence of molecular weight markers). 53BP1 appears to continue declining in the absence of GMCL1, which is surprising if 53BP1 degradation requires GMCL1. How can the authors reconcile this?

      As a control for the CHX chase experiments, we included p21, whose protein levels decreased in a CHX-dependent manner. GMCL1 itself also appeared to undergo degradation upon CHX treatment, but it doesn’t disappear completely.

      (8) Supplemental Figure 2:

      Transcription is largely inhibited in M phase, so the p53 target gene transcripts present in M phase are inherited from the preceding G2 phase. The qPCR's thus need a reference sample to compare against. I.e., was p21/PUMA/NOXA mRNA already low in G2 in the GMCL1 KO + WT cells before they entered mitosis? Or is the mRNA stability affected during M phase specifically? Is this effect on the mRNA dependent on the time in mitosis?

      It is well established that transcription is not entirely shut down during mitosis, particularly for a subset of genes involved in cell cycle regulation. For example, p21, PUMA, NOXA, and p53 mRNAs have been shown to remain actively transcribed during mitosis (see Table S5 in PMID: 28912132). However, we currently lack direct evidence that p53 activation during mitosis, specifically through the mitotic surveillance pathway, drives the transcription of p21, PUMA, or NOXA mRNAs during M phase. In the absence of such mechanistic data, we opted to exclude these analyses from the final figures.

      Panel B: blots are too over-exposed to see differences in p53 stability under the different conditions. Mitotic samples should be included to show how these differ from the G1 samples.

      The background of all blot images has been adjusted to ensure clarity and consistency.

      Panel D: The authors show no significant difference in the cell cycle profiles of the GMCL1 KO and reconstituted cells compared to parental U2OS cells. This should also be performed in the G1 daughter cells following a prolonged mitosis, to test the effect of the different GMCL1 constructs on G1 cell cycle arrest. U2OS cells have been reported not to have a functional mitotic surveillance pathway (Meitinger et al, Science, 2024), so U2OS cells are perhaps not a good model for testing this.

      We performed cell cycle profiling using EdU incorporation in hTERT-RPE1 cells, which possess a functional MSP, to evaluate cell cycle progression in daughter cells following prolonged mitosis. We observed that GMCL1 knockdown alone leads to G1-phase arrest. In contrast, co-depletion of GMCL1 with either 53BP1 or USP28 bypasses this arrest, indicating that GMCL1 regulates cell cycle progression in an MSP-dependent manner. Please see also the answer to the public review above. 

      (9) Figure 3:

      The authors show expression data for GMCL1 in the different cancer cell lines. This should be validated for a subset of cancer cell lines at the GMCL1 protein level, and cross-correlated to their MSP/mitotic timer status. Does GMCL1 depletion or knockout in p53 wild-type cancer cell lines overexpressing GMCL1 protein restore mitotic surveillance function?

      We were unable to assess GMCL1 protein levels using publicly available proteomics datasets, as GMCL1 expression was not detected. In p53 wild-type hTERT-RPE1 cells, GMCL1 knockdown impaired the mitotic surveillance pathway, as evidenced by G1-phase arrest following prolonged mitosis (new Figure 3A and new Supplementary Figure 3A, B). This arrest was rescued by co-depletion of either TP53BP1 or USP28, indicating that GMCL1 acts upstream of the MSP.

      (10) Figure 4:

      The authors show siRNA experiments depleting GMCL1 and testing the effects of GMCL1 loss on cell viability and apoptosis induction. This is performed in different cell line backgrounds. However, there is no demonstration that any of the observed effects are due to a lack of GMCL1 activity on 53BP1. These experiments need to be repeated in 53BP1 co-depleted cells to test for rescue. Without this, the interpretation is purely correlative.

      We assessed the effects of GMCL1 knockdown, alone or in combination with TP53BP1 or USP28 knockdown, on cell viability and apoptosis in hTERT-RPE1 cells using siRNA. Knockdown of GMCL1 alone led to a significant reduction in cell viability and an increase in apoptosis. However, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both cell viability and apoptosis levels to those observed in control cells (new Figure 5I,J).

      (11) Text comments:

      Line 257: HeLa cells suppress p53 through the E6 viral protein and are not "mutant" for p53.

      The authors should cite early work by Uetake and Sluder describing the effects of spindle poisons on the mitotic surveillance pathway.

      We appreciate the reviewer’s comments. We have now made the necessary corrections.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      (1) Unsubstantiated Mechanistic Claims:

      In Figures 3 and 4, the authors show correlations between GMCL1 expression and sensitivity to Taxol. However, they fail to demonstrate that the mitotic stopwatch is mechanistically involved. To support this conclusion, the authors must test whether deletion of 53BP1, USP28, or disruption of their interaction rescues Taxol sensitivity in GMCL1-depleted cells. Since 53BP1 also plays a role in DNA damage response, such rescue experiments are necessary to distinguish between mitotic surveillance-specific and broader stress-response effects. Deletion of USP28 would be particularly informative.

      We sought to experimentally determine whether GMCL1 is involved in regulating the mitotic stopwatch. Knockdown of GMCL1 alone resulted in reduced cell proliferation and increased apoptosis. In contrast, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both proliferation and apoptosis levels to those observed in control cells (new Figure 5I, J). To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. We conducted EdU incorporation assays following nocodazole synchronization and release. Knockdown of GMCL1 alone led to a delay in G1 progression, whereas co-depletion of either TP53BP1 or USP28 rescued normal cell cycle progression (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data and suggest that GMCL1 functions upstream of the ternary complex, likely by regulating 53BP1 protein levels.

      (2) Model System Limitations (U2OS Cells):

      The use of U2OS cells is highly problematic for investigating the mitotic surveillance pathway. U2OS cells lack a functional mitotic stopwatch and do not arrest following prolonged mitosis in a 53BP1/USP28-dependent manner (PMID: 38547292). Therefore, conclusions drawn from this model system about the function of the mitotic surveillance pathway are not substantiated. Key experiments should be repeated in a cell line with an intact pathway, such as RPE1.

      We have now also performed all key experiments in hTERT-RPE1 cells (see above). We would also like to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functional in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to the high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses.

      (3) Misinterpretation of p53 Activity Timing:

      The manuscript states that "GMCL1 KO cells led to decreased mRNA levels of p21 and NOXA during mitosis" (line 194). However, it is well established that the mitotic surveillance pathway activates p53 in the G1 phase following prolonged mitosis-not during mitosis itself (PMID: 38547292). Therefore, the observed changes in mRNA levels during mitosis are unlikely to be relevant to this pathway.

      We currently lack direct evidence that p53 activated during mitosis through the mitotic surveillance pathway directly influences the transcription of p21, PUMA, or NOXA mRNAs during M phase. Therefore, we have chosen to exclude these data from the final figures.

      (4) Incorrect Interpretation of 53BP1 Chromatin Binding:

      The authors claim that 53BP1 remains associated with chromatin during mitosis, which contradicts established literature. It is known that 53BP1 is released from chromatin during mitosis via mitosis-specific phosphorylation (PMID: 24703952), and this is supported by more recent findings (PMID: 38547292). A likely explanation for the discrepancy may be contamination of mitotic fractions with interphase cells. The chromatin fraction data in Figure 2C must be interpreted with caution.

      Our method to synchronize in M phase is rather stringent (see Supplementary Figure 3D as an example). The literature indicates that the bulk of 53BP1 is released from chromatin during mitosis. Yet, even in the two publications mentioned by the reviewer, there is a difference in the observable amount of 53BP1 bound to chromatin (compare Figure 2B in PMID: 38547292 and Figure 5A in PMID: 24703952). The difference is likely due to the different biochemical approaches used to purify chromatin-bound proteins (salt and detergent concentrations, sonication, etc.). Using our fractionation approach, we can reliably separate the soluble fraction (which also contains the nucleoplasmic fraction) from chromatin-associated proteins, as indicated by controls such as α-Tubulin and Histone H3. We have now mentioned these limitations when comparing different fractionation methods in our discussion section.

      (5) Inadequate Citation of Foundational Literature:

      The literature on the mitotic surveillance pathway is relatively limited, and it is essential that the authors provide a comprehensive and accurate account of its development. The foundational work by the Sluder lab (PMID: 20832310), demonstrating a p53-dependent arrest following prolonged mitosis, must be cited. Furthermore, the three key 2016 papers (PMID: 27432896, 27432897, 27432896) that identified the involvement of USP28 and 53BP1 in this pathway are critical and should be cited as the basis of the mitotic surveillance pathway.

      In contrast, the manuscript currently emphasizes publications that either contribute minimally or have been contradicted by prior and subsequent work. For example: PMID: 31699974, which proposes Ser15 phosphorylation of p53 as critical, has been contradicted by multiple groups (e.g., Holland, Oegema, and Tsou labs).

      PMID: 37888778, which suggests that 53BP1 must be released from kinetochores, is inconsistent with findings that indicate kinetochore localization is not relevant.

      The authors should thoroughly revise the Introduction to reflect what this reviewer would describe as a more accurate and scholarly approach to the literature.

      We have substantially revised both the Introduction and Discussion sections to incorporate important references kindly suggested by the reviewer.

      Minor Points:

      (1) Overexposed Western Blots:

      The Western blots throughout the manuscript are heavily overexposed and saturated, obscuring differences in protein levels and hindering data interpretation. The authors should provide properly exposed blots with quantification where appropriate.

      We have provided Western blot images with appropriate exposure levels and included quantification where appropriate (i.e., to measure the kinetics of decay rates as in Figure 2C). For all the other immunoblots, we did not include densitometric quantification, given that the semi-quantitative nature of this technique would lead to overinterpretation of our data. This is, unfortunately, a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blot analyses.

      (2) Missing information in the graphs in Figure 2C and 4; S2? How many repeats? What are the asterisks?

      Panels referenced above have been repeated several times, and further details are now provided in the figure legends.

      Reviewer #3 (Recommendations for the authors):

      (1) The claim that GMCL1 modulates paclitaxel sensitivity in cancer should be toned down.

      We agree that it would be an overstatement to claim that GMCL1 regulates paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available, cell line–based datasets, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised our statements and corresponding text accordingly. We have now placed greater emphasis on our molecular and cell biology studies.

      (2) Additional experiments in low, physiologically relevant concentrations of paclitaxel would be interesting. It is possible that these concentrations activate the mitotic stopwatch in a portion of cells, in addition to inducing cell death due to chromosome loss, activation of an immune response, and chromothripsis. Results should be interpreted in the context of this complexity.

      Please see the response to the public review. 

      (3) It would be helpful to show that CUL3 interacts with 53BP1 only in the presence of GMCL1.

      We show that the binding of 53BP1 to GMCL1 is independent of the ability of GMCL1 to bind CUL3 (Figure 1C, D). The binding between 53BP1 and CUL3 is difficult to detect (Figure 1F) likely because it’s not direct but mediated by GMCL1.

      (4) The GMCL1 "KO" lines appear to still express a low level of GMCL1 (Figure 2A), which should be acknowledged.

      We have included the GMCL1 mRNA expression data, as measured by RT-PCR, in Supplementary Figure 1G, demonstrating that GMCL1 expression was undetectable under the tested conditions.

      (5) Additional description of the methods is warranted. This is particularly true for the database analysis that forms the basis for the claim that GMCL1 overexpression causes resistance to paclitaxel and other taxanes presented in Figure 3, the methodology used to obtain M-phase cells, and the concentration and duration of taxol treatment.

      We have now extensively revised the Methods section.  

      (6) "Taxol" and "paclitaxel" are used interchangeably throughout the manuscript. Consistency would be preferable.

      We have revised the manuscript to maintain consistency in the use of the terms “Taxol” and “paclitaxel” and now refer to “paclitaxel” when discussing that individual compound; “taxanes” when referring collectively to cabazitaxel, docetaxel and paclitaxel; and “Taxol” has been removed entirely to avoid redundancy or confusion.    

      (7) It is unclear why it is claimed that GMCL1 interacts "specifically" with 53BP1 (line 176) since multiple interactors were identified in the IP-MS study.

      We meant that the GMCL1 R433A mutant loses its ability to bind 53BP1, suggesting that the GMCL1-53BP1 interaction is not an artifact. We have now clarified the text. 

      (8) The bottom row in Figure S3 is misleading. Paclitaxel is not uniformly effective in every tumor of any given type, and so resistance occurs in every cancer type.

      We fully agree that cancer is highly heterogeneous and that paclitaxel efficacy varies across tumors, even within the same histological subtype. Our intention was not to suggest uniform sensitivity/resistance, but rather to provide a high-level overview using aggregated data. We acknowledge that this coarse-grained representation may unintentionally imply overly generalized conclusions. To avoid potential misinterpretation, we have removed the corresponding panel in the revised paper.

    1. Reviewer #1 (Public review):

      Summary:

      This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.

      Strengths:

      (1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.

      (2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.

      (3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.
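      For readers unfamiliar with the error-rate metrics named above, the following is a minimal sketch of how WER and PER are conventionally defined: the Levenshtein edit distance between reference and hypothesis token sequences, divided by the reference length. It is not the authors' evaluation code; the function names and the example sentences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (unit cost for substitution, insertion, and deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by reference length.
    With word tokens this is WER; with phoneme tokens it is PER."""
    if not ref_tokens:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Hypothetical example sentences, not drawn from the study.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
wer = error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

      Because the distance is normalised by the reference length only, the rate can exceed 1 when the hypothesis contains many insertions.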

      Weaknesses:

      (1) It is unclear how much the acoustic pathway contributes to the final reconstruction results, based on Figures 3B-E and 4E. Including results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice could help clarify this contribution.

      (2) As noted in the limitations, the reconstruction results heavily rely on pre-trained generative models. However, no comparison is provided with state-of-the-art multimodal LLMs such as Qwen3-Omni, which can process auditory and textual information simultaneously. The rationale for using separate models (Wav2Vec for speech and TTS for text) instead of a single unified generative framework should be clearly justified. In addition, the adaptor employs an LSTM architecture for speech but a Transformer for text, which may introduce confounds in the performance comparison. Is there any theoretical or empirical motivation for adopting recurrent networks for auditory processing and Transformer-based models for textual processing?

      (3) The model is trained on approximately 20 minutes of data per participant, which raises concerns about potential overfitting. It would be helpful if the authors could analyze whether test sentences with higher or lower reconstruction performance include words that were also present in the training set.

      (4) The phoneme confusion matrix in Figure 4A does not appear to align with human phoneme confusion patterns. For instance, /s/ and /z/ differ only in voicing, yet the model does not seem to confuse these phonemes. Does this imply that the model and the human brain operate differently at the mechanistic level?

      (5) In general, is the motivation for adopting the dual-pathway model to better align with the organization of the human brain, or to achieve improved engineering performance? If the goal is primarily engineering-oriented, the authors should compare their approach with a pretrained multimodal LLM rather than relying on the dual-pathway architecture. Conversely, if the design aims to mirror human brain function, additional analysis, such as detailed comparisons of phoneme confusion matrices, should be included to demonstrate that the model exhibits brain-like performance patterns.

    2. Author response:

      Here we provide a provisional response addressing the public comments and outlining the revisions we are planning to make:

      (1) We will add additional baseline models to delineate the contributions of the acoustic and linguistic pathways.

      (2) We will show additional ablation analysis and other model comparison results, as suggested by the reviewers, to justify the choice of the DNN models.

      (3) We will clarify the use of the TIMIT dataset during pre-training. In fact, the TIMIT speech data (the speech corpus used in the test set) was not included or used when pre-training the acoustic or linguistic pathway. It was only used in fine-tuning the final speech synthesizer (the CosyVoice model). We will present results without this fine-tuning step, which will fully eliminate the usage of the TIMIT data during model training.

      (4) We will further analyze the phoneme confusion matrices and/or other data to evaluate the model behavior.

      (5) We will analyze the test sentences with high and low accuracies. We will also include results with partial training data (e.g. using 25%, 50%, 75% of the training set) to further evaluate the impact of the total amount of training data.
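      The two analyses planned in point (5) could be set up along the following lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: `nested_subsets` and `seen_word_fraction` are hypothetical helpers, whitespace tokenisation is a simplification, and the nested design (each smaller subset contained in the larger ones) is one reasonable choice among several.

```python
import random


def nested_subsets(items, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Shuffle once with a fixed seed, then take prefixes, so that each
    smaller training subset is contained in every larger one."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    return {f: shuffled[: max(1, round(f * n))] for f in fractions}


def seen_word_fraction(test_sentence, train_sentences):
    """Fraction of words in a test sentence that also occur in training.
    Uses lowercase whitespace tokenisation (an assumption of this sketch)."""
    vocab = {w for s in train_sentences for w in s.lower().split()}
    words = test_sentence.lower().split()
    if not words:
        raise ValueError("test sentence must contain at least one word")
    return sum(w in vocab for w in words) / len(words)


# Hypothetical toy data for illustration only.
train = [f"sentence {i}" for i in range(8)]
subsets = nested_subsets(train)
for fraction, subset in subsets.items():
    print(fraction, len(subset))

overlap = seen_word_fraction("the dog sat", ["The cat sat"])
print(f"seen-word fraction = {overlap:.2f}")
```

      Relating the seen-word fraction of each test sentence to its reconstruction score would indicate whether high-scoring sentences are mostly composed of vocabulary already present in training.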

    1. The Sneakernet: Rethinking Data Sharing in the Age of Big Tech

      Executive Summary

      Faced with the growing grip of the technology giants (Big Tech) on personal data and digital infrastructure, an alternative movement is emerging: the sneakernet.

      The concept, which refers to the physical, offline sharing of files, stands in direct opposition to the centralized, commercial model of today's internet.

      Collectives of artists and activists are developing concrete initiatives to revive autonomous, local and material practices of data exchange.

      The main conclusions from the analysis of the sources are as follows:

      The Central Problem: Data hosted on platforms such as Google Drive, iCloud or Instagram does not belong to users but to the companies that store it.

      These companies control access, can change the terms, and exploit browsing information, creating strong dependence and a risk of surveillance.

      The Sneakernet Solution: This "sneaker network" relies on the physical exchange of data (via USB sticks, for example) at "leg speed".

      It represents an effort to take back control over the circulation of information, on the margins of traditional infrastructure.

      Key Initiatives:

      The "Data-Foires" (data fairs) of the Outdoor Computer Club: Events where participants exchange files on a shared computer, often powered by autonomous energy sources, promoting mindful resource management and a vision of technology as a "commons".

      The "Dead Drops" project by Aram Bartholl: A global, participatory network of USB sticks sealed into walls, functioning as anonymous "dead letter boxes" for file exchange.

      The DIY servers of the Actinomy collective: Workshops for building one's own portable mini-servers, enabling local, private hosting and thereby creating a digital "room of one's own" independent of the major platforms.

      The Underlying Philosophy: The movement criticizes "data obesity" and the race for speed.

      It proposes recovering an "affect toward data" by favoring slower, more intentional exchanges and reusing older technologies to address current political issues such as surveillance.

      In conclusion, the sneakernet is not mere technological nostalgia but a political and practical response to the power structure of the modern internet.

      It shows that homegrown, autonomous alternatives already exist for escaping platform control and rethinking our relationship to technology and data.

      1. The Problem: Dependence and Loss of Control in the Digital Age

      The dominant model of today's internet, controlled by a small number of large technology companies, poses a fundamental problem of sovereignty over personal data.

      Data Ownership: Once stored online on services such as Google Drive or iCloud, or posted on social networks, files, photos and documents "become the property of the digital platforms that host them."

      The user's sense of ownership is an illusion, because the user loses direct control over their own creations.

      Access Control: The companies that manage access and storage (such as Amazon, Microsoft, Oracle, Google and Meta) have the unilateral power to "decide to charge us more or to cut off our access altogether."

      Exploitation of Information: Accepting cookies authorizes companies to use users' browsing information, turning their tastes and interests into monetizable data.

      Structural Dependence: The convenience of high-speed internet and "infinite" cloud storage has created such dependence that we "can no longer even imagine how to do without it".

      This situation is described as "Big Tech's grip on our lives."

      2. The Sneakernet: An Alternative to the Global Network

      In response to this centralization, the sneakernet proposes a radically different paradigm, based on physical, disconnected exchange.

      Definition: The term "sneakernet" literally means "sneaker network".

      It refers to a physical exchange network operating "at leg speed".

      It is "the antithesis of today's internet", where data travels through infrastructures of cables and radio waves.

      Control and Materiality: The main advantage is total control over the path the information takes.

      As one participant puts it, "we have control over where the information goes: in your pocket, in your hand and in their pocket, and in that sense it is off the network."

      A Retro-Technological "Innovation": The movement proposes using older technologies to address contemporary problems.

      One organizer explains: "We imagine that by perhaps taking older technologies, we are proposing another vision of innovation."

      This approach is justified by the fact that it "makes sense given what is happening politically around the internet today."

      3. Flagship Initiatives and Projects

      Several collectives of artists and activists have set up concrete projects to put sneakernet principles into practice.

      The "Data-Foires" of the Outdoor Computer Club

      This collective, whose organizers use the pseudonyms "Jeff Bisou" and "Xavier Nul", organizes offline data-exchange events called "data-foires" (data fairs).

      Concept: A computer is set up somewhere (for example, in a forest), where participants can drop off and pick up data using USB sticks.

      Energy Autonomy: The installation is often powered by "salvaged lithium batteries" connected to an inverter that supplies standard 230-volt current.

      This approach raises questions about the collective management of electricity, seen not "as an infinite resource" but in terms of actual needs.

      Shared Content: The exchanges are eclectic and include:

      ◦ Music, software, pamphlets.

      ◦ 3D scans.

      ◦ A documentary by Mathieu Rigouste, "Nous sommes des champs de bataille".

      ◦ A scientific thesis on a supramolecular hydrogel used to grow cancer cells.

      ◦ A publishing house's website, shown as a preview.

      Limits and Moderation: The system is not perfect, with "quite a few corrupted files" and incomplete transfers.

      However, moderation happens "fluidly, since everyone is there in person", which makes it easier to identify a malicious participant.

      Philosophy: The initiative promotes the idea of "thinking of the computer as a commons".

      Sharing a single machine runs counter to the usual individual use and turns the relationship with technology into a collective practice.
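
      The corrupted files and incomplete transfers mentioned above are the kind of problem a checksum manifest can detect. The sketch below is purely illustrative and is not something the source attributes to the Outdoor Computer Club: the function names (`sha256_of`, `build_manifest`, `find_damaged`) and the folder paths in the usage comments are hypothetical. It records a SHA-256 digest for every file before it is copied to a USB stick, then lists the files whose copy is missing or differs.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def find_damaged(folder, manifest):
    """Return the files that are missing from folder or whose digest differs."""
    folder = Path(folder)
    damaged = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged


# Hypothetical usage: hash the originals, copy them, then check the copy.
# manifest = build_manifest("my_files")
# print(find_damaged("/media/usb_stick", manifest))
```

      A truncated copy and a bit-flipped copy both produce a different digest, so both kinds of failure show up in the same list.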

      Dead Drops by Aram Bartholl: An Anonymous Exchange Network

      Launched in 2010 by the artist and professor of digital art Aram Bartholl, this project is one of the best-known incarnations of the sneakernet.

      Concept: Hundreds of USB sticks are sealed into walls and other public places around the world, forming an "open exchange network".

      The name "Dead Drops" refers to the "dead letter boxes" used in espionage to leave documents anonymously.

      Participatory Nature: Anyone can install a "dead drop" in their city, thereby helping to expand the network.

      Political Evolution: Initially artistic, the project has taken on a new activist meaning with the rise of Big Tech.

      It now invites us to consider ways of escaping "dependence on digital platforms, as well as their surveillance".

      DIY Servers from the Actinomy Collective: Taking Back Control of Hosting

      To achieve complete autonomy, the Bremen-based Actinomy collective offers to teach people how to build their own local, private servers.

      Concept: In "do-it-yourself" workshops, participants build "small portable computer servers in the shape of a keyring".

      How It Works: These mini-servers can host only a lightweight web page, accessible only over a very limited local Wi-Fi network.

      The site is not visible on the global internet but on a "small parallel network".

      Goal: The approach aims at "taking back control of one's information".

      It is compared to creating "a room of one's own in the big house of the global internet", in contrast to the huge centralized server farms that run 24/7.

      4. Principles and Philosophy of the Sneakernet Movement

      Beyond the technical aspects, the sneakernet carries a critical vision and an alternative philosophy of technology.

      Critique of "Data Obesity": The movement questions the logic of "always more" (more speed, more storage).

      It asks: "Do we just want to send really heavy files as fast as possible, or do we want to recover a certain affect toward data?"

      Valuing Precious Data: When transfers are physical, participants tend to bring a "small amount of data", generally what "seems precious to them", a selectivity that is lost with high-speed connections.

      The Computer as a "Commons": Sharing a single machine at the data-foires turns the computer from a personal object into a collective resource, changing the individual's relationship to technology.

      Energy and Material Awareness: Using autonomous, salvaged power systems forces reflection on the collective management of energy and on the material footprint of digital infrastructure.

      Safety Through Proximity: The physical presence of participants during exchanges creates a form of self-regulation and accountability that does not exist in anonymous online interactions.

      5. Conclusion: An Alternative Vision for the Digital Future

      The sneakernet movement, although it may seem "utopian or backward-looking", offers a pertinent critique of, and a tangible alternative to, the current digital ecosystem.

      It shows that autonomy from the major platforms is possible.

      The future, from this perspective, could involve "storing less, sharing less and hosting locally in order to truly escape the control of the major platforms".

      These homegrown, autonomous alternatives are not mere experiments; they represent a concrete political proposal in the face of surveillance and the centralization of digital power.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2025-03206

      Corresponding author(s): Teresa M. Przytycka

      General Statements

      We thank all the reviewers for their time and their constructive criticism, based on which we have revised our manuscript. All review comments are in italics. Our responses are in normal font, except for excerpts from the manuscript, which are shown within double quotes and in italics. The line numbers indicated here refer to those in the revised manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This paper addresses the interesting question of how cell size may scale with organ size in different tissues. The approach is to mine data from the fly single cell atlas (FCA) which despite its name is a database of gene expression levels in single isolated nuclei. Using this data, they infer cell size based on ribosomal protein gene expression, and based on this approach infer that there are tissue and sex specific differences in scaling, some of which may be driven by differences in ribosomal protein gene expression.

      Response: Indeed, using the FCA dataset, we infer sex-specific differences in both cell size and cell number, which we validated with targeted experiments. We show that Drosophila cell types scale through distinct strategies (via cell size, cell number, or a mix of both) in an allometric rather than uniform fashion. We further propose that these scaling differences are driven, at least in part, by variation in translational activity, reflected in the expression of ribosomal proteins, translation elongation factors, and Myc.

      -----------------------------------------------------------

      I think the idea of mining this database is a clever one; however, there are a number of concerns about whether the existing data can really be used to draw the conclusions that are stated.

      Response: We are pleased to see that the reviewer found the question and our approach interesting.

      -----------------------------------------------------------

      One concern has to do with the assumption that RP (ribosome protein) expression is a proxy for cell size. It is well established that ribosome abundance scales with cell size, but is there reason to believe that ribosome nuclear gene EXPRESSION correlates with ribosome abundance?

      I'm not saying that this can't be true, but it seems like a big assumption that needs to be justified with some data. Maybe this is well known in the Drosophila literature, but in that case the relevant literature really needs to be cited.

      Response: To avoid any misunderstanding: we use sex-biased RP expression as an indicator of sex differences in cell size only within the same cell type or subtype, as defined by expression-based clustering in the FCA, not as a general estimator of cell size. This measure is applied strictly within the same clusters, never between different ones. To prevent overinterpretation, we replaced the term 'proxy' with 'indicator,' since the earlier wording might have implied that ribosomal gene expression was being used to estimate cell size more broadly.
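
      To make the within-cluster restriction concrete, the comparison can be sketched as follows. This is an illustration only and is not the authors' pipeline: the function name `within_cluster_sex_bias`, the cluster labels, the pseudocount and all numerical values are hypothetical toy inputs. For each expression-defined cluster, the sketch computes the log2 ratio of mean RP expression per nucleus in females versus males, and it never compares values across clusters.

```python
import math
from collections import defaultdict


def within_cluster_sex_bias(records, pseudocount=1e-9):
    """records: iterable of (cluster, sex, rp_expression), one entry per nucleus.

    Returns {cluster: log2(mean female expression / mean male expression)}.
    """
    # Running [sum, count] per cluster and sex.
    totals = defaultdict(lambda: {"F": [0.0, 0], "M": [0.0, 0]})
    for cluster, sex, value in records:
        entry = totals[cluster][sex]
        entry[0] += value
        entry[1] += 1

    bias = {}
    for cluster, by_sex in totals.items():
        (f_sum, f_n), (m_sum, m_n) = by_sex["F"], by_sex["M"]
        if f_n == 0 or m_n == 0:
            continue  # one sex is absent, so no within-cluster comparison exists
        female_mean = f_sum / f_n
        male_mean = m_sum / m_n
        # The pseudocount only guards against division by zero.
        bias[cluster] = math.log2((female_mean + pseudocount) / (male_mean + pseudocount))
    return bias


# Toy, made-up values: positive output means female-biased RP expression.
toy_records = [
    ("fat_body_1", "F", 4.0), ("fat_body_1", "F", 4.0),
    ("fat_body_1", "M", 2.0), ("fat_body_1", "M", 2.0),
    ("muscle_1", "F", 3.0), ("muscle_1", "M", 3.0),
    ("female_only_cluster", "F", 5.0),
]
bias = within_cluster_sex_bias(toy_records)
for cluster, value in bias.items():
    print(f"{cluster}: log2(F/M) = {value:.2f}")
```

      Because each ratio is formed inside a single cluster, differences in baseline RP expression between cell types cancel out and cannot be mistaken for a sex effect.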

      We should have begun by providing more background on the well-established link between ribosomal protein gene dosage and cell growth. This context was missing from the introduction, so we have now added a full paragraph outlining what is known about this connection:

      Added at line 85:

      "Cell growth, which supports both cell enlargement and cell division, demands elevated protein synthesis, accomplished by boosting translation rates. Indeed, ribosome abundance is known to scale with cell size in many organisms (Schmoller and Skotheim 2015; Cadart and Heald 2022; Serbanescu et al. 2022). Long before it was known that DNA was the carrier of genetic information, Drosophila researchers had identified a large class of mutations known as "Minutes" (Schultz 1929). These were universally haplo-insufficient. A single wild type copy resulted in a tiny slowly growing fly, and the homozygous loss-of-function alleles were lethal. In clones, the Minute cells are clearly smaller and compete poorly with surrounding wild type cells. We now know that most of the Minute loci encode ribosomal proteins (Marygold et al. 2007). Similarly, the Drosophila diminutive locus, also characterized by small flies almost a century ago, is now known to encode the Myc oncogene (Gallant 2013). This is significant as Myc is a regulator of ribosomal protein encoding genes in metazoans, including Drosophila (Grewal et al. 2005). The ribosome is assembled in a specialized nuclear structure called the nucleolus (Ponti 2025). Across species, including Drosophila (Diegmiller et al. 2021) and C. elegans (Ma et al. 2018), nucleolar size scales with cell size and is broadly correlated with growth in cell size and/or cell number, processes that are directly relevant to sex-specific allometry. Collectively, these and many other studies offer compelling evidence that ribosomal biogenesis is positively associated with cell size and growth, underscoring the value of measuring ribosome biogenesis as a metric."

      We understand that the reviewer is asking whether reduced RP mRNA expression directly leads to reduced functional ribosome assembly. We do not have a definitive answer to that specific question. However, we directly measured translation in fat body cells (section: Female bias in ribosomal gene expression in fat body cells leads to sex-biased protein synthesis), and the results show a clear correlation between RP gene expression and biosynthetic activity, even though we did not track every step from transcription to ribosome assembly to polysome loading across all cell types. This would indeed be an excellent direction for future work, including polysome profiling and related assays. Importantly, we did examine the nucleolus (Figure 4), where ribosome assembly occurs, and showed that nucleolar volume scales with RP gene expression. This strongly supports the presence of sex-specific differences in ribosome biogenesis.

      Added at line 115:

      "Building on the earlier studies noted above, as well as our direct measurements of translation bias in the fat body, nucleolar size, and cell size, we used sex-biased expression of ribosomal proteins as an indicator of sex differences in per-nucleus cell size."

      -----------------------------------------------------------

      Second, the interpretation of RP expression as a proxy for cell size seems potentially at odds with the fact that some cells are multi-nucleate. Those cells are big because of multiple nuclei, and so they might not show any increase in ribosome expression per nucleus. Presumably, for multi-nucleate cells, RP expression, if it reflects anything at all, would have something to do with cell size PER nucleus.

      Response: Yes, this is a very important point, and this is why we chose multinucleated indirect flight muscles for our direct experimental analysis. We show that in indirect flight muscle cells, adult cell size is greatly influenced by the sex-specific number of nuclei per cell. The female muscle cells are larger and have more nuclei per cell. Additionally, they have higher expression of ribosomal protein-coding genes. As the latter data are from the single-nucleus sequencing atlas, this already demonstrates what the reviewer is asking for: per nucleus, female muscle cells express more ribosomal protein-coding mRNAs.

      -----------------------------------------------------------

      Third, it is well known that many tissues in Drosophila are polyploid or polytene. I don't know enough about the methodology used to produce the FCA to know whether this is somehow normalized. Otherwise, my hypothesis would be that nuclei showing higher RP expression might just be polyploid or polytene. You might say that this could be controlled by asking if all genes are similarly upregulated, but that isn't the case since at least in polytene chromosomes it is well known that only a small number of genes are expressed at a given time, while many are silent.

      Response: Yes, this is an excellent point. As noted above, our study does not distinguish among the different potential causes of sex differences in ribosomal mRNA copy number, as these may vary across cell types. We now explicitly acknowledge this in the discussion (line 327). Importantly, even in cases where ribosomal gene expression bias primarily reflects differences in DNA content, this still represents a plausible mechanistic route linking ribosomal gene expression to increased nucleolar ribosome biogenesis and, ultimately, larger cell size. This possibility does not alter our main conclusions.

      -----------------------------------------------------------

      Overall, I think a lot more foundational work would need to be done in order to allow the inference of cell size from RP expression. In a way, it is a bit unfortunate that they chose to do this work in Drosophila where so many cells are polyploid, although I gather that even in humans some tissues have this issue, for example large neurons in the brain.

      Response: We acknowledge that we did not clearly reference some of the foundational work in the literature. To address this, we have expanded the introduction to provide additional background and context. We also clarify that our fat body experiment offers independent support for the relationship between ribosomal gene expression bias, nuclear size bias, and corresponding biases in protein synthesis, thereby reinforcing the use of sex-specific ribosomal gene expression as an indicator of sex-specific cell size. Importantly, we assess this bias only within clusters, not between them. These clusters are derived from gene-expression-based clustering and are therefore relatively homogeneous. For example, as discussed in our response to Reviewer #3, the fat body contains several clusters that correspond to expression-defined subtypes of fat body cells. Our previous terminology may have inadvertently implied that we were using ribosomal gene expression to estimate cell size more broadly, which was not our intention.

      As for the choice of organism, most of the authors are Drosophila researchers, and we benefit from the unique, highly replicated data from the whole head and whole body of both sexes. Such data are necessary for an unbiased estimation of the differences in nuclear number.

      -----------------------------------------------------------

      Reviewer #1 (Significance (Required)):

      The idea that gene regulatory networks could "program" differences in scaling by changing levels of ribosomal protein gene expression is a tremendously important one if it can be established, because it would show a simple way for size scaling to be placed under control of developmental regulatory pathways. My original concern when I first looked at the abstract was going to be that yeah the results are interesting but a mechanism is not provided, but as I read it, that concern went away. Showing that RP gene expression, which could be programmed by various driving pathways, can affect allometric scaling would be extremely impactful and really change how we think about scaling, by putting it into the framework of gene expression networks that control other aspects of development. It would not be necessary to show which pathways actually drive these expression differences; the fact that they are different would be interesting enough to make everyone want to read this paper. As discussed above, however, I am not convinced by the evidence presented here. So while I think it would be very significant if true, I am not convinced that the conclusion is well supported. This doesn't mean I have a reason to think it is false, just that it's not well supported for the reasons I have given.

      Response: We are grateful to the reviewer for this positive assessment of our findings despite the lack of a specific mechanism. We also regret that our initial writing did not clearly situate our work within the foundational literature on the relationship between ribosomal biogenesis and scaling. The key contribution of our study is to demonstrate that sex-biased ribosomal biogenesis plays a role in allometric scaling, providing a basis for future mechanistic exploration. We hope that the revised manuscript now offers clear and compelling support for the conclusion that RP gene expression bias can influence allometric scaling.

      -----------------------------------------------------------

      I hasten to point out that I could be entirely wrong, if the missing bits of logic (i.e. that RP expression matches ribosome abundance and that gene expression in the FCA dataset isn't influenced by ploidy of the nucleus) can be filled in. If suitable references can be provided to support these underlying assumptions, then in fact I think these concerns could be answered with very little effort. Otherwise, I think experiments would be needed to support these assumptions, and that might be non-trivial to do in a reasonable time frame. For that reason, in the next question I have put "cannot tell" for the time estimate.

      Response: While gene expression in some FCA cell types may indeed be influenced by ploidy, our analysis does not depend on distinguishing among the possible sources of gene expression bias, which may vary across cell types. Rather, our key point is that, regardless of its origin, an increase in ribosomal gene expression is associated with enhanced ribosome biogenesis in the nucleolus and, ultimately, larger cell size. Thus, our main conclusions do not rely on any specific mechanism underlying RP gene expression upregulation. We now include additional references supporting the relationship between RP expression bias and cell size bias. We also strengthen the link between ribosomal gene expression and biosynthetic activity by clarifying its relationship with sex-biased Myc expression and the strong correlation with expression bias of EF1.

      We thank the reviewer for their thoughtful and constructive comments, which have prompted us to clarify both our reasoning and the relevant literature more fully.

      -----------------------------------------------------------

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors analyzed the FlyAtlas single-nucleus dataset to identify sex differences in gene expression and cell numbers. This led them to focus on muscles, cardiomyocytes, and fat body cells. They then measured cell and nucleolus size across different tissues and showed that reducing Myc function decreases sex differences in fat body cells. Overall, the manuscript provides a characterization of dimorphic differences in cell and organ size across three tissues.

      Response: This is a nice synopsis of the work.

      -----------------------------------------------------------

      Major Comments: The major claims of the manuscript are well supported by the reported experiments and analyses. Statistical analyses appear adequate.

      Response: We agree, and we are glad that the reviewer found our work well supported.

      -----------------------------------------------------------

      Minor Comments: The following minor issues should be addressed through textual edits:

      In the Introduction: "Disruptions in proportionality, whether due to undergrowth or overgrowth, can lead to reduced fitness or diseases such as cancer." Could the authors provide a reference for this statement, particularly for the claim that disruptions in proportion

      Response: We apologize for this omission. The following explanation is now included starting at line 39:

      "For example, scaled cell growth is a driver of symmetry in Myc-dependent scaling of bone growth in the skeleton by chondrocyte proliferation (Ota et al. 2007; Zhou et al. 2011). Increased nucleolus size is a well known marker of cancer progression in a histopathological setting (Pianese 1896; Derenzini et al. 1998; Elhamamsy et al. 2022)."

      -----------------------------------------------------------

      The authors state:

      "This study offers a comprehensive, cellular-resolution analysis of sexual size dimorphism in a model organism, uncovering how differences in cell number and size contribute to sex-specific body plans."

      The study cannot be considered comprehensive, as not all organs were examined.

      Response: Indeed, "comprehensive" is a loaded word, and we have omitted it in the revised manuscript.

      -----------------------------------------------------------

      The following sentence from the abstract is unclear:

      "By uncovering how a conserved developmental system produces sex-specific proportions through distinct cellular strategies..."

      What do the authors mean by a conserved developmental system? Do they refer to a commonly used developmental model, or to a developmental system that is evolutionarily conserved?

      Response: We acknowledge that the use of the word 'conserved' was inappropriate, and we have therefore removed it from the statement.

      -----------------------------------------------------------

      Reviewer #2 (Significance (Required)):

      The manuscript presents a relevant exploration of sex-specific differences in cell size and cell number in Drosophila males and females. The limitations of the study are clearly acknowledged in the "Limitations" section. The work does not provide mechanistic insight into the causes or functional consequences of the observed differences. Nonetheless, the study extends our understanding of sexual dimorphism and establishes a foundation for future investigations into the autonomous and systemic mechanistic factors that regulate these differences.

      Response: Thank you.

      -----------------------------------------------------------

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Pal and colleagues addresses an important question: the cellular mechanisms underlying sex differences in organ size. By leveraging single-nucleus transcriptomic data from the adult Drosophila Cell Atlas, the authors show that different cell types adopt distinct strategies to achieve sex differences in organ size, either by increasing cell size or by altering cell number. They then focus on three organs (the indirect flight muscles, the heart, and the fat body) and provide supporting evidence for their transcriptomic analyses.

      Response: This is a nice summary of the study. Thank you.

      -----------------------------------------------------------

      This study tackles a highly relevant and often overlooked question, as our understanding of the molecular and cellular events driving sex differences remains incomplete. The work presents interesting observations; however, it is largely descriptive, establishing correlations without providing functional evidence or mechanistic insight.

      Response: We agree that this is an often overlooked problem that has been difficult to address experimentally without single-cell genomics. Our work aims to help fill this gap. While the paper does contain descriptive elements, we believe such characterization is important at the early stages of developing a new area of inquiry. The study explores a unique dataset and includes experimental validation to support key observations. We also propose how allometry may be shaped by cell division and cell size, drawing on well-established molecular mechanisms. Thus, the reviewer's comment regarding a lack of mechanistic insight likely pertains to the absence of a direct connection to the sex-determination pathway, which is beyond the scope of the current study.

      -----------------------------------------------------------

      Below are four main points that should be addressed before publication:

      1. Introduction and contextualisation of prior work

      The introduction does not adequately present the current state of knowledge. Several key studies are missing or insufficiently discussed. In particular, the following works should be included and integrated into the introduction:

      - PMID: 26710087 - shows that the sex determination gene transformer regulates male-female differences in Drosophila body size.

      - PMID: 28064166 - describes how differences in Myc gene dosage contribute to sex differences in body size.

      - PMID: 26887495 - demonstrates that the intrinsic sexual identity of adult stem cells can control sex-biased organ size through sex-biased proliferation.

      - PMID: 28976974 - reveals that Sxl modulates body growth through both tissue-autonomous and non-autonomous mechanisms.

      - PMID: 39138201 - shows that transformer drives sex differences in organ size and body weight.

      Incorporating and discussing these references would provide a more comprehensive and up-to-date framework for the study.

      Response: We agree that the literature suggested by the reviewer strengthens the introduction and improves the contextualization of prior work relevant to our study. Although much of it was previously included in the discussion section on cell-autonomous and hormonal regulation, it has now been moved to the introduction, along with the discussion of the papers suggested by the reviewer (beginning at line 58).

      "In Drosophila melanogaster, adult females are substantially larger than males (Fig. 1A1), yet both sexes develop from genetically similar zygotes and share most organs and cell types. In wild type flies, sex is determined by the number of X chromosomes in embryos, with XX flies developing as females and X(Y) flies developing as males due to the activation and stable expression of Sex-lethal only in XX flies (Erickson and Quintero 2007). While it is not entirely clear how sexually dimorphic size is regulated, the sex determination pathway is implicated in size regulation. Sex-reversed flies often show a size based on the X chromosome number rather than sexual morphology. Female Sex-lethal contributes to larger female size independently of sexual identity (Cline 1984), and Sex-lethal expression in insulin producing neurons in the brain also impacts body size (Sawala and Gould 2017). Female-specific Transformer protein is produced as a consequence of female-specific Sex-lethal and also contributes to increased female size (Rideout et al. 2015). This size scaling also applies to individual organs. For example, the Drosophila female gut is longer than the male gut due Transformer activity (Hudry et al. 2016). It has also been suggested that Myc dose (it is X-linked) is a regulator of body size (Mathews et al. 2017), although the failed dosage compensation model proposed has not been demonstrated."

      And again at line 74:

      "These studies show that size is regulated, but they do not address whether scaling is uniform or non-uniform and the mechanism for sexual size differences (SSD). The origins of SSD can, in principle, arise from differences in (i) gene expression, (ii) the presence of sex-specific cell types, (iii) the number of cell-specific nuclei, or (iv) the size (per nucleus) of those cells. Previous research in Drosophila has largely focused on gene expression in sex-specific organs like the gonads (Arbeitman et al. 2002; Parisi et al. 2004; Graveley et al. 2011; Pal et al. 2023), which are governed by a well-characterized sex-determination pathway (Salz and Erickson 2010; Clough and Oliver 2012; Raz et al. 2023) However, whether and how scaling differences in shared, non-sex-specific tissues are achieved via changes in cell size and number remains largely unexamined (Fig. 1A2). These studies show that size is regulated, but they do not address whether scaling is uniform or non-uniform and the mechanism for size differences."

      -----------------------------------------------------------

      2. Use of ribosomal gene expression as a proxy for cell size The authors use ribosomal gene expression levels as a proxy for cell size, but this assumption is not adequately justified. The cited references (refs. 20-22) focus on unicellular organisms (bacteria and yeast) or cleavage divisions in frog embryos, which are fundamentally different from adult Drosophila tissues. The authors should provide evidence that ribosome abundance scales with cell size across the distinct adult Drosophila cell types. Given that most adult fly tissues are post-mitotic, it is more likely that ribosomal gene expression reflects protein synthesis activity rather than cell size, particularly in secretory cell types.

      Response: Reviewer 1 raised a similar point, and we agree. We recognize that the term "proxy" may have been misleading. We use this measure only in the context of sex bias within homogeneous cell clusters, and not between clusters, even when such clusters share the same cell-type annotation. To avoid overinterpretation, we changed "proxy" to "indicator".

      In response to the reviewer's concern, we have expanded our discussion of the relevant supporting literature (additional text starting line 75). We have also directly measured translation in the fat body cells (section: Female bias in ribosomal gene expression in fat body cells leads to sex biased protein synthesis), which clearly demonstrates a correlation between ribosomal protein gene expression and biosynthetic activity. Although we have not traced the chain of events from expression to ribosome assembly to polysome loading in all cell types, we did examine the nucleolus (Figure 4), where ribosomes are assembled, and we make a strong point that the volume of the nucleolus scales like ribosomal protein gene expression. This provides strong evidence for sex-specific ribosome biogenesis contributing to cell size.

      Furthermore, the observation that ribosomal gene expression likely reflects protein synthesis activity is not at odds with increased cell size: biosynthesis increases in larger cells (Schmoller and Skotheim 2015). We have added a panel to Figure 4 showing the relationship between ribosomal gene expression bias and the average expression bias of Eukaryotic Elongation Factor 1 (eEF1).

      -----------------------------------------------------------

      3. Relationship between Myc and sex-biased Rp expression The proposed link between Myc and sex-biased Rp expression is unclear. Panels D and E of Figure 1 show no consistent relationship: some cell types with strong Rp sex bias exhibit either high or low female Myc bias, or even a male bias. The linear regression in Figure 4I (R = 0.07, p = 0.59) confirms the lack of correlation. The authors should clarify this point and adopt a more cautious interpretation regarding Myc as a potential regulator of sex-biased Rp expression and cell size differences. Experimentally, using Myc hypomorph or heterozygous conditions would be more appropriate than complete knockdown to test its role.

      Response: Thank you for noting that the relationship between Myc expression bias and sex-biased RP expression required clarification. This response was prepared in consultation with Myc expert Dr. David Levens.

      We demonstrate that both Myc and RP gene expression exhibit an overall female bias in the body. The absence of a strong correlation across cell clusters does not invalidate this conclusion. Myc is a well-established master regulator of ribosome biogenesis, but its quantitative effects are complex. According to recent models of Myc-mediated gene regulation (Nie et al. 2012; Lin et al. 2012), Myc upregulates all actively transcribed genes. Because this regulation is global, the relationship between changes in Myc expression and corresponding changes in ribosomal protein gene expression depends on cell type. Moreover, Lorenzin et al. (2016) demonstrated that ribosomal protein genes saturate at relatively low levels of Myc, which helps explain why we observe a correlation in head cell clusters, where Myc expression is lower, but not in body clusters.

      Importantly, on average, the female-specific Myc expression bias is stronger in body cell clusters than in head cell clusters, consistent with the stronger female bias in ribosomal protein gene expression observed in the head relative to the body.

      To make this relationship more transparent, we combined the head and body clusters, which yielded a strong overall correlation (Fig. 4J, replacing the previous Fig. 4H).

      To further strengthen the evidence linking ribosomal gene expression to cell size, we also examined the relationship between ribosomal gene expression bias and Elongation Factor 1 (eEF1) expression bias, a key component of protein biosynthesis during the elongation step of translation. The resulting correlation exceeds 0.9 (new Fig. 4H, added as an additional panel in Fig. 4).

      -----------------------------------------------------------

      4. Conclusions about fat body cell number I have concerns about drawing conclusions on sex differences in fat body cell number from single-nucleus transcriptomic data for two reasons:

      1- Drosophila fat body tissue is heterogeneous, comprising distinct subpopulations (e.g., visceral fat cells, subcuticular fat cells), some of which are sex-specific-such as fat cells associated with the spermathecae in females.

      Response: Thank you for giving us the opportunity to clarify our analysis of the FCA data. Our approach does account for subpopulations within the fat body as well as within other cell types. Based on gene expression profiles, we identify three fat body clusters, all of which are reported in Table S3. One small female-specific cluster (

      When all fat body clusters are combined into a single supercluster, this supercluster still shows a male bias. We have now clarified this point in the manuscript (line 113). Note that both subclusters of fat body are already shown in Fig. 1C and 1D.

      -----------------------------------------------------------

      2- Adult fat body cells can be multinucleated (PMID: 13723227). Apparent sex differences in nucleus number may reflect differences in specific subpopulations or degrees of multinucleation rather than true differences in cell number. To strengthen the conclusions, the analysis should be performed at the level of fat body subpopulations, distinguishing clusters where possible. Additionally, quantifying nuclei relative to actual cell number-as done for muscle tissue-would clarify whether observed sex differences reflect true variation in cell number or differences in nuclear content per cell.

      Response: Yes, some cells can be multinucleate. We specifically address this in the context of muscle cells, where multinucleation is prominent, and we also conducted experimental validation in this tissue. As noted above, our analysis is performed at the subpopulation level, since clusters are defined by expression similarity (Leiden resolution 4.0) rather than by annotation.

      Because our work relies on single-nucleus data, each nucleus is treated as an individual unit of analysis. Nevertheless, we observe genuine nuclear differences within each cluster. Importantly, the presence of multinucleated cells does not alter our conclusions; it simply represents one form of variation in cell number that can be thought of as a subcomponent of cell/nuclei number.

      -----------------------------------------------------------

      Minor corrections/points:

      1-The term body size in the title does not accurately reflect the content of the paper. I recommend replacing it with organ size to better align with the study's focus.

      Response: Thank you for the suggestion.

      -----------------------------------------------------------

      2-The term sexual size dimorphism is somewhat inaccurate in this context. Sex differences in size would be more appropriate. The term sexual dimorphism typically refers to traits that exhibit two distinct forms in males and females-such as primary or secondary sexual characteristics like sex organs or sex combs. In contrast, size is a quantitative trait that follows a normal distribution. Although the average female may be larger than the average male, the distributions overlap, making the term dimorphism imprecise.

      Response: Thank you for the suggestion.

      -----------------------------------------------------------

      3-In Figure 2E, there appears to be an inconsistency between the text, figure legend, and the data presented. The text and legend state that the total volume of dorsal longitudinal flight muscle cells was quantified, whereas the graph indicates measurements of nuclear size. This discrepancy should be clarified.

      Response: Thank you for pointing this out. We found that the Y-axis label in the graph was incorrect; it is now fixed.

      -----------------------------------------------------------

      4-The authors proposed: "This increased biosynthetic activity in fat body cells may contribute to cell size differences, but also to the regulation of body size via production of factors that mediate body growth via interorgan communication". Please note that this hypothesis has already been tested functionally in PMID: 39138201 and was shown to be incorrect. Sex differences in body size are completely independent of fat body sexual identity or any intrinsic sex differences within fat cells.

      Response: We thank the reviewer for the opportunity to discuss why the data shown in PMID 39138201 (Hérault et al. 2024) do not rule out a model in which the fat body contributes to the sex-specific regulation of body size via interorgan communication. The main reason the data in Hérault et al. cannot rule out such a model is that they use wing size as a proxy for body size. This is in contrast to prior studies, such as Rideout et al. (2015), in which pupal volume was used to directly measure body size and show a non-autonomous effect of the sex determination gene transformer on body size. Measuring body size directly is a more precise readout of growth during the larval stages of development than adult wing area, which reflects the growth of a single organ. It is also important to note that the diets used to rear flies in Hérault et al. and Rideout et al. differ, which is an important consideration as females do not achieve their maximal size without high dietary protein levels (Millington et al. 2021). To ensure all these points are communicated to readers, we added text to this effect in the revised version of our manuscript.

      Added at line 254:

      "This increased biosynthetic activity in fat body cells may contribute to cell size differences, but also to the regulation of body size via production of factors that mediate body growth via interorgan communication (Colombani et al. 2003; Géminard et al. 2009; Rajan and Perrimon 2012; Sano et al. 2015; Koyama and Mirth 2016). Indeed, one study showed the sexual identity of the fat body influenced pupal volume, which is an accurate readout of larval growth (Rideout et al. 2015; Delanoue et al. 2010). While a recent study suggests that male-female differences in body size were regulated independently of fat body sexual identity (Hérault et al. 2024), this study measured the growth of a single organ, the wing, as a proxy for body size. Additional studies are therefore needed to resolve whether fat body protein synthesis plays an important role in regulating sex differences in body size."

      -----------------------------------------------------------

      5-The authors state: "This demonstrate that Myc plays a key role in regulating the sex difference in nucleolar size." This is an overstatement given the functional data presented. The claim should be toned down to reflect the limited evidence.

      Referee cross-commenting

      I completely agree with the main comments of Reviewer 1, as they address the paper's core.

      Response: We have addressed the comments of Reviewer 1 in the response to reviewer's comments above.

      -----------------------------------------------------------

      Reviewer #3 (Significance (Required)):

      The main novelty and strongest aspect of this study is its use of single-nucleus transcriptomic data from the adult Drosophila Cell Atlas to investigate how different cell types adopt distinct strategies to generate sex differences in organ size-either by increasing cell size or by altering cell number. Previous studies have largely focused on specific tissues, whereas this work provides a comprehensive, organism-wide view that encompasses all tissues, enabling direct cross-comparison between organs. This represents a clear advance in the field, primarily from a technical perspective, by leveraging organism-wide single-cell transcriptomics. The main limitations lie in the lack of functional experiments and mechanistic insights. Moreover, the proposed mechanism-differences in Myc gene dosage or expression levels-is not entirely novel, as Myc dosage has previously been implicated in contributing to sex differences in body size (PMID: 28064166).

      Response: We do include functional testing in three tissues (flight muscle, heart, and fat body); however, providing mechanistic insight is beyond the scope of this paper. The paper suggested by the reviewer is one attempt to provide such a mechanism, and probably not the only one. We hope that the rich data we have assembled in this paper provide resources for generating hypotheses and stimulate further research.

      -----------------------------------------------------------

      References

      Cadart, Clotilde, and Rebecca Heald. 2022. "Scaling of Biosynthesis and Metabolism with Cell Size." Molecular Biology of the Cell 33 (9): pe5. https://doi.org/10.1091/mbc.E21-12-0627.

      Diegmiller, Rocky, Caroline A. Doherty, Tomer Stern, Jasmin Imran Alsous, and Stanislav Y. Shvartsman. 2021. "Size Scaling in Collective Cell Growth." Development (Cambridge, England) 148 (18): dev199663. https://doi.org/10.1242/dev.199663.

      Gallant, Peter. 2013. "Myc Function in Drosophila." Cold Spring Harbor Perspectives in Medicine 3 (10): a014324. https://doi.org/10.1101/cshperspect.a014324.

      Grewal, Savraj S., Ling Li, Amir Orian, Robert N. Eisenman, and Bruce A. Edgar. 2005. "Myc-Dependent Regulation of Ribosomal RNA Synthesis during Drosophila Development." Nature Cell Biology 7 (3): 295-302. https://doi.org/10.1038/ncb1223.

      Hérault, Chloé, Thomas Pihl, and Bruno Hudry. 2024. "Cellular Sex throughout the Organism Underlies Somatic Sexual Differentiation." Nature Communications 15 (1): 6925. https://doi.org/10.1038/s41467-024-51228-6.

      Lin, Charles Y., Jakob Lovén, Peter B. Rahl, et al. 2012. "Transcriptional Amplification in Tumor Cells with Elevated C-Myc." Cell 151 (1): 56-67. https://doi.org/10.1016/j.cell.2012.08.026.

      Lorenzin, Francesca, Uwe Benary, Apoorva Baluapuri, et al. 2016. "Different Promoter Affinities Account for Specificity in MYC-Dependent Gene Regulation." eLife 5 (July): e15161. https://doi.org/10.7554/eLife.15161.

      Ma, Tian-Hsiang, Po-Hsiang Chen, Bertrand Chin-Ming Tan, and Szecheng J. Lo. 2018. "Size Scaling of Nucleolus in Caenorhabditis Elegans Embryos." Biomedical Journal 41 (5): 333-36. https://doi.org/10.1016/j.bj.2018.07.003.

      Marygold, Steven J., John Roote, Gunter Reuter, et al. 2007. "The Ribosomal Protein Genes and Minute Loci of Drosophila Melanogaster." Genome Biology 8 (10): R216. https://doi.org/10.1186/gb-2007-8-10-r216.

      Millington, Jason W., George P. Brownrigg, Charlotte Chao, et al. 2021. "Female-Biased Upregulation of Insulin Pathway Activity Mediates the Sex Difference in Drosophila Body Size Plasticity." eLife 10 (January): e58341. https://doi.org/10.7554/eLife.58341.

      Nie, Zuqin, Gangqing Hu, Gang Wei, et al. 2012. "C-Myc Is a Universal Amplifier of Expressed Genes in Lymphocytes and Embryonic Stem Cells." Cell 151 (1): 68-79. https://doi.org/10.1016/j.cell.2012.08.033.

      Ponti, Donatella. 2025. "The Nucleolus: A Central Hub for Ribosome Biogenesis and Cellular Regulatory Signals." International Journal of Molecular Sciences 26 (9): 4174. https://doi.org/10.3390/ijms26094174.

      Rideout, Elizabeth J., Marcus S. Narsaiya, and Savraj S. Grewal. 2015. "The Sex Determination Gene Transformer Regulates Male-Female Differences in Drosophila Body Size." PLOS Genetics 11 (12): e1005683. https://doi.org/10.1371/journal.pgen.1005683.

      Schmoller, Kurt M., and Jan M. Skotheim. 2015. "The Biosynthetic Basis of Cell Size Control." Trends in Cell Biology 25 (12): 793-802. https://doi.org/10.1016/j.tcb.2015.10.006.

      Schultz, J. 1929. "The Minute Reaction in the Development of DROSOPHILA MELANOGASTER." Genetics 14 (4): 366-419. https://doi.org/10.1093/genetics/14.4.366.

      Serbanescu, Diana, Nikola Ojkic, and Shiladitya Banerjee. 2022. "Cellular Resource Allocation Strategies for Cell Size and Shape Control in Bacteria." The FEBS Journal 289 (24): 7891-906. https://doi.org/10.1111/febs.16234.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript by Pal and colleagues addresses an important question: the cellular mechanisms underlying sex differences in organ size. By leveraging single-nucleus transcriptomic data from the adult Drosophila Cell Atlas, the authors show that different cell types adopt distinct strategies to achieve sex differences in organ size-either by increasing cell size or by altering cell number. They then focus on three organs-the indirect flight muscles, the heart, and the fat body-and provide supporting evidence for their transcriptomic analyses.

      This study tackles a highly relevant and often overlooked question, as our understanding of the molecular and cellular events driving sex differences remains incomplete. The work presents interesting observations; however, it is largely descriptive, establishing correlations without providing functional evidence or mechanistic insight.

      Below are four main points that should be addressed before publication:

      1. Introduction and contextualisation of prior work The introduction does not adequately present the current state of knowledge. Several key studies are missing or insufficiently discussed. In particular, the following works should be included and integrated into the introduction:
        • PMID: 26710087 - shows that the sex determination gene transformer regulates male-female differences in Drosophila body size.
        • PMID: 28064166 - describes how differences in Myc gene dosage contribute to sex differences in body size.
        • PMID: 26887495 - demonstrates that the intrinsic sexual identity of adult stem cells can control sex-biased organ size through sex-biased proliferation.
        • PMID: 28976974 - reveals that Sxl modulates body growth through both tissue-autonomous and non-autonomous mechanisms.
        • PMID: 39138201 - shows that transformer drives sex differences in organ size and body weight. Incorporating and discussing these references would provide a more comprehensive and up-to-date framework for the study.
      2. Use of ribosomal gene expression as a proxy for cell size The authors use ribosomal gene expression levels as a proxy for cell size, but this assumption is not adequately justified. The cited references (refs. 20-22) focus on unicellular organisms (bacteria and yeast) or cleavage divisions in frog embryos, which are fundamentally different from adult Drosophila tissues. The authors should provide evidence that ribosome abundance scales with cell size across the distinct adult Drosophila cell types. Given that most adult fly tissues are post-mitotic, it is more likely that ribosomal gene expression reflects protein synthesis activity rather than cell size, particularly in secretory cell types.
      3. Relationship between Myc and sex-biased Rp expression The proposed link between Myc and sex-biased Rp expression is unclear. Panels D and E of Figure 1 show no consistent relationship: some cell types with strong Rp sex bias exhibit either high or low female Myc bias, or even a male bias. The linear regression in Figure 4I (R = 0.07, p = 0.59) confirms the lack of correlation. The authors should clarify this point and adopt a more cautious interpretation regarding Myc as a potential regulator of sex-biased Rp expression and cell size differences. Experimentally, using Myc hypomorph or heterozygous conditions would be more appropriate than complete knockdown to test its role.
      4. Conclusions about fat body cell number I have concerns about drawing conclusions on sex differences in fat body cell number from single-nucleus transcriptomic data for two reasons:

      1) Drosophila fat body tissue is heterogeneous, comprising distinct subpopulations (e.g., visceral fat cells, subcuticular fat cells), some of which are sex-specific-such as fat cells associated with the spermathecae in females.

      2) Adult fat body cells can be multinucleated (PMID: 13723227). Apparent sex differences in nucleus number may reflect differences in specific subpopulations or degrees of multinucleation rather than true differences in cell number. To strengthen the conclusions, the analysis should be performed at the level of fat body subpopulations, distinguishing clusters where possible. Additionally, quantifying nuclei relative to actual cell number-as done for muscle tissue-would clarify whether observed sex differences reflect true variation in cell number or differences in nuclear content per cell.

      Minor corrections/points:

      1. The term body size in the title does not accurately reflect the content of the paper. I recommend replacing it with organ size to better align with the study's focus.
      2. The term sexual size dimorphism is somewhat inaccurate in this context. Sex differences in size would be more appropriate. The term sexual dimorphism typically refers to traits that exhibit two distinct forms in males and females-such as primary or secondary sexual characteristics like sex organs or sex combs. In contrast, size is a quantitative trait that follows a normal distribution. Although the average female may be larger than the average male, the distributions overlap, making the term dimorphism imprecise.
      3. In Figure 2E, there appears to be an inconsistency between the text, figure legend, and the data presented. The text and legend state that the total volume of dorsal longitudinal flight muscle cells was quantified, whereas the graph indicates measurements of nuclear size. This discrepancy should be clarified.
      4. The authors proposed: "This increased biosynthetic activity in fat body cells may contribute to cell size differences, but also to the regulation of body size via production of factors that mediate body growth via interorgan communication". Please note that this hypothesis has already been tested functionally in PMID: 39138201 and was shown to be incorrect. Sex differences in body size are completely independent of fat body sexual identity or any intrinsic sex differences within fat cells.
      5. The authors state: "This demonstrate that Myc plays a key role in regulating the sex difference in nucleolar size." This is an overstatement given the functional data presented. The claim should be toned down to reflect the limited evidence.

      Referee cross-commenting

      I completely agree with the main comments of Reviewer 1, as they address the paper's core.

      Significance

      The main novelty and strongest aspect of this study is its use of single-nucleus transcriptomic data from the adult Drosophila Cell Atlas to investigate how different cell types adopt distinct strategies to generate sex differences in organ size-either by increasing cell size or by altering cell number. Previous studies have largely focused on specific tissues, whereas this work provides a comprehensive, organism-wide view that encompasses all tissues, enabling direct cross-comparison between organs. This represents a clear advance in the field, primarily from a technical perspective, by leveraging organism-wide single-cell transcriptomics. The main limitations lie in the lack of functional experiments and mechanistic insights. Moreover, the proposed mechanism-differences in Myc gene dosage or expression levels-is not entirely novel, as Myc dosage has previously been implicated in contributing to sex differences in body size (PMID: 28064166).

    1. Reviewer #1 (Public review):

      A summary of what the authors were trying to achieve:

      (1) Identify probiotic candidates based on the phylogenetic proximity and their presence in the lower respiratory tract based on phylogenetic analysis and on meta-analysis of 16S rRNA sequencing of mouse lung samples.

      (2) Predefine probiotic candidates with overlapping and competing metabolic profiles based on a simple and easy-to-apply score, taking carbon source use into consideration.

      (3) Confirm the functionality of these candidate probiotics in vitro and define their mechanism of action (niche exclusion by either metabolic competition or active antibacterial strategies).

      (4) Confirm the probiotic action in vivo.

      Strengths:

      The authors attempt to go the whole nine yards: from a rational choice of phylogenetically close lower respiratory tract probiotics, through in silico modelling of a niche index based on the use of similar carbon sources with in vitro confirmation, to in vivo competition experiments in mice.

      Weaknesses:

      (1) The use of a carbon source is defined as growth to OD600 two SD above the blank level. While allowing a clear cutoff, this procedure does not take into account larger differences in the preferences of carbon sources between the pathogen and the probiotic candidate. If the pathogen is much better at taking up and processing a carbon source, the competition by the probiotic might be biologically irrelevant.

      (2) The authors do not take into account the growth of candidate probiotics in the presence of Bt. In monoculture, three of the four most potent candidate probiotics grow to levels comparable to those of Bt in LSM.

      (3) Niche exclusion in vivo is not shown. Mortality of hosts after infection with Bt is not a measure of competition between CP and the pathogen. Only Bt titers would prove a competitive effect. For CP17, less than half of the mice were actually colonized, yet there is still 100% protection. Activation of the host immune system would explain this and has to be excluded as an alternative reason for improved host survival.
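The concern in weakness (1) can be illustrated with a minimal sketch. It assumes the cutoff is the mean of the blank wells plus two sample standard deviations; the OD600 readings and the function names (`growth_cutoff`, `uses_carbon_source`, `growth_margin`) are hypothetical and are not taken from the manuscript. The sketch shows that a binary call scores a weak grower and a strong grower identically, while the margin above the blank differs widely.

```python
from statistics import mean, stdev

def growth_cutoff(blank_od600, n_sd=2.0):
    """Cutoff = mean of blank wells + n_sd sample standard deviations (assumed definition)."""
    return mean(blank_od600) + n_sd * stdev(blank_od600)

def uses_carbon_source(sample_od600, blank_od600, n_sd=2.0):
    """Binary call: does mean growth exceed the blank-based cutoff?"""
    return mean(sample_od600) > growth_cutoff(blank_od600, n_sd)

def growth_margin(sample_od600, blank_od600):
    """Quantitative alternative: how far mean growth sits above the mean blank."""
    return mean(sample_od600) - mean(blank_od600)

# Hypothetical OD600 readings for one carbon source (illustrative only).
blank = [0.04, 0.05, 0.06]
pathogen = [0.80, 0.82, 0.78]
candidate = [0.10, 0.11, 0.09]

for name, wells in [("pathogen", pathogen), ("candidate", candidate)]:
    print(name, uses_carbon_source(wells, blank), round(growth_margin(wells, blank), 3))
```

Both strains pass the binary cutoff here, yet the pathogen's margin above the blank is many times larger, which is the kind of difference a presence/absence score would hide.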

      Appraisal:

      (1) Based on phylogenetic comparison and published resources on lower respiratory tract colonizing bacteria, the authors find a reasonably good number of candidate probiotics that grow in LSM and successfully compete with the pathogenic target bacterium Bt in vitro.

      (2) In vivo, only host survival was tested, and a direct competition of CP with Bt by testing for Bt titers was not shown.

      Impact:

      Niche exclusion based on competition for environmentally provided metabolites is not a new concept and was experimentally tested, e.g. in the intestine. The authors show here that this concept could be translated into the resource-poor environment of the respiratory tract. It remains to be tested if the LSM growth-based competition data in vitro can be translated into niche exclusion in vivo.

    1. R0:

      Reviewer #1: The review is important to improve outcomes on cholera surveillance and response. However, there are a number of critical issues that must be addressed to ensure the manuscript conforms to the standard of scientific writing and scoping review.
      1. Certain sections were omitted, e.g., Quality assessment and Data analysis.
      2. The roles of the authors in the scoping exercise were also omitted.
      3. The results and discussion sections are mixed up. The authors began discussing the findings in the Results.

      Reviewer #2: Given the ongoing cholera pandemic and its recurrent outbreaks in sub-Saharan Africa, it is commendable that the authors undertook a comprehensive mapping of cholera research in Kenya. 1. For the search strategy, the query “cholera AND Kenya” across all databases is overly restrictive and likely excluded studies using alternative terminology such as “Vibrio cholerae”, “waterborne disease”, or “WASH-related cholera”. I would recommend providing the full keywords, filters, and timelines used for each database to aid reproducibility, as stated in the PRISMA-ScR Checklist (Item 8). 2. Please provide the last search date or timeframe. 3. The authors mentioned the systematic search of five databases: Google Scholar, Web of Science, PubMed, Embase, and Scopus. However, the PRISMA flow diagram (Figure 1) contains no data for Google Scholar. 4. The use of Rayyan is recognized. However, reviewer roles, conflict resolution, and data extraction validation are not stated. 5. The authors mentioned the inclusion of non-primary studies, such as reviews, but stated “ineligible study design” as a reason for exclusion in Figure 1. A clarification on this would be beneficial. 6. For each included study, the authors should present the characteristics of the data charted, with respective citations, in a table. 7. In section 3.2, the authors provide an informative table showing the geographic focus of the studies across multiple countries, including Kenya. For a scoping review centered on Kenya, a similar table or map showing the distribution of studies/data at the county level could be added. 8. Themes such as mortality and risk factors of cholera could be explored and discussed further to strengthen the manuscript. 9. The Results-Discussion boundary seems blurred. Discussion begins to appear within the “Future directions” paragraphs under each theme. I would recommend that the authors consolidate all “Future directions” into a single Discussion summarising what is known and unknown.
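
Reviewer #2's first point, that a single-term query misses studies using alternative terminology, can be illustrated with a short sketch of how a broader Boolean search string might be assembled. The synonyms are the ones the reviewer names; the helper name `build_query` and the quoting style are assumptions for illustration, since each database uses its own field tags and operator syntax.

```python
def build_query(disease_terms, place_terms):
    """Join synonyms with OR inside each concept, then AND the concepts together."""
    def group(terms):
        # Quote each term so multi-word phrases are searched as phrases.
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return group(disease_terms) + " AND " + group(place_terms)


# Terms taken from the reviewer's comment; a real strategy would be
# adapted to each database and reported in full.
disease_terms = [
    "cholera",
    "Vibrio cholerae",
    "waterborne disease",
    "WASH-related cholera",
]
place_terms = ["Kenya"]

query = build_query(disease_terms, place_terms)
print(query)
```

Reporting the generated string for each database, together with the filters and the last search date, is the kind of detail the reviewer asks for under PRISMA-ScR Item 8.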

    1. So You Want To Speak At Software Conferences?
      • Motivations and Realism: Define personal success (e.g., promotion, networking, paid speaking) and commit to the long-term effort—e.g., 7 years from first user group talk to international conference.
      • Year 1 - Get Good: Craft a unique talk, deliver at local user groups/Meetups, iterate based on feedback (fix demos, slides, length), repeat multiple times.
      • Year 2 - Get Seen: Submit to small community conferences (e.g., DDD events), network actively (pre-event dinners, stay engaged, follow up via chats/LinkedIn/email), secure video recordings for credibility.
      • Year 3 - Get Accepted: Build talks and network, submit 2-3 focused abstracts to CfPs (use lists like codeasaur.us), leverage connections, keep content fresh, track rejection stats realistically.
      • Year 4 - Get Bored (Sustain): Assess burnout, align with goals (fun, leads, pay), vary event types, always respect audience—stop or pivot if disinterested.
      • Final Advice: Work hard, enjoy, teach valuably; offers abstract/slide reviews at dylanbeattie@gmail.com.
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically-tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      We thank the reviewer for this positive evaluation of our work and for the helpful comments and suggestions. Regarding the concern that the term “neuroestrogens” may be misleading, we addressed this in the previous revision by consistently replacing it throughout the manuscript with “brain-derived estrogens” or “brain estrogens.”

      In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones 

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation 

      Presentation of the approach and findings is clear, allowing the reader to make their own inferences and compare them with the authors'

      Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal 

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient

      We thank the reviewer again for their positive evaluation of our work.

      Weaknesses:

      As stated in the summary, the authors are attributing the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either

      As mentioned above, we addressed this in the previous revision by replacing “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript. In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”

      The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering

      This comment is the same as one raised in the first review (Reviewer #1’s comment 2 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      Line 300: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males exhibited significantly fewer fin displays (P = 0.0461 and 0.0293 for Δ8 and Δ4 lines, respectively) and circles (P = 0.0446 and 0.1512 for Δ8 and Δ4 lines, respectively) than their wild-type siblings (Fig. 5L; Fig. S8E), suggesting less aggression” was edited to read “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wild-type siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Figure 5L, Figure 5—figure supplement 3E), showing less aggression.”

      Lack of attribution of previous published work from other research groups that would provide the proper context of the present study

      This comment is also the same as one raised in the first review (Reviewer #1’s comment 3 on weaknesses). In our previous revision, in response to this comment, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015; Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023) in the appropriate sections. We also added the following new references and revised the Introduction and Discussion accordingly:

      (2) Alward BA, Laud VA, Skalnik CJ, York RA, Juntti SA, Fernald RD. 2020. Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences of the United States of America 117:28167–28174. DOI: https://doi.org/10.1073/pnas.2008925117

      (39) O’Connell LA, Hofmann HA. 2012. Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153:1341–1351. DOI: https://doi.org/10.1210/en.2011-1663

      (54) Yong L, Thet Z, Zhu Y. 2017. Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology 220:3017–3021. DOI: https://doi.org/10.1242/jeb.161596

      There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation"

      In our previous revision, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction. We also revised the text to remove phrases such as “contrary to expectation” and “unexpected.”

      The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used.

      Following this comment, we have attempted additional aggression assays using the resident-intruder paradigm. However, these experiments did not produce consistent or interpretable results. As noted in our previous revision, medaka naturally form shoals and exhibit weak territoriality, and even slight differences in dominance between a resident and an intruder can markedly increase variability, reducing data reliability. Therefore, we believe that the approach used in the present study provides a more suitable assessment of aggression in medaka, regardless of territorial tendencies. We will continue to explore potential refinements in future studies and respectfully ask the reviewer to evaluate the present work based on the assay used here.

      While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside

      While we did not adopt this comment in our previous revision, we have carefully reconsidered the reviewers’ feedback and have now decided to remove the female data. This change allows us to present a more focused and cohesive story centered on males. The specific revisions are outlined below:

      Abstract

      Line 25: The text “, thereby revealing a previously unappreciated mode of action of brain-derived estrogens. We additionally show that female fish lacking Cyp19a1b are less receptive to male courtship and conversely court other females, highlighting the significance of brain-derived estrogens in establishing sex-typical behaviors in both sexes.” has been revised to “. Taken together, these findings reveal a previously unappreciated mode of action of brain-derived estrogens in shaping male-typical behaviors.”

      Results

      Line 88: The text “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids. As expected, brain estradiol-17β (E2) in both male and female homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% and 50%, respectively, of the levels in their wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037, males; P = 0.0092, females) (Fig. 1, A and B). In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A), indicating a dosage effect of cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in both cyp19a1b<sup>−/−</sup> males and females (Fig. S1, C and D), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain levels of testosterone, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Fig. 1A). Similarly, brain 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 6.2- and 1.9-fold, respectively, versus wild-type siblings (P = 0.0007, males; P = 0.0316, females) (Fig. 1, A and B). These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain and half of those in the female brain are synthesized locally in the brain. In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.” has been revised to “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids in males. As expected, brain estradiol-17β (E2) in homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% of the levels in wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037) (Figure 1A). Brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of wild-type levels (P = 0.0284) (Figure 1A), indicating a dosage effect of the cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in cyp19a1b<sup>−/−</sup> males (Figure 1B), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain testosterone levels, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Figure 1A). Similarly, brain 11KT levels increased 6.2-fold (P = 0.0007) (Figure 1A). These results indicate that cyp19a1b-deficient males have reduced estrogen coupled with elevated androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain are synthesized locally in the brain. Peripheral 11KT levels also increased 3.7-fold in cyp19a1b<sup>−/−</sup> males (P = 0.0789) (Figure 1B), indicating peripheral influence in addition to central effects.”

      Line 211: “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040), a level comparable to that observed in females” has been revised to “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040).”

      The subsection entitled “cyp19a1b-deficient females are less receptive to males and instead court other females,” which followed line 311, has been removed.

      Discussion

      The two paragraphs between lines 373 and 374, which addressed the female data, have been removed.

      Materials and methods

      Line 433: “males and females” has been changed to “males”.

      Line 457: “focal fish” has been changed to “focal male”.

      Line 458: “stimulus fish” has been changed to “stimulus female”.

      Line 458: “Fig. 6, E and F, ” has been deleted.

      Line 460: “; wild-type males in Fig. 6, A to C” has been deleted.

      Line 466: The text “The period of interaction/recording was extended to 2 hours in tests of courtship displays received from the stimulus esr2b-deficient female and in tests of mating behavior between females, because they take longer to initiate courtship (12). In tests using an esr2b-deficient female as the stimulus fish, where the latency to spawn could not be calculated because these fish were unreceptive to males and did not spawn, the sexual motivation of the focal fish was instead assessed by counting the number of courtship displays and wrapping attempts in 30 min. The number of these mating acts was also counted in tests to evaluate the receptivity of females. In tests of mating behavior between two females, the stimulus female was marked with a small notch in the caudal fin to distinguish it from the focal female.” has been revised to “In tests using an esr2b-deficient female as the stimulus fish, the latency to spawn could not be calculated because the female was unreceptive to males and did not spawn. Therefore, the sexual motivation of the focal male was assessed by counting the number of courtship displays and wrapping attempts in 30 min. To evaluate courtship displays performed by stimulus esr2b-deficient females toward focal males, the recording period was extended to 2 hours, as these females take longer to initiate courtship (Nishiike et al., 2021). In all video analyses, the researcher was blind to the fish genotype and treatment.”

      Line 499: “brains dissected from males and females of the cyp19a1b-deficient line (analysis of ara, arb, vt, gal, npba, and esr2b) and males of the esr1-, esr2a-, and esr2b-deficient lines” has been revised to “male brains from the cyp19a1b-deficient line (analysis of ara, arb, vt, and gal) and from the esr1-, esr2a-, and esr2b-deficient lines.”

      Line 504: “After color development for 15 min (gal), 40 min (npba), 2 hours (vt), or overnight (ara, arb, and esr2b)” has been revised to “After color development for 15 min (gal), 2 hours (vt), or overnight (ara and arb).”

      Line 516: “Thermo Fisher Scientific, Waltham, MA” has been changed to “Thermo Fisher Scientific” to avoid redundancy.

      Line 565: The subsection entitled “Measurement of spatial distances between fish” has been removed.

      Line 585: “6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B;” has been deleted.

      References

      The following references have been removed:

      Capel B. 2017. Vertebrate sex determination: evolutionary plasticity of a fundamental switch. Nature Reviews Genetics 18:675–689. DOI: https://doi.org/10.1038/nrg.2017.60

      Hiraki T, Nakasone K, Hosono K, Kawabata Y, Nagahama Y, Okubo K. 2014. Neuropeptide B is female-specifically expressed in the telencephalic and preoptic nuclei of the medaka brain. Endocrinology 155:1021–1032. DOI: https://doi.org/10.1210/en.2013-1806

      Juntti SA, Hilliard AT, Kent KR, Kumar A, Nguyen A, Jimenez MA, Loveland JL, Mourrain P, Fernald RD. 2016. A neural basis for control of cichlid female reproductive behavior by prostaglandin F2α. Current Biology 26:943–949. DOI: https://doi.org/10.1016/j.cub.2016.01.067

      Kimchi T, Xu J, Dulac C. 2007. A functional circuit underlying male sexual behaviour in the female mouse brain. Nature 448:1009–1014. DOI: https://doi.org/10.1038/nature06089

      Kobayashi M, Stacey N. 1993. Prostaglandin-induced female spawning behavior in goldfish (Carassius auratus) appears independent of ovarian influence. Hormones and Behavior 27:38–55. DOI: https://doi.org/10.1006/hbeh.1993.1004

      Liu H, Todd EV, Lokman PM, Lamm MS, Godwin JR, Gemmell NJ. 2017. Sexual plasticity: a fishy tale. Molecular Reproduction and Development 84:171–194. DOI: https://doi.org/10.1002/mrd.22691

      Munakata A, Kobayashi M. 2010. Endocrine control of sexual behavior in teleost fish. General and Comparative Endocrinology 165:456–468. DOI: https://doi.org/10.1016/j.ygcen.2009.04.011

      Nugent BM, Wright CL, Shetty AC, Hodes GE, Lenz KM, Mahurkar A, Russo SJ, Devine SE, McCarthy MM. 2015. Brain feminization requires active repression of masculinization via DNA methylation. Nature Neuroscience 18:690–697. DOI: https://doi.org/10.1038/nn.3988

      Shaw K, Therrien M, Lu C, Liu X, Trudeau VL. 2023. Mutation of brain aromatase disrupts spawning behavior and reproductive health in female zebrafish. Frontiers in Endocrinology 14:1225199. DOI: https://doi.org/10.3389/fendo.2023.1225199

      Stacey NE. 1976. Effects of indomethacin and prostaglandins on the spawning behaviour of female goldfish. Prostaglandins 12:113–126. DOI: https://doi.org/10.1016/s0090-6980(76)80010-x

      Figure 1

      Panel B, which originally showed steroid levels in female brains, has been replaced with steroid levels in the periphery of males, originally presented in Figure S1, panel C. Accordingly, the legend “(A and B) Levels of E2, testosterone, and 11KT in the brain of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (A) and females (B) (n = 3 per genotype and sex).” has been revised to “(A, B) Levels of E2, testosterone, and 11KT in the brain (A) and periphery (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”

      Figure 3

      The female data have been deleted from Figure 3. The revised Figure 3 is presented.

      The corresponding legend text has been revised as follows:

      Line 862: “males and females (n = 4 and 5 per genotype for males and females, respectively)” has been changed to “males (n = 4 per genotype)”.

      Line 864: “males and females (n = 4 except for cyp19a1b<sup>+/+</sup> males, where n = 3)” has been changed to “males (n = 3 and 4, respectively)”.

      Figure 6

      Figure 6 and its legend have been removed.

      Figure 1—figure supplement 1

      Panel C, showing male data, has been moved to Figure 1B, as described above, while panel D, showing female data, has been deleted. The corresponding legend “(C and D) Levels of E2, testosterone, and 11KT in the periphery of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (C) and females (D) (n = 3 per genotype and sex). Statistical differences were assessed by Bonferroni’s post hoc test (C and D). Error bars represent SEM. *P < 0.05.” has also been removed.

      Line 804: Following this change, the figure title has been updated from “Generation of cyp19a1b-deficient medaka and evaluation of peripheral sex steroid levels” to “Generation of cyp19a1b-deficient medaka.”

      The statistics comparing "experimental to experimental" and "control to experimental" aren't appropriate.

      This comment is the same as one raised in the first review (Reviewer #1’s comment 7 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      The reviewer raised concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.

      Line 576: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and methods. We have revised it to “comparisons between untreated and E2-treated groups in Figure 4C and D” for clarity.
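
The distinction drawn in this response, between comparing each treated group with a single reference group and comparing all groups with one another, can be made concrete with a small sketch. It only enumerates the comparisons implied by each design and shows a Bonferroni-adjusted threshold for the all-pairs case. It does not implement Dunnett's test itself, which relies on a multivariate t distribution, and the group labels and the number of treated groups are hypothetical rather than taken from the manuscript.

```python
from itertools import combinations


def many_to_one(control, treatments):
    """Comparisons of each treated group against one control (the design Dunnett's test targets)."""
    return [(control, t) for t in treatments]


def all_pairs(groups):
    """Every pairwise comparison among the groups."""
    return list(combinations(groups, 2))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


control = "untreated"
treatments = ["E2 dose 1", "E2 dose 2", "E2 dose 3"]  # hypothetical group labels

to_control = many_to_one(control, treatments)
pairwise = all_pairs([control] + treatments)

print("many-to-one comparisons:", len(to_control))
print("all-pairs comparisons:", len(pairwise))
print("Bonferroni threshold over all pairs:", bonferroni_alpha(0.05, len(pairwise)))
```

With one control and k treated groups, the many-to-one design involves k comparisons, whereas the all-pairs design involves k(k+1)/2. This is why a procedure tailored to the many-to-one case is usually less conservative than correcting over every pair.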

      Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in medaka. The constitutive deletion of Cyp19a1b markedly reduced brain estrogen content in males and to a lesser extent in females. These effects are accompanied by reduced sexual and aggressive behavior in males and reduced preference for males in females. These effects are reversed by adult treatment with E2, supporting a role for estrogens. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of estrogens in social behavior in the most abundant vertebrate taxon; however, the conclusion of brain-derived estrogens awaits definitive confirmation.

      We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.

      Strengths:

      Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      Results obtained from multiple mutant lines converge to show that estrogen signaling, likely via estrogens synthesized in the brain, drives aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      The authors have made important corrections to tone down some of the conclusions, which are now more in line with the results.

      We thank the reviewer again for their positive evaluation of our work and the revisions we have made.

      Weaknesses:

      No evaluation of the mRNA and protein products of Cyp19a1b and ESR2a is presented, such that there is no proper demonstration that the mutation indeed leads to aromatase reduction. The conclusion that these effects depend on brain-derived estrogens is therefore supported only by measures of E2 with an EIA kit that is not validated. No discussion of these shortcomings is provided in the Discussion, thus further weakening the conclusions of the manuscript.

      In response to this and other comments, we have now provided direct validation that the cyp19a1b mutation in our medaka leads to loss of function. Real-time PCR analysis showed that cyp19a1b transcript levels in the brain were reduced by approximately half in cyp19a1b<sup>+/−</sup> males and were nearly absent in cyp19a1b<sup>−/−</sup> males, consistent with nonsense-mediated mRNA decay.

      In addition, AlphaFold 3-based structural modeling indicated that the mutant Cyp19a1b protein lacks essential motifs, including the aromatic region and heme-binding loop, and exhibits severe conformational distortion (see figure; key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange)). 

      Results:

      Line 101: The following text has been added: “Loss of cyp19a1b function was further confirmed by measuring cyp19a1b transcript levels in the brain and by predicting the three-dimensional structure of the mutant protein. Real-time PCR revealed that transcript levels were reduced by half in cyp19a1b<sup>+/−</sup> males and were nearly undetectable in cyp19a1b<sup>−/−</sup> males, presumably as a result of nonsense-mediated mRNA decay (Lindeboom et al., 2019) (Figure 1C). The wild-type protein, modeled by AlphaFold 3, exhibited a typical cytochrome P450 fold, including the membrane helix, aromatic region, and heme-binding loop, all arranged in the expected configuration (Figure 1—figure supplement 1C). The mutant protein, in contrast, was severely truncated, retaining only the membrane helix (Figure 1—figure supplement 1C). The absence of essential domains strongly indicates that the allele encodes a nonfunctional Cyp19a1b protein. Together, transcript and structural analyses consistently demonstrate that the mutation generated in this study causes a complete loss of cyp19a1b function.”

      Materials and methods

      Line 438: A subsection entitled “Real-time PCR” has been added. The text of this subsection is as follows: “Total RNA was isolated from the brains of cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males using the RNeasy Plus Universal Mini Kit (Qiagen, Hilden, Germany). cDNA was synthesized with the SuperScript VILO cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA). Real-time PCR was performed on the LightCycler 480 System II using the LightCycler 480 SYBR Green I Master (Roche Diagnostics). Melting curve analysis was conducted to verify that a single amplicon was obtained in each sample. The β-actin gene (actb; GenBank accession number NM_001104808) was used to normalize the levels of target transcripts. The primers used for real-time PCR are shown in Supplementary file 2.”

      Line 448: A subsection entitled “Protein structure prediction” has been added. The text of this subsection is as follows: “Structural predictions of Cyp19a1b proteins were conducted using AlphaFold 3 (Abramson et al., 2024). Amino acid sequences corresponding to the wild-type allele and the mutant allele generated in this study were submitted to the AlphaFold 3 prediction server. The resulting models were visualized with PyMOL (Schrödinger, New York, NY), and key structural features, including the membrane helix, aromatic region, and heme-binding loop, were annotated.”

      References

      The following two references have been added:

      Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500. DOI: https://doi.org/10.1038/s41586-024-07487-w

      Lindeboom RGH, Vermeulen M, Lehner B, Supek F. 2019. The impact of nonsense-mediated mRNA decay on genetic disease, gene editing and cancer immunotherapy. Nature Genetics 51:1645–1651. DOI: https://doi.org/10.1038/s41588-019-0517-5

      Figure 1

      The real-time PCR results described above have been incorporated in Figure 1, panel C, with the corresponding legend provided below (line 788).

      (C) Brain cyp19a1b transcript levels in cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 6 per genotype). Mean value for cyp19a1b<sup>+/+</sup> males was arbitrarily set to 1.
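
The normalization described for the real-time PCR data (target levels normalized to actb, with the wild-type mean set to 1) can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the common 2^(−ΔCt) calculation with 100% amplification efficiency, which the response does not specify, and the Ct values are placeholders invented for the example rather than data from the study.

```python
from statistics import mean


def relative_levels(target_ct, reference_ct):
    """Convert paired Ct values to reference-normalized levels, assuming 100% PCR efficiency."""
    return [2 ** -(t - r) for t, r in zip(target_ct, reference_ct)]


def scale_to_wild_type(levels_by_genotype, wild_type="+/+"):
    """Divide every sample by the wild-type mean so that the wild-type mean equals 1."""
    wt_mean = mean(levels_by_genotype[wild_type])
    return {g: [v / wt_mean for v in vals] for g, vals in levels_by_genotype.items()}


# Placeholder Ct values for illustration only; they are NOT data from the study.
ct = {
    "+/+": {"cyp19a1b": [24.0, 24.2], "actb": [18.0, 18.1]},
    "+/-": {"cyp19a1b": [25.0, 25.2], "actb": [18.0, 18.1]},
}

levels = {g: relative_levels(v["cyp19a1b"], v["actb"]) for g, v in ct.items()}
scaled = scale_to_wild_type(levels)

for genotype, values in scaled.items():
    print(genotype, round(mean(values), 3))
```

Under this assumption, a one-cycle increase in ΔCt corresponds to a halving of the normalized transcript level, which is the scale on which a reduction "by half" in heterozygotes would appear.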

      The subsequent panels have been renumbered accordingly. The entirety of the revised Figure 1 is presented.

      Figure 1—figure supplement 1

      The AlphaFold 3-generated structural models described above have been incorporated in Figure 1—figure supplement 1, panel C, with the corresponding legend provided below (line 811).

      (C) Predicted three-dimensional structures of wild-type (left) and mutant (right) Cyp19a1b proteins. Key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange).

      The entirety of the revised Figure 1—figure supplement 1 is presented.

      The information on the primers used for real-time PCR has been included in Supplementary file 2.

      The functional deficiency of esr2a was already addressed in the previous revision. For clarity, we have reproduced the relevant information here.

      A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and methods (line 423): “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (Kayo et al., 2019). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”

      Most experiments are weakly powered (low sample size).

      This comment is essentially the same as one raised in the first review (Reviewer #3’s comment 7 on weaknesses). We acknowledge the reviewer’s concern that the histological analyses were weakly powered due to the limited sample size. In our earlier revision, we responded as follows:

      Histological analyses were conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. Since significant differences were detected in many analyses, further increasing the sample size was deemed unnecessary.

      The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      This comment is the same as one raised in the first review (Reviewer #3’s comment 8 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:

      As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      Conclusions:

      Overall, the claims regarding the role of estrogens originating in the brain in male sexual behavior are supported by converging evidence from multiple mutant lines. The support for a role of brain-derived estrogens in gene expression in the brain is weaker, as are the results in females.

      We appreciate the reviewer’s positive evaluation of our findings on male behavior. The concern regarding the role of brain-derived estrogens in gene expression has been addressed in our rebuttal, and the female data have been removed so that the analysis now focuses on males. The specific revisions for removing the female data are described in Response to reviewer #1’s comment 6 on weaknesses.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is improved slightly. I am thankful the authors addressed some concerns, but for several concerns the referees raised, the authors acknowledged them yet did not make corresponding changes to the manuscript or disagreed that they were issues at all without explanation. All reviewers had issues with the imbalanced focus on males versus females and the male aggression assay. Yet, they did not perform additional experiments or even make changes to the framing and scope of the manuscript. If the authors had removed the female data, they may have had a more cohesive story, but then they would still be left with inadequate behavior assays in the males. If the authors don't have the time or resources to perform the additional work, then they should have said so. However, the work would be incomplete relative to the claims. That is a key point here. If they change their scope and claims, the authors avoid overstating their findings. I want to see this work published because I believe it moves the field forward. But the authors need to be realistic in their interpretations of their data. 

      In response to this and related comments, we have removed the female data and focused the manuscript on analyses in males. The specific revisions are described in Response to reviewer #1’s comment 6 on weaknesses. Additionally, we have validated that the cyp19a1b mutation in our medaka leads to loss of function (see Response to reviewer #3’s comment 1 on weaknesses), which further strengthens the reliability of our conclusions regarding male behavior.

      I agree with the reviewer who said we need to see validation of the absence of functional cyp19a1b in the brain. However, the results from staining for the protein and performing in situ could be quizzical. Indeed, there aren't antibodies that could distinguish between aromatase a and b, and it is not uncommon for expression of a mutated gene to be normal. One approach they could do is measure aromatase activity, but they are *sort of* doing that by measuring brain E2. It's not perfect, but we teleost folks are limited in these areas. At the very least, they should show the predicted protein structure of the mutated aromatase alleles. It could show clearly that the tertiary structure is utterly absent, giving more support to the fact that their aromatase gene is non-functional.

      As noted above, we have further validated the loss of cyp19a1b function by measuring cyp19a1b transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions. For further details, please refer to Response to reviewer #3’s comment 1 on weaknesses.

      With all of this said, the work is important, and it is possible that with a reframing of the impact of their work in the context of their findings, I could consider the work complete. I think with a proper reframing, the work is still impactful. 

      In accordance with this feedback, and as described above, we have reframed the manuscript by removing the female data and focusing exclusively on males. This revision clarifies the scope of our study and reinforces the support for our conclusions. For further details, please refer to Response to reviewer #1’s comment 6 on weaknesses.

      (1) Clearly state in the Figure 1 legend that each data point for male aggressive behaviors represents the total # of behaviors calculated over the 4 males in each experimental tank.

      In response to this comment, we have revised the legend of Figure 1K (line 797). The original legend, “(K) Total number of each aggressive act observed among cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, or cyp19a1<sup>−/−</sup> males in the tank (n = 6, 7, and 5, respectively),” has been updated to “(K) Total number of each aggressive act performed by cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males. Each data point represents the sum of acts recorded for the 4 males of the same genotype in a single tank (n = 6, 7, and 5 tanks, respectively).” This clarifies that each data point reflects the total behaviors of the 4 males within each tank.

      (2) The authors wrote under "Response to reviewer #1's major comment "...the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry": "This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.".

      What is meant by the latter statement? What accounts for the lack of aggression? The lack of increase in esr2b? Please clarify. 

      Line 365: In response to this comment, “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” has been revised to “Considering this, the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study may be explained by the possibility that the E2 dose used was sufficient to induce not only ara and arb but also esr2b expression in aggression-relevant circuits, which potentially suppressed aggression.”

      This revision clarifies that, while moderate brain estrogen levels are sufficient to promote male behaviors via induction of ara and arb, the E2 dose used in this study may have additionally induced esr2b in circuits relevant to aggression, potentially underlying the lack of aggression recovery.

      (3) This is a continuation of my comment/concern directly above. If the induction of ara and arb aren't enough, then how can, as the authors state, androgen signaling be the primary driver of these behaviors? 

      In response to this follow-up comment, we would like to clarify that, as described above, the lack of aggression recovery in E2-treated cyp19a1b-deficient males is not due to insufficient induction of ara and arb, but instead is likely because esr2b was also induced in aggression-relevant circuits, which may have suppressed aggression. Therefore, the concern that androgen signaling cannot be the primary driver of these behaviors is not applicable.

      (4) The authors' point about sticking with the terminology for the ar genes as "ara" and "arb" is not convincing. The whole point of needing a change to match the field of neuroendocrinology as a whole (that is, across all vertebrates) is researchers, especially those with high standing like the Okubo group, adopt the new terminology. Indeed, the Okubo group is THE leader in medaka neuroendocrinology. It would go a long way if they began adopting the new terminology of "ar1" and "ar2". I understand this may be laborious to a degree, and each group can choose to use their terminology, but I'd be remiss if I didn't express my opinion that changing the terminology could help our field as a whole. 

      We sincerely appreciate the reviewer’s thoughtful comments regarding nomenclature consistency in vertebrate neuroendocrinology. We understand the motivation behind the suggestion to adopt ar1 and ar2. However, we consider the established nomenclature of ara and arb to be more appropriate for the following reasons.

      First, adopting the ar1/ar2 nomenclature would introduce a discrepancy between gene and protein symbols. According to the NCBI International Protein Nomenclature Guidelines (Section 2B. Abbreviations and symbols; https://www.ncbi.nlm.nih.gov/genbank/internatprot_nomenguide/), the ZFIN Zebrafish Nomenclature Conventions (Section 2. PROTEINS: https://zfin.atlassian.net/wiki/spaces/general/pages/1818394635/ZFIN+Zebrafish+Nomenclature+Conventions), and the author guidelines of many journals (e.g., https://academic.oup.com/molehr/pages/Gene_And_Protein_Nomenclature), gene and protein symbols should be identical (with proteins designated in non-italic font and with the first letter capitalized). Maintaining consistency between gene and protein symbols helps avoid unnecessary confusion. The ara/arb nomenclature allows this, whereas ar1/ar2 does not.

      Second, the two androgen receptor genes in teleosts are paralogs derived from the third round of whole-genome duplication that occurred early in teleost evolution. For such duplicated genes, the ZFIN Zebrafish Nomenclature Conventions (Section 1.2. Duplicated genes) recommend appending the suffixes “a” and “b” to the approved symbol of the human or mouse ortholog. This convention clearly indicates that these genes are whole-genome duplication paralogs and provides an intuitive way to represent orthologous and paralogous relationships between teleost genes and those of other vertebrates. As a result, it has been widely adopted, and we consider it logical and beneficial to apply the same principle to androgen receptors.

      In light of these considerations, we respectfully maintain that the ara/arb nomenclature is more suitable for the present manuscript than the alternative ar1/ar2 system.

      (5) In the discussion please discuss these potentially unexpected findings.

      (a) gal was unaffected in female cyp19a1 mutants, but they exhibit mating behaviors towards females. Given gal is higher in males and these females act like females, what does this mean about the function of gal/its utility in being a male-specific marker (is it one??)? 

      (b) esr2b expression is higher in female cyp19a1 mutants. This is unexpected as well, given that esr2b is required for female-typical mating, is higher in females compared to males, and is increased by E2. Please explain what this means for our idea of what esr2b expression tells us.

      We thank the reviewer for the insightful comments. As the female data have been removed from the manuscript, discussion of these findings in female cyp19a1b mutants is no longer necessary.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed a number of the reviewers' comments; notably, they provided missing methodological information and rephrased the text. However, the authors have not addressed the main issues raised by the reviewers. In particular, it is regrettable that the reduced amount of brain aromatase cannot be confirmed; this seems to be the primary step when validating a new mutant. Even if protein products of the two genes may not be discriminated (which I can understand), it should be possible to evaluate the expression of a common messenger and/or peptide and confirm that aromatase expression is reduced in the brain. Since Cyp19a1b is relatively more abundant in the brain than Cyp19a1a, this would strengthen the conclusion and provide confidence that the mutant indeed does silence aromatase expression in the brain. Although these shortcomings are acknowledged in the rebuttal letter, this is not mentioned in the discussion. Doing so would make the manuscript more transparent and clearer.

      As noted in Response to reviewer #3’s comment 1 on weaknesses, we have validated the loss of Cyp19a1b function by measuring its transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that Cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions.

      FigS1 - panels C&D please indicate in which tissue were hormones measured. Blood?

      We thank the reviewer for pointing this out. In our study, “peripheral” refers to the caudal half of the body excluding the head and visceral organs, not blood. Accordingly, we have revised the figure legend and the description in the Materials and Methods section as follows:

      Legend for Figure 1B (line 787) now reads: “Levels of E2, testosterone, and 11KT in the brain (A) and peripheral tissues (caudal half of the body) (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”

      Materials and methods (line 431): The sentence “Total lipids were extracted from the brain and peripheral tissues (from the caudal half) of” has been revised to “Total lipids were extracted from the brain and from peripheral tissues, specifically the caudal half of the body excluding the head and visceral organs, of.”

      Additional Alterations:

      We have reformatted the text and supporting materials to comply with the journal’s Author Guidelines. The following changes have been made:

      (1) Figures and supplementary files are now provided separately from the main text.

      (2) The title page has been reformatted without any changes to its content.

      (3) In-text citations have been changed from numerical references to the author–year format.

      (4) Figure labels have been revised from “Fig. 1,” “Fig. S1,” etc., to “Figure 1,” “Figure 1—figure supplement 1,” etc.

      (5) Table labels have been revised from “Table S1,” etc., to “Supplementary file 1,” etc.

      (6) Line 324: The typo “is” has been corrected to “are”.

      (7) Line 382: The section heading “Materials and Methods” has been changed to “Materials and methods” (lowercase “m”).

      (8) Line 383: The Key Resources Table has been placed at the beginning of the Materials and methods section.

      (9) Line 389: The sentence “Sexually mature adults (2–6 months) were used for experiments, and tissues were consistently sampled 1–5 hours after lights on.” has been revised to “Sexually mature adults (2–6 months) were used for experiments and assigned randomly to experimental groups. Tissues were consistently sampled 1–5 hours after lights on.”

      (10) Line 393: The sentence “All fish were handled in accordance with the guidelines of the Institutional Animal Care and Use Committee of the University of Tokyo.” has been removed.

      (11) Line 589: The following sentence has been added: “No power analysis was conducted due to the lack of relevant data; sample size was estimated based on previous studies reporting inter-individual variation in behavior and neural gene expression in medaka.”

      (12) Line 598: The reference list has been reordered from numerical sequence to alphabetical order by author.

      (13)  In the figure legends, notations such as “A and B” have been revised to “A, B.”

    1. Reviewer #1 (Public review):

      Summary:

      This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received.

      Strengths:

      The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.

      Weaknesses:

      (1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.

      (2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.

      (3) On L236, the link to SNT is stated: "the models indicate trends that can support sub-national tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.

      (4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.

      (5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.

      (6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.

      (7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).

      (8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.

      (9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?

      (10) Was uncertainty from each estimated component integrated into the other components?

      (11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.

      (12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.

    2. Reviewer #2 (Public review):

      Summary:

      The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions, even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.

      Strengths:

      The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.

      Weaknesses:

      Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models assume many hypotheses, including:

      (1) Exponential decay of ITN use/access.

      (2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.

      (3) Given a time instant, all individuals in the same administrative unit have the same probability of using a net.

      (4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs. triennial distribution).

      (5) The Bayesian model assumes some narrow prior distributions.

      The impact of these hypotheses on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.

    1. Reviewer #1 (Public review):

      Summary:

      Metabolic dysfunction-associated steatotic liver disease (MASLD) ranges from simple steatosis to steatohepatitis, fibrosis/cirrhosis, and hepatocellular carcinoma. In the current study, the authors aimed to determine the early molecular signatures differentiating patients with MASLD-associated fibrosis from those patients with early MASLD but no symptoms. The authors recruited 109 obese individuals before bariatric surgery. They separated the cohorts as no MASLD (without histological abnormalities) and MASLD. The liver samples were then subjected to transcriptomic and metabolomic analysis. The serum samples were subjected to metabolomic analysis. The authors identified dysregulated lipid metabolism, including glyceride lipids, in the liver samples of MASLD patients compared to the no MASLD ones. Circulating metabolomic changes in lipid profiles slightly correlated with MASLD, possibly because the no MASLD samples were derived from obese patients. Several genes involved in lipid droplet formation were also found elevated in MASLD patients. Besides, elevated levels of amino acids, which are possibly related to collagen synthesis, were observed in MASLD patients. Several antioxidant metabolites were increased in MASLD patients. Furthermore, dysregulated genes involved in mitochondrial function and autophagy were identified in MASLD patients, likely linking oxidative stress to MASLD progression. The authors then determined the representative gene signatures in the development of fibrosis by comparing this cohort with the other two published cohorts. Top enriched pathways in fibrotic patients included GTPase signaling and innate immune responses, suggesting the involvement of GTPases in MASLD progression to fibrosis. The authors then challenged a human patient-derived 3D spheroid system with a dual PPARa/d agonist and found that this treatment restored the expression levels of GTPase-related genes in MASLD 3D spheroids.

      In conclusion, the authors suggested the involvement of upregulated GTPase-related genes during fibrosis initiation.

      Concerns from first round of review:

      (1) A recent study, via proteomic and transcriptomic analysis, revealed that four proteins (ADAMTSL2, AKR1B10, CFHR4 and TREM2) could be used to identify MASLD patients at risk of steatohepatitis (PMID: 37037945). It is not clear why the authors did not include this study in their comparison.

      (2) The authors recruited 109 patients but only performed transcriptomic and metabolomic analysis in 94 liver samples. Why did the authors exclude other samples?

      (3) The authors mentioned clinical data in Table 1 but did not present the table in this manuscript.

      (4) The generated metabolomic data could be a very useful resource to the MASLD community. However, it is very confusing how the data was generated in those supplemental tables. There is no clear labeling of human clinical information in those tables. Also, what do those values mean in columns 47-154? This reviewer assumed that they are the raw data of metabolomic analysis in plasma samples. However, without clear clinical information in these patients, it is impossible that any scientist can use the data to reproduce the authors' findings.

      (5) In Fig. 5B, the authors excluded the steatosis and fibrosis overlapped genes. Steatosis and fibrosis specific genes could simply reflect the outcomes rather than causes. In this case, the obtained results might not identify the gene signatures related to fibrosis initiation.

      (6) In Fig. 6D, the authors used 3D liver spheroid to validate their findings. However, there are no images showing the 3D liver spheroid formation before and after PPARa/d agonist treatment. It is not clear whether the 3D liver spheroid was successfully established.

      (7) The authors suggested that targeting LX-2 cells with Rac1 and Cdc42 inhibitors could reduce collagen production. Did the authors observe these two genes upregulated in mRNA and protein expression levels in their cohort when comparing MASLD patients with and without fibrosis?

      (8) Did the authors observe that the expression levels of Rac1 and Cdc42 are correlated with fibrosis progression in MASLD patients?

      (9) Other studies have revealed several metabolite changes related to MASLD progression (PMID: 35434590, PMID: 22364559). However, the authors did not discuss the discrepancies between their findings and the previous studies.

      Significance:

      Overall, the current study might provide some new resources regarding transcriptomic and metabolomic data derived from obese patients with and without MASLD. The MASLD research community will be interested in the resource data.

      Comments on revised version:

      Thank you for the authors' responses to my concerns. I do not have any further comments.

    2. Reviewer #3 (Public review):

      Summary:

      Metabolic dysfunction-associated steatotic liver disease (MASLD) describes a spectrum of progressive liver pathologies linked to lifestyle-associated metabolic alterations (such as increased body weight and elevated blood sugar levels), ranging from steatosis through steatohepatitis to fibrosis and finally end-stage complications, such as liver failure and hepatocellular carcinoma. Treatment options for MASLD include diet adjustments, weight loss, and the thyroid hormone receptor-β (THR-β) agonist resmetirom, but remain limited at this stage, motivating further studies to elucidate molecular disease mechanisms to identify novel therapeutic targets.

      In their present study, the authors aim to identify early molecular changes in MASLD linked to obesity. To this end, they study a cohort of 109 obese individuals with no or early-stage MASLD combining measurements from two anatomic sites: 1. bulk RNA-sequencing and metabolomics of liver biopsies, and 2. metabolomics from patient blood. Their major finding is that GTPase-related genes are transcriptionally altered in livers of individuals with steatosis with fibrosis compared to steatosis without fibrosis.

      Comments from the first round of review:

      (1) Confounders (such as (pre-)diabetes)

      The patient table shows significant differences in non-MASLD vs. MASLD individuals, with the latter suffering more often from diabetes or hypertriglyceridemia. Rather than just stating corrections, subgroup analyses should be performed (accompanied with designated statistical power analyses) to infer the degree to which these conditions contribute to the observations. I.e., major findings stating MASLD-associated changes should hold true in the subgroup of MASLD patients without diabetes/of female sex and so forth (testing for each of the significant differences between groups).

      Post-rebuttal update: The authors have performed the requested sub-group analysis and find the gene signatures hold for the non-diabetic sub-cohort, but not the diabetic subgroup. They denote a likely interaction between fibrosis and diabetes, that was not corrected for in the original analysis.

      (2) External validation

      Additionally, to back up the major GTPase signature findings, it would be desirable to analyze an external dataset of (pre)diabetes patients (other biased groups) for alterations in these genes. It would be important to know if this signature also shows in non-MASLD diabetic patients vs. healthy patients or is a feature specific to MASLD. Also, could the matched metabolic data be used to validate metabolite alterations that would be expected under GTPase-associated protein dysregulation?

      Post-rebuttal update: The authors confirm that with the present data, insulin resistance cannot be fully ruled out as a confounder to the GTPase-related gene signature. They however plan future mouse model experiments to study whether the GTPase-fibrosis signature differs in diabetic vs. non-diabetic conditions.

      (3) 3D liver spheroid MASH model, Fig. 6D/E

      This 3D experiment is technically not an external validation of GTPase-related genes being involved in MASLD, since patient-derived cells may only retain changes that have happened in vivo. To demonstrate that the GTPase expression signature is specifically invoked by fibrosis the LX-2 set up is more convincing, however, the up-regulation of the GTPase-related genes upon fibrosis induction with TGF-beta, in concordance with the patient data, needs to be shown first (qPCR or RNA-seq). Additionally, the description of the 3D model is too uncritical. The maintenance of functional PHHs is a major challenge (PMID: 38750036, PMID: 21953633, PMID: 40240606, PMID: 31023926). It cannot be ruled out that their findings are largely attributable to either 1) the (other present) mesenchymal cells (i.e., mesenchyme-derived cells, such as for example hepatic stellate cells, not to be confused with mesenchymal stem cells, MSCs), or 2) related to potential changes in PHHs in culture, and these limitations need to be stated.

      Post-rebuttal update: To address the concern that cells other than hepatocytes contribute to the observed effects in culture, the authors performed TGF-beta treatment in independent mono-cultures (Figure R4), LX-2 and hepatocytes, as well as in the spheroid system. Surprisingly, important genes highlighted in Figure 6E for the spheroid system (RAB6A, ARL4A, RAB27B, DIRAS2) are all absent from this qPCR(?) validation experiment. The authors instead evaluate RAC1, RHOU, VAV1, DOCK2, and RAB32. In spheroids, RHOU and RAB32 are down-regulated with TGF-beta. In hepatocytes, DOCK2 and RAC1 seemed up-regulated. They find no difference in these genes in LX-2 cells. Surprisingly, ACTA2 expression values are missing for LX-2 cells. Together, it is hard to judge which individual cell type recapitulates the changes observed in patients in this validation experiment, as the major genes called out in Figure 6E are not analyzed.

      Unfortunately, the 3D liver spheroid model used (as presented in PMID: 39605182) lacks important functional validation tests of maintained hepatocyte identity in culture (at the very least, albumin expression and secretion plus a CYP3A4 assay). These functional data (acquired at the time point in culture when the RNA expression analysis in 6E was performed) are indispensable prior to stating that mature hepatocytes cause the observed effects.

      (4) Novelty / references

      Similar studies that also combined liver and blood lipidomics/metabolomics in obese individuals with and without MASLD (e.g. PMID 39731853, 39653777) should be cited. Additionally, it would benefit the quality of the discussion to state how findings in this study add new insights over previous studies, if their findings/insights differ, and if so, why.

      Post-rebuttal update: The authors have included the studies in their discussion.

    1. ['1', '2', 'Fizz', '4', 'FizzBuzz', 'Fizz', '7', '8', 'Fizz', 'FizzBuzz', '11', 'Fizz', '13', '14', 'Fizz', '16', '17', 'Fizz', '19']

      The results at the positions divisible by 5 are wrong.
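
      A minimal sketch of a corrected version, assuming the conventional rules were intended (multiples of 3 give "Fizz", multiples of 5 give "Buzz", multiples of both give "FizzBuzz") and that the list covers 1 to 19. The original code is not shown, so the function name `fizzbuzz` and the variable `result` are hypothetical. In the output above, positions 5 and 10 show "FizzBuzz" where "Buzz" would be expected, and position 15 shows "Fizz" where "FizzBuzz" would be expected.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz label for n under the conventional rules (assumed)."""
    # Check the combined case first so that 15, 30, ... are not caught
    # by the single-divisor branches below.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Same range as the list in the note: 1 through 19.
result = [fizzbuzz(i) for i in range(1, 20)]
print(result)
```

      Testing the combined divisor before the single ones is one common way to avoid this kind of mix-up; other orderings work as long as each branch is exclusive.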

    1. Synthesis on Optimizing DIY Gelatin Plates

      Executive Summary

      This synthesis document details the problems, solutions, and innovations presented in the context of making and using homemade (DIY) gelatin printing plates.

      The analysis reveals three major problems with traditional recipes:

      premature drying of the paint, significant risks linked to the use of alcohol, and unwanted chemical reactions with acrylic paint.

      The central solution is the adoption of a new "sober" recipe, which eliminates alcohol completely and replaces it with propylene glycol.

      This change not only resolves the fire risk and the irritation problems, but also significantly improves the plate's water retention, thereby preventing the paint from drying out.

      In parallel, new maintenance protocols are introduced, notably a two-step "care routine" (cleaning and hydration) to preserve the plate's surface and inhibit microbial growth.

      Storage recommendations have been revised to favor an airtight container, in conjunction with this new routine.

      Finally, precision tools and methodologies are proposed, such as the switch to measurements in grams and the launch of a "Recipe Calculator 2.0".

      This online tool allows recipes to be customized according to the plate size and the strength (Bloom value) of the gelatin.

      The document also addresses the cause of acrylic paint "curdling" (an acidic environment) and provides a neutralizing solution based on baking soda.

      --------------------------------------------------------------------------------

      1. Problems Identified with the Original Recipe

      Analysis of the original DIY gelatin plate recipe highlighted several recurring problems encountered by users, sometimes turning the printing experience into a frustrating process.

      1.1. The "Thirsty Plate" Problem

      The most common problem is a plate that dries too quickly, making the paint almost impossible to remove.

      Main cause: An overdose of gelatin or the use of a gelatin with a high Bloom value. Such a plate is not fully saturated with water and becomes "very, very thirsty".

      Mechanism: The dry gelatin plate instantly draws in the water contained in the acrylic paint. The pigments then adhere almost irreversibly to the surface.

      Consequence: The plate behaves like an "acrylic-paint-sucking magnet" rather than a non-stick transfer surface.

      Analogy: The author compares this phenomenon to the first crêpes that get thrown away, explaining that the plate needs to "warm up", that is, to become saturated with water, before it works properly.

      1.2. The Risks and Drawbacks of Alcohol

      Alcohol, a key ingredient in the old recipe for reducing tackiness and improving shelf life, has two major drawbacks.

      Fire risk: Using alcohol (isopropyl, denatured, or high-proof) while heating the mixture poses a real fire risk.

      A user named Rita in fact experienced such an incident, which was a catalyst for the recipe change.

      Irritation: Alcohol vapors can irritate users' eyes and respiratory tract.

      Plate dehydration: Alcohol contributes significantly to dehydrating the plate over the long term.

      The author draws an analogy with the "Sahara feeling in your mouth" after a night of drinking to illustrate this effect.

      1.3. Abnormal Behavior of Acrylic Paint

      Some users reported "super strange" behavior of the acrylic paint, which starts to curdle or break down on the plate.

      Cause: An acidic environment (low pH).

      Origin of the problem: The addition of acids such as lemon juice or citric acid to the mixture, often intended to act as a preservative.

      Effect: In a strongly acidic medium, the binder system of acrylic paint can break down, causing the paint to curdle.

      The paint then sticks to the roller more than to the plate itself.

      2. The New "Sober" Recipe: The Central Solution

      To remedy these problems, the recipe was entirely reformulated. The most important modification is the removal of alcohol, which is why the new plate is described as "sober".

      2.1. Replacing Alcohol with Propylene Glycol

      Alcohol is replaced by propylene glycol, described as the "perfect partner" for glycerin.

      Properties: Propylene glycol chemically belongs to the alcohol family, but it is much less volatile, hardly evaporates, and poses a significantly lower fire risk under normal kitchen conditions.

      Benefits in the recipe:

      Stability: It helps make the plate firmer and more stable without "stealing all its water".

      Moisture retention: It helps the plate stay flexible, shrink less, and retain its moisture, which ensures good prints.

      Preservation: It helps slow the growth of bacteria and mold, acting as a preservative.

      Author's conclusion: "If I had to choose between 'burns well' and 'prints well'... I'm pretty sure you would choose the plate that prints perfectly over the fireworks in the kitchen."

      2.2. Experiments with "Fusion" Plates

      Tests were carried out on "fusion" plates combining the properties of gelatin and plant-based gelling agents. These versions appear to solve the paint-drying problem inherently.

      Ingredients tested:

      Xanthan gum

      Konjac (or glucomannan): The active agent in konjac flour, known for its extreme gelling and thickening power.

      Preliminary results: Fusion plates appear to release more paint onto the paper, leaving less residue on the surface. The tests are judged "very promising".

      Note: A more in-depth exploration of these hydrogels is planned for a future video.

      3. New Maintenance, Storage, and Repair Protocols

      The new approach comes with updated protocols for maintaining, storing, and even repairing plates.

      3.1. Rehydrating a Dry Plate

      A plate that has become too dry can be "brought back to life" without being remelted.

      Method: A water "bath". The plate is immersed in water for 3 to 48 hours, or even longer, until it absorbs the water it needs and increases in volume.

      Alternative: If a suitable container is not available, the surface can be sprayed with water, covered with damp paper towel, and wrapped in plastic film.

      3.2. Cleaning Off Old Layers of Paint

      A notable discovery was made for removing stubborn paint layers: simple water-based craft glue (all-purpose white glue) proved extremely effective at lifting old layers of dried paint from the plate's surface.

      3.3. New "Care Routine"

      A post-printing care protocol, compared to a skincare routine, is now recommended to preserve the plate.

      1. Gentle cleaning: Spray a cleaning spray onto the plate and wipe with a soft cloth to remove paint residue.

      2. Rinsing: Go over the surface again with clean water to remove any residual surfactant.

      3. Hydration and protection: Massage a small amount of a care spray into the surface.

      The recipes for these sprays are as follows:

      Cleaning spray

      Ingredients (in grams): 500 g water; 2 g neutral soap; 1 g alcohol (for dissolving); 1 g essential oil (tea tree or clove, optional)

      Instructions: Dissolve the essential oil in the alcohol, or directly in the soap. Mix all the ingredients and pour into a spray bottle.

      Care spray

      Ingredients (in grams): 200 g water; 2 g baby oil (mineral oil); 1 g tea tree essential oil; 1 g clove essential oil

      Instructions: Mix all the ingredients. Shake vigorously before each use, because the mixture is biphasic (the oil separates from the water).

      The care spray leaves a "very thin oily protective film" that protects against drying out and makes the surface less hospitable to microbes thanks to the properties of the essential oils.

      3.4. Updated Storage Recommendations

      Old recommendation (for plates with alcohol): Do not store in an airtight container for the first few days, so that moisture can escape and a "tropical microclimate" conducive to mold is avoided.

      New recommendation (for "sober" plates with the care routine): Store in an airtight container from the start.

      This approach is judged optimal for minimizing water loss, since the antimicrobial care routine takes care of the precautions.

      4. Precision Tools and Methodologies

      To improve the reliability and reproducibility of results, new methodologies have been introduced.

      4.1. Switching to Measurements in Grams

      All new recipes are now formulated in grams rather than in units of volume.

      Reason: Precision is crucial, particularly with plant-based gelling agents, where "half a gram more or less can already make a huge difference".

      Practical advantage: It becomes very simple to calculate the water loss when remelting a plate.

      Simply weigh the used plate, compare its weight with the initial total weight of the ingredients, and add the difference as water during remelting to restore the plate to its optimal state.
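
      The weighing step can be sketched as a small calculation. This is only an illustration: the function name `water_to_add` and the example weights are hypothetical, and it assumes that all of the weight lost since casting is water (paint residue left on the plate would make the used plate weigh slightly more and so understate the loss).

```python
def water_to_add(initial_total_g: float, used_plate_g: float) -> float:
    """Grams of water to add when remelting, assuming all lost weight is water."""
    if initial_total_g <= 0 or used_plate_g <= 0:
        raise ValueError("weights must be positive")
    loss = initial_total_g - used_plate_g
    # A plate that weighs more than the original ingredients (for example
    # after a water bath) needs no extra water.
    return max(loss, 0.0)


# Hypothetical example: ingredients totalled 400 g, used plate weighs 355 g.
grams = water_to_add(400.0, 355.0)
print(grams)
```

      With these example values the difference is 45 g, which would be added as water during remelting.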

      4.2. The Recipe Calculator 2.0

      https://ashrey.com/diy-gel-plate/

      A new online tool, the "Recipe Calculator 2.0", has been developed.

      Features:

      ◦ Works entirely in grams.

      ◦ Takes the strength of the gelatin (Bloom value) into account.

      ◦ Allows recipes to be scaled precisely to the desired plate size.

      ◦ Offers a choice between different plate types: standard, softer, or the experimental "fusion" version with hydrogel.

      Availability: The tool is accessible on the author's website. The classic calculator (in metric and imperial units) also remains available.

      5. Diagnosis and Solution for Paint Curdling

      The mystery of the acrylic paint's abnormal behavior has been solved.

      Diagnosis: Acrylic paint does not like acidic environments. A low pH causes it to curdle and its binder system to break down.

      Action to avoid: Do not add acids (lemon juice, citric acid) as preservatives to the gelatin plate mixture.

      Repair solution ("emergency fix"): For a plate that is already acidic, the surface can be neutralized.

      1. Prepare a mild alkaline solution: Dissolve 2 to 3 grams of baking soda (sold under names such as "Kaisernatron" in Germany) in 1 liter of water.

      2. Apply: Pour or spray the solution onto the plate's surface.

      3. Wait: Leave it to act for 30 to 60 seconds.

      4. Wipe: Dry the plate, then clean it again with clean water or a baby wipe.

      5. Repeat if necessary until the plate works properly.
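
      For batches smaller than one liter, the stated ratio of 2 to 3 grams of baking soda per liter can be scaled linearly. The helper below is a hypothetical sketch (the name `baking_soda_range_g` and the 250 mL example are assumptions, not part of the source), and it assumes simple proportional scaling is appropriate.

```python
def baking_soda_range_g(water_ml: float) -> tuple[float, float]:
    """Scale the 2-3 g per liter ratio to a given volume of water in mL."""
    if water_ml <= 0:
        raise ValueError("volume must be positive")
    low_g_per_l, high_g_per_l = 2.0, 3.0  # ratio stated in step 1
    return (low_g_per_l * water_ml / 1000.0, high_g_per_l * water_ml / 1000.0)


# Hypothetical example: a 250 mL spray bottle.
low, high = baking_soda_range_g(250.0)
print(f"{low} g to {high} g of baking soda")
```

      For a 250 mL bottle this gives 0.5 g to 0.75 g, which is one more reason a gram-accurate scale is useful here.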

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-03091

      Corresponding author(s): Chia-Tsen, Tsai, Liuh-Yow Chen

      1. General Statements [optional]

      We thank the reviewers for their valuable time and constructive feedback on our study, which ultimately improved our manuscript. Herein, we provide a detailed response to each of the reviewers' comments, supported by new data that have been integrated into both the main text and the supplementary figures.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary This manuscript builds upon the authors' prior findings that targeting COUP-TF2 to TRF1 induces ALT-associated phenotypes and G2-mediated synthesis in telomerase-immortalised BJT human fibroblasts. In this study, the authors show that telomere-coupled COUP-TF2 promotes H3K9me3 enrichment in these cells, and that this effect is blocked by TRIM28 depletion. Furthermore, TRIM28 depletion also suppresses the formation of ALT phenotypes in VA13 ALT cells. Given that TRIM28 has been implicated in regulating H3K9me3 deposition via SETDB1, and has been reported to co-purify with TR2 and TR4 (though not previously in the context of ALT telomeres), these findings add mechanistic depth to how heterochromatin regulators contribute to ALT activity. Overall, the manuscript's conclusions are generally supported by the presented data, but several aspects require clarification or additional experimental validation.

      The authors report a modest reduction in telomeric H3K9me3 following COUP-TF2 and TR4 depletion in U-2 OS and VA13 cells (Figure 1B). To strengthen the claim that these orphan receptors specifically regulate H3K9me3, the authors should 1) Assess additional heterochromatic histone marks (e.g., H4K20me3) at telomeres, 2) Normalize telomeric signals to both parental histone levels and input, and 3) Evaluate whether global H3K9me3 levels also decrease upon receptor depletion

      Response: We appreciate the reviewer's suggestion. To address the concern regarding specificity, we assessed H3K27me3 and H4K20me3 levels upon COUP-TF2/TR4 depletion and found no significant changes (Supplementary Fig. 1C). Furthermore, we reprocessed the telomeric ChIP data, normalizing to both input DNA and parental histone levels (Figure 1B). This refined analysis reinforces our original conclusion. Finally, Western blot analysis showed no significant changes in global H3 or H3K9me3 levels upon COUP-TF2/TR4 depletion (Figure 1A). Altogether, these results further support the specificity of COUP-TF2/TR4 for H3K9me3 at telomeres. We have revised the main text (page 3) and updated Figure 1A, 1B, and Supplementary Figure 1C for these changes.

      Most experiments explore chromatin changes in telomerase-positive BJT fibroblasts (Figure 2, Figure 4D). It remains unclear whether similar manipulations in ALT cells yield consistent effects, which would give a broader context for ALT phenotype induction. Are ALT phenotypes similarly induced in ALT cells? Does altered chromatin status affect telomere length or telomerase recruitment/activity? Can these pathways drive ALT phenotypes in non-immortalised cells?

      Response: We appreciate the reviewer's suggestion and have explored chromatin changes in telomerase-negative BJ and IMR90 primary fibroblasts (Supplementary Fig. 2C, D). Consistent with the results in BJ-telomerase cells, we found that VP64-TRF1 decreased telomeric H3, H4, and H3K9me3 levels, whereas KRAB-TRF1 increased these marks. Moreover, expression of either VP64-TRF1 or KRAB-TRF1 was sufficient to induce APB formation and ATDS in BJ and IMR90 cells. These results indicate that chromatin changes at telomeres can drive ALT phenotypes in both primary and telomerase-immortalized fibroblasts.

          Additionally, regarding whether chromatin alteration affects telomere length or telomere regulation, we have explored telomere length changes in BJT cells expressing vector, TRF1, KRAB-TRF1 or VP64-TRF1. The telomere restriction fragment (TRF) assay showed that cells under all conditions maintained static telomere lengths through 30 days in culture (data shown below), suggesting that the chromatin alterations may not impact telomerase recruitment or activity. As this result is beyond the scope of the current study, these data are shown only here in the rebuttal letter for reference and are not included in the revised manuscript.
      
          Moreover, according to the reviewer's suggestion, we also carried out VP64-TRF1 or KRAB-TRF1 expression experiments in WI38-VA13/2RA cells that express high TERRA and have altered chromatin structures. Our data revealed that VP64-TRF1 suppresses telomere H3K9me3 and ALT activity, while KRAB-TRF1 increases both (Supplementary Figure 2E), suggesting an association of heterochromatin state with ALT activation in WI38-VA13/2RA cells.
      
          The observation that VP64-TRF1 reduces ALT activity in WI38-VA13/2RA cells contrasts with findings in BJT cells. It is worth noting that studies from the Azzalin and Lingner groups demonstrated that experimentally induced TERRA expression promotes ALT activity in ALT and non-ALT cells (PMID: 36122232, PMID: 40624280). Therefore, we propose that TERRA upregulation by VP64-TRF1 may contribute to the ALT induction observed in BJT cells (Supplementary Figure 2A, B), whereas the ability of VP64-TRF1 to suppress ALT activity in WI38-VA13/2RA cells could be attributed to the reduction of telomere H3K9me3 and heterochromatin loss. Importantly, KRAB-TRF1 concurrently enhanced histone H3, H4, and H3K9me3 occupancy and ALT activity in both human fibroblasts and ALT cells. Altogether, these results support the notion that heterochromatin formation triggers ALT.
      
          We also examined TRIM28 recruitment to telomeres by telomere-ChIP and found that COUP-TF2LBD-TRF1 promotes TRIM28 telomere enrichment in BJ, IMR90 and U2OS, similar to BJT cells (Supplementary Fig. 5A-D). Moreover, in the ALT cell lines WI38-VA13/2RA, U2OS, and Saos-2, depletion of COUP-TF2 or TR4 reduced TRIM28 telomeric association (Figure 4A, B). Together, the data from human fibroblasts and ALT cells support a role of orphan NRs in recruiting TRIM28 to ALT telomeres.
      

      We thank the reviewer for these suggestions, which allowed us to clarify and strengthen the conclusions. The corresponding data are presented in Figure 4A-B and Supplementary Figures 2B-D and 5E-F, and the main text has been modified on pages 4-6 in the revised manuscript.

      When referring to Figure 3G, the authors state that telomeric H3K9me3 was abolished upon depleting TRIM28 from the U2OS and WI38-VA13/2RA cells. Abolished is a strong word for a 50% decrease, and this sentence should be revised. The reduction appears greater than that seen with COUP-TF2/TR4 depletion. Are the effects additive? If so, might TRIM28 act, at least in part, independently of COUP-TF2/TR4?

      Response: We appreciate the reviewer's comments. We have revised the manuscript on page 5, replacing "abolished" with "significantly reduced" to better describe the effect of TRIM28 depletion on telomeric H3K9me3. To further investigate the interplay between TRIM28 and orphan NRs in regulating telomeric H3K9me3, we conducted single and combined knockdown experiments in U2OS and WI38-VA13/2RA cells, followed by telomere-ChIP analysis (Supplementary Figures 4D, E). Our results showed that single depletion of either orphan NRs or TRIM28 led to a similar decrease in telomeric H3K9me3, and that combined knockdown did not result in any further reduction. These findings support an epistatic interaction between orphan NRs and TRIM28 in the regulation of telomeric H3K9me3. We have expanded on this interpretation in the main text (page 6) and included the relevant data in Supplementary Figures 4D, E.

      VA13 cells consistently exhibit stronger effects than U-2 OS (e.g., Figures 1 and 3). This discrepancy could be linked to the high content of variant repeats in VA13 cells. The authors should assess whether variant repeat content underlies the differential response. Repeating key experiments in additional ALT lines with varied repeat compositions would be informative.

      Response: We appreciate the reviewer's suggestion and have extended our analyses to two additional ALT osteosarcoma cell lines, SAOS-2 and G292. In both lines, depletion of orphan NRs resulted in a consistent decrease in telomeric H3K9me3 levels (Supplementary Figures 1A, B). We also examined the contribution of TRIM28 to telomeric H3K9me3 in these cells. siRNA-mediated depletion of TRIM28 in SAOS-2 and G292 cells similarly caused a significant reduction in telomeric H3K9me3 and ALT phenotypes (Supplementary Figure 4A-C). Together, these results from four ALT cell lines confirm that orphan NRs and TRIM28 promote telomeric H3K9me3 formation in ALT cells. We have modified the main text on pages 3 and 5-6 for these results.

      In line with the previous point, it would be useful to show whether TRIM28 telomeric enrichment is affected by COUP-TF2/TR4 depletion in U2OS cells (Figure 4C). To improve confidence in these findings, the authors should perform telomeric ChIP assays, especially with the COUP-TF2^LBDΔAF2-TRF1 mutant construct.

      Response: Following the reviewer's suggestion, we performed telomere-ChIP assays to assess TRIM28 enrichment at telomeres upon expression of COUP-TF2LBD-TRF1 and its ΔAF2 mutant in U2OS cells. Consistent with our immunofluorescence results, telomere-ChIP revealed that COUP-TF2LBD-TRF1 expression promotes TRIM28 telomere enrichment, while the AF2 deletion mutant failed to recruit TRIM28 (Supplementary Figure 5D). We have modified the main text on page 6 for this result.

      The immunoprecipitation experiments showing TRIM28 association with orphan receptors should include benzonase treatment to rule out DNA-mediated co-association (Figure 4F-G).

      Response: We appreciate the reviewer's suggestion. To address the possibility of DNA-mediated interactions, we pre-incubated cell lysates with benzonase prior to Co-IP (Page 7). This treatment did not disrupt the association between TRIM28 and COUP-TF2 or TR4 in WI38-VA13/2RA and BJT cells (Supplementary Figures 5E-G), indicating a DNA-independent interaction. We have modified the main text on page 7 for this result.

      The study would benefit from a direct assessment of whether COUP-TF2LBDΔAF2-TRF1 fails to induce ALT phenotypes in BJT fibroblasts.

      Response: We thank the reviewer for this suggestion. As the role of the COUP-TF2 AF2 domain in ALT induction in BJT fibroblasts has recently been thoroughly investigated and published by our group (PMID: 38752489), we have directed the current study towards a more detailed mechanistic question. Specifically, we have carried out experiments to further demonstrate that COUP-TF2 recruits TRIM28 to telomeres via its AF2 domain in both human fibroblasts and ALT cells (Supplementary Figures 5A-D). On Page 6, we have modified the main text for these results and included a citation to our previous publication to provide the necessary background.

      The experiments performed in Figure 5E-H lack a vector-only + siCtrl control. In Figure 5E, the observation that APB formation is restored in siTRIM28 + Vector-treated cells is unexpected. The authors should address this finding and clarify whether this reflects biological noise or a compensatory effect.

      Response: We thank the reviewer for this suggestion. We have repeated the experiments with a revised design, ensuring a consistent vector background across all groups (Vector + siCtrl, Vector + siTRIM28, TRIM28 WT + siTRIM28, and TRIM28 ΔRBCC + siTRIM28) (Figure 5E-H). This improved design confirms that expression of wild-type TRIM28, but not TRIM28 ΔRBCC, restores APB formation, ATDS, ssTeloC, and telomeric H3K9me3 levels in TRIM28-depleted cells. The updated dataset also resolves the previous unexpected increase in APB formation in the siTRIM28 + Vector condition, which is now excluded. We have modified the main text accordingly on page 8.

      Reviewer #1 (Significance (Required)):

      This work offers valuable mechanistic insight into how COUP-TF2 and TRIM28 coordinate to regulate heterochromatin deposition and ALT phenotype formation. It adds to the growing understanding of chromatin-mediated telomere regulation. What remains unclear is how important this interaction is for ALT maintenance, as H3K9me3 is only moderately altered upon TRIM28 depletion in ALT cells. Depletion of TRIM28 has been shown previously to induce APB formation and telomere elongation in U-2 OS ALT cells (Wang et al., 2021), the opposite to what the authors observed here in VA13 cells (Figure 5E-H). Clarifying whether these differences are variant repeat-dependent, or reflect intrinsic features of specific ALT cell lines, would substantially elevate the study's impact.

      Response: We appreciate the reviewer's recognition of the significance of our work in elucidating the molecular basis of ALT regulation through COUP-TF2-TRIM28-mediated heterochromatin formation. In response to the reviewer's insightful comment regarding the importance of this interaction for ALT maintenance, we have expanded our study. We now include data from three additional primary human fibroblasts and a total of four ALT cancer cell lines (Figure 4, Supplementary Figure 4). These new data further strengthen the conclusion that TRIM28 promotes telomeric H3K9me3 and ALT-associated features. Furthermore, our rescue experiments support the model that the ALT-promoting function of TRIM28 in both fibroblasts and ALT cell lines is mediated through its physical interaction with COUP-TF2 (Supplementary Figure 5). We believe these results provide a solid foundation for demonstrating a cooperative role of COUP-TF2 and TRIM28 in ALT maintenance, and address the reviewer's concern regarding the generalizability of our findings.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary This manuscript investigates the role of orphan nuclear receptors (ORs), specifically COUP-TF2 and TR4, in promoting H3K9me3 enrichment at ALT telomeres via recruitment of TRIM28 (KAP1). The authors propose that the AF2 domain of COUP-TF2, located in its ligand-binding domain (LBD), is sufficient to recruit TRIM28 to telomeres. This, in turn, promotes heterochromatinization and induces hallmarks of the Alternative Lengthening of Telomeres (ALT) pathway, including APB formation and telomeric DNA synthesis outside of S-phase. This study addresses one important and unresolved question in the field: by what mechanism is the heterochromatic state established at ALT telomeres? Another timely question, not addressed here, is: how is heterochromatin (specifically H3K9me3) functionally linked to ALT? The findings are potentially novel and mechanistically insightful. However, key elements of the study, particularly the central tethering experiments, require stronger quantification and clarity. Additional mechanistic tests and literature adjustments would also improve the manuscript.

      Major Concerns

      Central TRF1-COUP-TF2-LBD result lacks quantification and clarity: the tethering of COUP-TF2's LBD to telomeres via TRF1 is a core result of the paper. This experiment demonstrates that this domain is sufficient to induce weak H3K9me3 enrichment and ALT features (APBs and ATDS). However, the supporting ALT data are presented only in Supplementary Figures S1A and S1B, and are not quantified. These data should be quantified with appropriate statistics and moved to a main figure.

      Response: The current study builds upon our recent publication (PMID: 38752489), which comprehensively analyzed ALT induction (APBs, ATDS, C-circles, T-SCEs) by orphan NR-TRF1 expression (COUP-TF1, COUP-TF2, TR2, and TR4; full-length and LBD) in various human fibroblast cell lines. To avoid potential duplicate publication concerns, particularly regarding APB and ATDS results for COUP-TF2LBD-TRF1 in BJT cells, we have put the data with revised quantification results in Supplementary Figure 1D-E. We will follow the reviewer's suggestion and move this data to the main figures if the editors agree.

      Furthermore, the broader functional implication is not explored. Does this tethering induce a fully functional ALT pathway? For example, can telomerase knockout cells expressing TRF1-COUP-TF2-LBD maintain long-term proliferation? Such evidence would significantly strengthen the impact of the study.

      Response: While COUP-TF2LBD-TRF1 expression rapidly induces key ALT phenotypes, we acknowledge that this alone is insufficient to directly promote telomere lengthening and long-term proliferation of primary fibroblasts, as discussed in Gaela et al., 2024 (PMID: 38752489). However, our ongoing, unpublished studies indicate that COUP-TF2LBD-TRF1 can drive immortalization of primary BJ fibroblasts expressing SV40LT by promoting ALT-mediated telomere elongation (Attached Figure A-C; additional data not shown). These findings suggest that COUP-TF2 may cooperate with additional genetic or epigenetic alterations to facilitate ALT development. We appreciate the reviewer's recognition of this critical aspect. As our immortalization study is still in progress and will be the subject of a separate manuscript, we hope the reviewer understands that the data shown in this letter will not be included in the revised manuscript.

      Chromatin manipulation experiments lead to ambiguous conclusions: the authors propose that telomeric heterochromatin promotes ALT activity, but their own experiments (e.g., Figure 2) show that both heterochromatin-inducing (KRAB-TRF1) and euchromatin-inducing (VP64-TRF1) tethering can trigger ALT-like features. This makes it difficult to conclude that heterochromatin is specifically required.

      To clarify:

      -Did the authors express TRF1-VP64 in an ALT cell line? According to their model, this should suppress ALT activity.

      -More broadly, do chromatin alterations per se (regardless of direction) trigger ALT features? Clarifying these points is important for interpretation.

      Response: In response to the reviewer's suggestion, we expressed VP64-TRF1 and KRAB-TRF1 in WI38-2RA/VA13 cells to investigate telomere chromatin changes and ALT activity. Our data indeed revealed that VP64-TRF1 suppresses telomere H3K9me3 and ALT activity, while KRAB-TRF1 increases both (Supplementary Figure 2E), suggesting that heterochromatin triggers ALT activation.

      The observation that VP64-TRF1 reduces ALT activity in WI38-2RA/VA13 cells contrasts with findings in BJT cells. Of note, studies from the Azzalin and Lingner groups demonstrated that experimentally induced TERRA expression promotes ALT activity in ALT and non-ALT cells (PMID: 36122232, PMID: 40624280). Therefore, we propose that TERRA upregulation may contribute to the ALT induction observed in BJT cells (Figure 2A, Supplementary Figure 2A, B). Given the high basal TERRA expression in WI38-2RA/VA13 cells, expression of VP64-TRF1 and KRAB-TRF1 did not result in a consistent change in TERRA levels (Supplementary Figure 2F). Thus, the ability of VP64-TRF1 to suppress ALT activity in WI38-2RA/VA13 cells could be attributed to the reduction of telomere H3K9me3 and heterochromatin loss. Altogether, our results support the hypothesis that heterochromatin formation, rather than euchromatin formation, triggers ALT.

      We thank the reviewer for the insightful comments, which have allowed us to resolve the ambiguity of our results and strengthen the notion that heterochromatin formation promotes ALT. We think that the heterochromatin features and high TERRA expression represent two independent, coexisting mechanisms within ALT cancer cells to guarantee ALT activation. We have modified the main text on pages 4-5 accordingly.

      TERRA downregulation contradicts current models: while TERRA upregulation is often observed in ALT cells and is thought to contribute to replication stress and recombination at telomeres, the authors show that TRF1-KAP1 expression induces ALT features while TERRA is downregulated. This observation is not addressed in the manuscript. The authors should at least discuss this discrepancy and propose whether this reflects a cell line-specific phenomenon or a decoupling between TERRA levels and ALT induction in this context.

      Response: We thank the reviewer for the comments. As mentioned above (Major Concerns 2), heterochromatin formation and TERRA expression are two mechanisms that can independently promote ALT. Unlike ALT cell lines, which have high TERRA levels, human BJ fibroblasts have low TERRA levels that do not induce ALT phenotypes. Thus, the effect of KRAB-TRF1 on ALT induction in BJ cells could be attributed to heterochromatin formation rather than to the reduction of TERRA. We have modified the main text on page 5 to clarify the result.

      Minor Comments

      Introduction (p. 3): The authors cite Episkopou et al. as showing increased H3K9me3 at ALT telomeres. This is incorrect; that paper suggests the opposite. The first study to clearly demonstrate H3K9me3 enrichment at ALT telomeres is Cubiles et al., 2018 and should be cited instead. Results (p. 5, first paragraph): The manuscript should cite Déjardin and Kingston, 2009 as the first to report COUP-TF2 and TR4 localization at ALT telomeres. The studies by Conomos et al., 2012 and Gaela et al., 2024 build on this prior evidence. Please also include this citation in the bibliography.

      Response: We appreciate the reviewer's careful reading and thank them for pointing out these errors. The citation errors on pages 2 and 3 have now been corrected.

      Broader relevance of TRIM28-OR interaction: TRIM28 is a complex protein with roles in SUMOylation, heterochromatin formation, and transcriptional initiation/elongation regulation.

      The authors should explore whether similar COUP-TF2/TRIM28 interactions occur at other genomic loci. Public ChIP-seq data for COUP-TF2, TR4, and TRIM28 could be mined to investigate whether these factors co-occupy regulatory regions elsewhere in the genome, and how this relates to gene expression states.

      Response: We appreciate the reviewer's insightful suggestion regarding a potential genome-wide functional interaction between TRIM28 and COUP-TF2. To address this, we analyzed public ENCODE ChIP-seq data from K562 cells (TRIM28: ENCSR000BRW; COUP-TF2: ENCSR000BRS). This analysis revealed 3,326 co-binding sites for TRIM28 and COUP-TF2 (Attached Figure A). Interestingly, these co-binding sites were preferentially located within gene bodies (70.7%) and promoter regions (4.3%) (Attached Figures B-D), suggesting a potential cooperative role in gene regulation that aligns with our observation of physical interaction. While the finding is intriguing, a full exploration is beyond the scope of this manuscript, which focuses on ALT telomere regulation. We consider this an important insight and have briefly noted it in the discussion (p. 9), although the corresponding analyses are not included in the revised manuscript.
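      For readers unfamiliar with how "co-binding sites" are typically counted, the following is a minimal sketch of one common approach: counting peaks from one factor that overlap at least one peak from the other. The response does not state how the 3,326 sites were called, so the overlap rule, the function name `count_cobound`, and the toy coordinates below are all assumptions for illustration, not the authors' pipeline or real ENCODE data.

```python
def count_cobound(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED files. This is a simple quadratic scan per chromosome,
    adequate for illustration but not for genome-scale peak sets.
    """
    # Group the second peak set by chromosome for faster lookup.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    cobound = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            cobound += 1
    return cobound


# Hypothetical toy peaks (made-up coordinates, not real ChIP-seq calls).
trim28_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
coup_tf2_peaks = [("chr1", 150, 250), ("chr2", 80, 120)]
print(count_cobound(trim28_peaks, coup_tf2_peaks))
```

      In practice, dedicated interval tools are normally used for this step; the sketch only makes the overlap criterion explicit.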

      Reviewer #2 (Significance (Required)):

      This work contributes mechanistic insight into how heterochromatin is established at ALT telomeres-an important and timely question in telomere biology and cancer research. It offers a noncanonical recruitment mechanism for TRIM28, independent of KRAB-ZNFs, and highlights the functional role of orphan nuclear receptors in telomeric chromatin regulation. The study has potential implications for understanding ALT regulation and for identifying new intervention points in ALT-positive cancers. The work is conceptually interesting, but the conclusions are currently limited by insufficient quantification, some interpretative ambiguities, and a few overlooked references. Addressing the concerns listed above would significantly enhance the rigor and impact of the manuscript.

      Response: We appreciate the reviewer's recognition of the significance of our work in elucidating the molecular basis of ALT regulation through COUP-TF2-TRIM28-mediated heterochromatin formation. We also thank the reviewer for the valuable feedback, which has significantly strengthened our manuscript.

    1. Kidney structure

      High-Level Summary

      The kidneys are bean-shaped organs protected by three outer layers and organized internally into the cortex, medulla, and renal pelvis. Nephrons in the cortex filter blood supplied by a highly branched vascular network that enters and exits through the renal hilum. Urine formed by nephrons flows through the renal pyramids into calyces, then the renal pelvis, and finally the ureter. Each kidney contains over one million nephrons, which are either cortical or juxtamedullary, depending on their position relative to the medulla.

      Study Notes: Kidney Structure

      1. External Kidney Structure

      The kidney is surrounded by three protective layers (outer → inner):

      • Renal fascia: tough connective tissue that anchors the kidney to surrounding structures

      • Perirenal fat capsule: cushions and stabilizes the kidney

      • Renal capsule: thin, tough layer directly covering the kidney surface

      2. Internal Kidney Regions

      The kidney has three main internal regions:

      • Renal cortex (outer region): granular appearance; contains nephrons (functional units of the kidney); site of blood filtration

      • Renal medulla (middle region): made of renal pyramids (cone-shaped tissue masses), ~8 per kidney; renal columns lie between the pyramids and carry blood vessels; pyramid tips are the renal papillae, which point toward the pelvis

      • Renal pelvis (inner region): located at the hilum; funnel-shaped urine collection area; drains urine into the ureter

      3. Hilum of the Kidney

      • Concave region of the kidney

      • Entry/exit point for renal arteries, renal veins, and nerves

      • Exit point for the ureter

      4. Urine Flow Pathway

      Minor calyces → Major calyces → Renal pelvis → Ureter → Urinary bladder

      5. Renal Lobes

      • A renal lobe = one renal pyramid + surrounding cortical tissue

      • Functional subdivision of the kidney

      Blood Supply of the Kidney (In Order)

      1. Aorta
      2. Renal arteries
      3. Segmental arteries
      4. Interlobar arteries (run through renal columns)
      5. Arcuate arteries (arch at cortex–medulla boundary)
      6. Cortical radiate arteries
      7. Afferent arterioles
      8. Glomerular capillaries (nephrons)

      Venous return: Veins follow the same path in reverse and have the same names as the arteries, except that there are no segmental veins. They drain into the inferior vena cava.

      Nephrons (Functional Units)

      • Each kidney contains >1 million nephrons

      • Located mainly in the renal cortex

      Types of Nephrons

      • Cortical nephrons (≈85%): located deep in the cortex; short loops of Henle

      • Juxtamedullary nephrons: located near the cortex–medulla boundary; long loops of Henle; important for urine concentration

      Parts of a Nephron

      • Renal corpuscle

      • Renal tubule

      • Associated capillary network

    1. Mental Status Exam

      1. Appearance: how the person looks, including clothing and physical presentation
      2. Mood: how emotions are expressed
      3. Cognition: awareness of time and location
      4. Insight and judgment: awareness of the illness itself
      5. Intellectual functioning: thoughts are expressed without disruption and flow coherently

    1. So, the Romanization of the church:
      1. The language shifted from Greek to Latin
      2. Centralization of Christianity
      3. Emperors began convening councils
      4. Defining heresy
      5. Promoting conversion -> eventually coercion
      6. A hierarchical structure emerged: bishops received public functions, like praetors
      7. More dispute-settlement rules were introduced, so that it became more of a body of conflict law

    1. The hyperparameter search space is summarized in Table 1, with full results in Table 2. While no single configuration is universally optimal, we highlight a setting with block_size=4, fetch_factor=16, and num_workers=12, which achieves approximately 2593 samples/sec and maintains an entropy of 3.59—comparable to random sampling.

      This is a powerful tool that allows everyone to train on large datasets, thank you for sharing it with the community! Do you have any practical sense about the tradeoffs between minibatch entropy and model validation performance for a set amount of training time? Obviously this is an impossible experiment to actually run, but I wonder if even lower minibatch entropy which allows higher throughput would be ideal given a set training time. Do you have any anecdotal evidence from training runs on how much shuffling is optimal? I agree that since this experiment could not actually be performed, close to random shuffling is probably the best. Thank you for this contribution!
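      To make the throughput-versus-entropy tradeoff in this comment concrete, here is a minimal sketch of how block-wise shuffling can lower minibatch entropy when similar samples are stored contiguously. It assumes that "entropy" means the Shannon entropy (in bits) of the label distribution within a minibatch, which may differ from the paper's exact definition. The helper names and the toy dataset are hypothetical, and `fetch_factor` and `num_workers` are not modeled.

```python
import math
import random
from collections import Counter


def minibatch_entropy(labels):
    """Shannon entropy, in bits, of the label distribution in one minibatch."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def block_shuffle(indices, block_size, rng):
    """Shuffle contiguous blocks of indices; order inside each block is kept."""
    blocks = [indices[i:i + block_size] for i in range(0, len(indices), block_size)]
    rng.shuffle(blocks)
    return [i for block in blocks for i in block]


def mean_batch_entropy(labels, block_size, batch_size, seed=0):
    """Average minibatch entropy after block-shuffling the dataset order."""
    rng = random.Random(seed)
    order = block_shuffle(list(range(len(labels))), block_size, rng)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    return sum(minibatch_entropy([labels[i] for i in b]) for b in batches) / len(batches)


# Toy dataset (an assumption): 16 groups of 64 samples stored contiguously,
# mimicking data that is sorted by source on disk.
labels = [i // 64 for i in range(1024)]
for block_size in (1, 4, 64):
    print(block_size, round(mean_batch_entropy(labels, block_size, batch_size=64), 2))
```

      With 16 equally sized groups, the entropy of any batch is bounded above by log2(16) = 4 bits, and a block size equal to the group size collapses every batch to a single group. Larger blocks mean more sequential reads, which is where the throughput gain would come from.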

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      In this study, we mechanistically define a new molecular interaction linking two of the cell's major morphological regulatory pathways: the Rho GTPase and Hippo signaling networks. Both pathways are required for life across huge swaths of the tree of life. They are required for the dynamic organization and reorganization of proteins, lipids, and genetic material that occurs in essential cellular processes such as division, motility and differentiation. For decades these pathways have been studied almost exclusively independently, yet they are known to act in concert in cancer to drive cytoskeletal remodeling and morphological changes that promote proliferation and metastasis. Mechanistic insight into how they are coordinated is lacking.

      Our data reveal a mechanistic model where coordination is mediated by the RhoA GTPase-activating protein ARHGAP18, which forms molecular interactions with both the tumor suppressor Merlin (NF2) and the transcriptional co-regulator YAP (YAP1). Using a combination of state-of-the-art super-resolution microscopy (STORM, SORA-confocal) in cultured human cells, biochemical pulldown assays with purified proteins, and analyses of tissue-derived samples, we characterize ARHGAP18's function from the molecular to the tissue level in both native and cancer model systems.

      Together, these findings establish a previously unrecognized molecular connection between the RhoA and Hippo pathways and culminate in a working model that integrates our current results with prior work from our group and decades of prior studies. This model provides a new conceptual framework for understanding how RhoA and Hippo signaling are coordinated to regulate cell morphology and tumor progression in human cells.

      In this substantially revised manuscript, we have addressed all comments from the expert reviewers, described point-by-point below. A shared major comment from the reviewers was the request for direct evidence of the proposed mechanistic model. To address these constructive comments, we've added new experiments, new quantification, new text, and new control data, and have added two expert authors and super-resolution mouse tissue imaging data for the study of endogenous ARHGAP18 in its native condition. We believe that these additions greatly enhance the manuscript and collectively address the overall message from the reviewers' collective comments.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes a dual mechanism by which ARHGAP18 regulates the actin cytoskeleton. The authors propose that in addition to the known role for ARHGAP18 in regulating Rho GTPases, it also affects the cytoskeleton through regulation of the Hippo pathway transcriptional regulator YAP. ARHGAP18 knockout Jeg3 cells were generated and show a clear loss of basal stress fiber like F-actin bundles. The authors further characterize the effects of ARHGAP18 knockout and overexpression. It is also discovered that ARHGAP18 binds to the Hippo pathway regulator Merlin and to YAP. Ultimately it is concluded that ARHGAP18 regulates the F-actin cytoskeleton through dual regulation of RHO GTPases and of YAP. While the phenotype of the ARHGAP18 knockout and the association of ARHGAP18 with Merlin and YAP is interesting, I found the authors' conclusion that these phenotypes are due to ARHGAP18 regulation of both RHO and YAP to be based on largely correlative evidence and sometimes lacking in controls or tests for significance. In addition, the authors often make overly strong conclusions based on the experimental evidence. In some instances, the rationale for how the experimental results support the conclusion is insufficiently articulated, making evaluation challenging. In general, although the authors have some interesting observations, more definitive experiments with proper controls and statistical tests for significance and reproducibility are needed to justify their overall conclusions.

      We appreciate the reviewers' constructive comments and have added substantial new data and quantifications to address their concerns. We have focused these new data on directly testing the proposed mechanisms, adding controls, and performing quantitative analysis with statistical testing. Additionally, we have edited our language to make our rationale clearer and to present our conclusions as a more moderate assessment of our experimental results. Below we respond to the specific comments made by the reviewer, followed by a list of additional editorial changes we've made based on the reviewer's overarching comments on clarity and rationale.

      Specific Comments

      1) The authors make a big point about the effects of ARHGAP18 on myosin light chain phosphorylation. However, this result is not quantified and tested for statistical significance and reproducibility.

      We thank the reviewer for their comments on our western blotting quantification, which in the original submission had quantification of the RhoA downstream signaling readouts pCofilin/Cofilin and pLIMK/LIMK. We had withheld the pMLC and MLC quantification because the result was previously published with quantification, reproducibility, and statistical significance by our group in our prior manuscript on ARHGAP18 published in eLife in 2024 (Fig. 4E of https://doi.org/10.7554/eLife.83526). However, these prior results lacked the new overexpression data. We recognize the need to add these data to this manuscript as requested by the reviewer.

      To address the reviewer's comment, we have added quantification of pMLC/MLC (Fig. 1F).

      2) Along similar lines in Figure 2C they state that overexpression of ARHGAP18 causes cells to invade over the top of their neighbors. This might be true and interesting, but only a single cell is shown and there is no quantification or controls for simply overexpressing something in that cell. The authors also conclude from this image that the overexpression phenotype is independent of its GAP activity on Rho. It is not clear how this conclusion is made based on the data. It would seem like a more definitive experiment would be to see if a similar phenotype was induced by an ARHGAP18 mutant deficient in GAP activity.

      Based on the reviewer's comment, we recognize that the qualitative statements made about Figure 2C (now Figure 3) should have been made more quantitative. We have added a control of WT Jeg3 cells expressing an empty FLAG vector to show that WT cells do not invade over the top of each other (Fig. 3F). Additionally, we have added the quantification found in Fig. 3E, which shows the percentage of invasive versus non-invasive cells in WT and ARHGAP18-overexpressing cells. We have clarified our conclusions to make clear that these data do not directly test whether the invasive phenotype derives from a Rho-independent mechanism. The text now states the following conclusion alongside others, which can be seen in our tracked changes:


      "These data support the conclusion that ARHGAP18 acts to regulate basal and junctional actin. However, it was not clear whether this activity occurred through a Rho-independent or a Rho-dependent mechanism."


      We have added new data of cells expressing an ARHGAP18 mutant deficient in GAP activity, which is explained in detail in the following response below.

      3) In Figure 3 the authors compare gene expression profiles of ARHGAP18 knockout cells to wild-type cells. They see lots of differences in focal adhesion and cytoskeletal proteins and conclude that this supports their conclusion that ARHGAP18 is not just acting through RHO. The rationale for this is not clear. In addition, they observe changes in expression profiles consistent with changes in YAP activity. They conclude that the effects are direct. This very well might be true. However, RHO is a potent regulator of YAP activity and the results seem quite consistent with ARHGAP18 acting through RHO to affect YAP.

      We thank the reviewer for their comment and believe the revised manuscript now presents direct evidence to support the conclusions, through edited text and the incorporation of new data.

      First, the reviewer highlighted that we were not clear in our rationale and explanation of the conclusions made from our RNAseq data in the new Figure 4 (previously Figure 3). We agree with the reviewer that the RNAseq data alone are not sufficient rationale for the conclusion that ARHGAP18 is acting through YAP directly. In the revised manuscript, the conclusion is now based on the combination of our multi-faceted investigation of the relationship between ARHGAP18 and YAP (most importantly, new Figure 5). It's important for us to argue that our RNAseq analysis is much more robust and specific than simply reporting a descriptive assay seeing lots of differences in cytoskeletal proteins. We recruited an outside RNAseq expert collaborator, Dr. Yongho Bae, to perform state-of-the-art IPA analysis and a grueling manual curation of the top hit genes to identify the predominant signaling pathways linking the loss of ARHGAP18 to known YAP translational products. We've provided a supplemental table listing each citation supporting the identified YAP pathway associations from this manual curation. We have also added a new discussion paragraph on the RNAseq data to clarify our specific RNAseq results and analysis. In the revised manuscript, we have moderated our language in the results text regarding the RNAseq data to reflect the reviewer's suggestion:


      "Our RNAseq data alone could not independently confirm if the alterations to transcriptional signaling and expression of actin cytoskeleton proteins were through a Rho-dependent or Rho-independent mechanism."


      Second, in this comment and the one above, the reviewer highlights the need for a new experiment to directly test the Rho-independent effects of ARHGAP18, which we now provide in the new Figure 5. In these new data, we've applied an experimental design suggested by reviewer 2 regarding the same concern. In short, we've produced and expressed a point mutant variant, ARHGAP18(R365A), which abolishes the Rho GAP activity while keeping the remainder of the protein intact. This construct allows us to directly test the effects of ARHGAP18 independent of its RhoA GAP activity. We find that the GAP-deficient ARHGAP18 is able to fully rescue basal focal adhesions, indicating that the basal actin phenotype is at least in part regulated through a Rho-independent mechanism.

      We believe the revised manuscript, when taken in totality, provides the definitive proof requested by the reviewer. Specifically, the combination of Figure 5, where we show new data using the ARHGAP18(R365A) variant, and the result that ARHGAP18 forms a stable complex with YAP (Fig. 6G) or Merlin (Fig. 6A), is supportive of direct Rho-independent molecular interactions between YAP, Merlin, and ARHGAP18.

      4) In Figure 4A showing Merlin binding to ARHGAP18 there is no control for the amount of Merlin sticking to the column as was done in Figure 4F for binding experiments with YAP. This makes it difficult to determine the significance of the observed binding.

      We have performed the requested control experiment and added the results to Figure 6A.

      5) The images in Figure 4C showing YAP being maintained in the nucleus more in ARHGAP18 knockout cells compared to wild-type. However the images only show a few cells and YAP localization can be highly variable depending on where you look in a field. Images with more cells and some sort of quantification would bolster this result.

      We have provided quantification (Figure 6D) of what was originally Figure 4C (now Figure 6C).

      Reviewer #1 (Significance (Required)):

      While the phenotype of the ARHGAP18 knockout and the association of ARHGAP18 with Merlin and YAP is interesting, I found the authors' conclusion that these phenotypes are due to ARHGAP18 regulation of both RHO and YAP to be based on largely correlative evidence and sometimes lacking in controls or tests for significance. In addition, the authors often make overly strong conclusions based on the experimental evidence. In some instances, the rationale for how the experimental results support the conclusion is insufficiently articulated, making evaluation challenging. In general, although the authors have some interesting observations, more definitive experiments with proper controls and statistical tests for significance and reproducibility are needed to justify their overall conclusions.

      In the above comments, we detail the specific definitive experiments, proper controls, and statistical tests for significance, requested by the reviewer, which we believe greatly strengthen our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript investigates the Rho effector ARHGAP18 in Jeg3 cells, a trophoblastic cell line. It presents a number of new pieces of data, which increase our understanding of the importance of this GAP in cell function and explain at a molecular level previous results of other workers in the field. ARHGAP18 was originally given the name "conundrum" and continues to stand apart from the majority of other GAP proteins and their functions. Hence the data here is significant and of high standard.

      The data is clear, and the images are of high quality and extremely impressive in their resolution. It is significant and adds a further layer to our understanding of the regulation of cell migration, particularly in the formation and resolution of microvilli.


      We appreciate the reviewer's comments and supportive insights.

      The data is based on the use of the cell line Jeg3. Even the authors' previous publication in eLife is based only on this cell line. They need to show the conclusions are general and not specific to this line of cells. As an extension of this, is the ARHGAP18 function shown here only in transformed cells? Do the same mechanisms operate in normal cells, which respond to activation to proliferate or migrate?

      We respectfully point out that the critical experiments of the prior eLife publication were validated in DLD-1 colorectal cells and not Jeg-3 cells alone (Figure 1-figure supplement 2). Our newly independent lab, established just over a year ago, is unable to perform a full expansion of the manuscript using untransformed cells; however, we agree with the reviewer's perspective and wish to address the comment to the best of our current capability. To answer the reviewer's suggestions, we have recruited Dr. Christine Schaner Tooley, an expert in mouse model system studies. In the revised manuscript, we've added new super-resolution SORA confocal images of endogenous ARHGAP18's localization in intact intestinal villi tissue and apical junctions of WT mice (Fig. 1A-C). These data indicate that endogenous ARHGAP18 is enriched (but not exclusively localized) at the apical plasma membranes of normal WT epithelial cells. This localization, where both Merlin and Ezrin are present at apical membranes/junctions under normal conditions, is a major component of the working model proposed in Fig. 7. These data also indicate that ARHGAP18 is capable of entering the nucleus in WT cells, another critical aspect of our proposed model. Collectively, our previously published DLD-1 studies and our new studies using WT mouse tissue samples support the conclusion that at least some of ARHGAP18's functions described in this manuscript are not limited to Jeg3 cells.

      In endothelial cells, Lovelace et al 2017 showed localization to microtubules and that depletion of ARHGAP18 resulted in microtubule instability. The authors may like to comment on the differences. Is this a cell type difference or RhoA versus RhoC difference?

      In our previous publication (Lombardo et al., 2024, eLife), we validated the finding that ARHGAP18 forms a complex with microtubules, as we detected tubulin in the ARHGAP18 pulldown experiment (Figure 1-Source Data). However, our data indicate that in Jeg3 cells ARHGAP18 does not localize to the same microtubule-associated spheres observed in the Lovelace publication. We now comment on the shared conclusions and differences between this manuscript and Lovelace et al., 2017 in the discussion section.

      "In endothelial cells, ARHGAP18 has been reported to localize to microtubules and plays a role in maintaining proper microtubule stability (Lovelace et al., 2017). In our epithelial cell culture models and WT mouse intestine, we have been unable to detect ARHGAP18 at microtubules, suggesting ARHGAP18 may have additional functions in various cell types."

      On pages 7,9 they conclude that MLC and basal and junctional actin are regulated through a GAP independent mechanism. The best way to show this is with overexpression of a GAP mutant.

      We appreciate the reviewer's insight and have produced and expressed a GAP mutant, ARHGAP18(R365A), in our cells, directly testing our conclusion that ARHGAP18 has a GAP-independent function. These data are now presented in revised Figure 5 and explained further in response to reviewer #1.

      There is a huge amount of data presented in Figure 3, but their 2 genes which they focus on, LOP1 and CORO1A, are discussed but no actual data presented in support.

      We now validate CORO1A by qPCR in Figure 4J.


      Reviewer #2 (Significance (Required)):

      The data is significant and adds a further layer to our understanding of the regulation of cell migration, particularly in the formation and resolution of microvilli. This manuscript will be of significance to a basic science audience in the field of RhoGTPases and cell migration.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The study by Murray et al explores the effects of ARHGAP18 on the actin cytoskeleton, Rho effector kinases, non-muscle myosin, and transcription. Using super resolution microscopy, they show that in ARHGAP18 KO cells there is a mixed and unexpected cytoskeleton phenotype where myosin phosphorylation appears to be increased, but actin is disorganised with reduced stress fibres, diminished focal adhesions and augmented invasiveness. They conclude that the underlying mechanisms are likely independent from RhoA. Next, they perform RNAseq using the KO cells and identify an array of dysregulated genes, including those that play crucial roles in microvilli (related to previously published findings). Analysis of the data identify gene expression changes that are relevant for altered focal adhesion (integrins). Further analysis reveals that a large cohort of the dysregulated genes are YAP targets. They then show that in ARHGAP18 KO cells YAP nuclear localization, as detected by immunostaining, is augmented; and demonstrate that immobilized ARHGAP18 protein can bind the Hippo regulator merlin as well as YAP itself.

      Major comments:

      1, The premise of the study (that ARHGAP18 is a RhoA effector or may act independently of RhoA) remains not proven.

      We have added new evidence of direct RhoA independent activity for ARHGAP18 described in the above comments. Specifically, we've added data using a RhoA-GAP dead variant of ARHGAP18 in Figure 5, which we believe addresses this comment.

      At several places (including in the title) the authors refer to ARHGAP18 as a Rho effector, which would suggest that it is downstream from Rho, but the basis for this is not clear. In fact, their own previous study suggested that ARHGAP is a RhoA regulator, rather than an effector. In general, the connection of the described effects to RhoA remains unclear, and not addressed in this study. The authors seem to go back and forth in their conclusions regarding the connection between ARHGAP18 and RhoA. For example, the first section of results is finished by stating (line 194): "These data support the conclusion that ARHGAP18 acts to regulate basal and junctional actin through Rho-independent mechanism". But the next section starts by stating (line 198): "We hypothesized that the invasive and cytoskeletal phenotypes observed at the basal surface of cells devoid of ARHGAP18 may be a result of changes in regulation at the transcriptional level either directly through RhoA signaling or through an additional mechanism specific to ARHGAP18". The paper would be strengthened by adding data that show whether the effects are indeed downstream from RhoA or RhoA independent. If there is no sufficient demonstration that ARHGAP18 is downstream of RhoA and is an effector, this needs to be stated explicitly, and the wording should be changed.

      We now provide new data in Figure 5, which directly tests the RhoA-independent functions of ARHGAP18 as recommended by the reviewer. Our understanding of the term effector is 'a molecule that activates, controls, or inactivates a process or action.' Based on this understanding, we used the term to convey ARHGAP18's functional role within the feedback loop, rather than to imply that it acts exclusively downstream.

      We seek to clarify our perspective on the reviewer's assertion that we go "back and forth" as to whether ARHGAP18 functions in a Rho-dependent or Rho-independent manner. It was our intent to propose a model where ARHGAP18 acts in two separate circuits that regulate cell signaling. The first circuit involves ARHGAP18's canonical RhoA GAP activity, which involves ERMs and LOK/SLK, and is limited to the apical plasma membrane. This first signaling circuit was characterized in our prior eLife manuscript (Lombardo et al., 2024) and in an earlier JCB manuscript (Zaman and Lombardo et al., 2021). In this newly revised manuscript, we provide a partial mechanistic characterization of the second circuit, which we freely admit is much more complex and will likely require additional study to fully characterize.

      As both circuits operate as signaling feedback loops, we find the terms 'upstream' and 'downstream' to be of limited value, and we attempt to avoid their use when possible. We retain their use only when referring to the Hippo and ROCK signaling cascades, where these designations are well established. We suggest that the conceptual inconsistencies of Conundrum/ARHGAP18 may have arisen from the tendency to view it in strictly binary terms as upstream or downstream. Here, we propose a third possibility that ARHGAP18 functions as both, participating in a negative feedback loop.

      We have added data testing whether the effects are Rho-independent, and edited the discussion text in response to the reviewer's comments to clarify the molecular function of ARHGAP18.

      "Additionally, focal adhesions and basal actin bundles are restored to WT levels when the ARHGAP18(R365A) GAP-ablated mutant is expressed in ARHGAP18 KO cells (Fig. 5A, B). These results represent the strongest argument that ARHGAP18 functions in additional pathways to RhoA/C alone. Our data suggests that at least one of the alternative pathways is through ARHGAP18's interaction with YAP and Merlin. From these data we conclude that ARHGAP18 has important functions in both RhoA signaling through both its GAP activity and in Hippo signaling through its GAP independent binding partners."

      The study is descriptive and contains a series of observations that are not connected. Because of this, the study's conclusions are not well supported, and key mechanistic insight is limited. The study feels like a set of separate observations that remain incompletely worked out and have some preliminary feel to them. The model in the last figure also seems to contain hypotheses based on the observations, several of which remain to be proven.

      We present our revised manuscript, in which we've more clearly outlined our rationale and conclusions, as detailed in the above responses, to emphasize the overall connectivity of the study. We have also updated the title of Figure 7 to read "__Theoretical__ Model of ARHGAP18's coordination of RhoA and Hippo signaling pathways in Human epithelial cells" to make it clear that we are presenting a working model, which has elements that will require additional investigation. Throughout the manuscript, we highlight the unknown elements that remain to be tested or other outstanding questions. Thus, we do not aim to characterize this complex signaling coordination completely. Instead, this manuscript represents the third iteration in our systematic advances to describe this entirely new signaling pathway. We agree that, despite three separate manuscripts (this one included) to date, this work represents an early stage in understanding the system; many additional studies will be needed to characterize this signaling system fully. Figure 7 is presented as a working model that results from a thoughtful combination of our collective data and that of other researchers, derived from numerous species across decades of study. We firmly believe that proposing such integrative models is valuable for advancing the field. We also recognize the importance of clearly indicating which aspects remain hypothetical. We now explicitly note in several places within the discussion which components of the model will require further validation and experimental confirmation. For example, regarding our theoretical mechanism in Figure 7 we state:

      "Validation of the direct mechanism by which YAP/TAZ transcriptional changes drive basal actin changes in ARHGAP18 KO cells will require further investigation based on predictions from RNAseq results."

      Addressing any possible connection between key effects of ARHGAP18 KO (changes in actin, focal adhesion, integrins, Yap and merlin binding) could strengthen the manuscript. One such specific question is whether the changes in integrin expression (RNAseq) are indeed connected to the actin alterations and reduction in focal adhesions (Fig 1). Staining for these integrins to show they are indeed altered, and/or manipulating any of them to reproduce changes could provide an exciting addition.

      We attempted to stain cells for integrins by purchasing three separate antibodies. However, despite extensive optimization and careful selection of the specific integrins using our RNAseq results, we were unable to get any of these antibodies to work in any cell type or condition. We believe that there is a technical challenge to staining for integrins due to their transmembrane and extracellular components, which we were unable to overcome. As an attempt to address the reviewer's comment, we alternatively stained cells for paxillin, which directly binds the cytoplasmic tails of integrins (Figs. 3 and 5).

      Some of the experimental findings are not convincing or lack controls. Fig 1: some of the western blots are not convincing or poor quality. [...] On the same figure, the quality of LIM kinase blots is poor. [...] The signal is weak, and the blot does not appear to support the quantification. The last condition (expression of flag-ARHGAP18) results in a large drop in pLIMK and pcofilin on the blot, which is not reflected by the graph. Addition of a better blot and the use of a strong positive or negative control would boost confidence in these data.

      In response to this and other reviewers' comments, we have added new western data and quantification to Figure 1. We now focus on MLC/pMLC data as we believe these data highlight the potential Rho-independent mechanism of ARHGAP18, and we were able to greatly improve the quality of the blots through careful optimization. We hope the reviewer finds these blots and quantifications (Fig. 1E and F) more convincing.

      We note that phospho-specific Western blotting presents considerably greater technical challenges than conventional blotting. We believe that an attractive-looking blot does not always correlate with quality or reproducibility, and we have focused on taking extraordinarily careful steps in the blotting of our phospho-specific antibodies, which at times comes at the cost of the blot's appearance. For example, all phospho-specific antibodies are run using two-color fluorescent markers to blot against both the total protein and the phospho-protein on the same blot. This approach often leads to blots that have reduced signal-to-noise compared to chemiluminescent Westerns. Additionally, we use phospho-specific blocking buffer reagents which do not contain phosphate-based buffers or agents that attract non-specific phospho-staining signals. These blocking buffers are not as effective as non-fat milk in PBS at blocking the background signal; however, they are ultimately cleaner for phospho-specific primary antibodies. We use carefully optimized protocols, from cell treatment to lysis, transfer, and antibody incubation, including methods developed by laboratories where the corresponding author of the manuscript was trained. Nonetheless, despite these efforts, we have now removed the LIMK and cofilin data because we deemed them unnecessary for the main conclusions of this manuscript and were unable to improve their quality to satisfy the reviewer.

      The changes in pMLC on the western blots are very small, and for any conclusion, these studies require quantification. Further, the expression levels of Flag-ARHGAP18 need to be shown to support the statement that the protein is expressed, and indeed overexpressed under these conditions (vs just re-expressed).

      In continuation of the above comment, we have made significant effort to improve the quality of our pMLC western blots and now provide quantification in Figure 1. We also now provide the Flag-ARHGAP18 signal as requested by the reviewer.

      Fig 4: the differences in YAP nuclear localization under the various conditions are not well visible. Quantitation of nuclear/cytosolic signal ratio should be provided. Please provide a rationale and more context for using serum starvation and re-addition. What is the expected effect? Serum removal and addition is referred to as nutrient removal and re-addition, but this is inaccurate, as it does not equal nutrient removal, since serum contains a variety of other important components, e.g. growth factors too.

      We have provided new quantification of the nuclear/cytosolic signal ratio in Figure 6D. We have explained our rationale for the study through the following new text:

      "Merlin is activated and localized to junctions upon signaling, promoting growth and proliferation; among these signals is the availability of growth factors and other components of serum (Bretscher et al., 2002). We hypothesized that since ARHGAP18 formed a complex with Merlin that ARHGAP18's localization may localize to junctions under conditions which promote Merlin activation."

      We have changed our use of "nutrient removal" to "serum removal".
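      For readers unfamiliar with the nuclear/cytosolic signal ratio referred to in this exchange, the following is a minimal sketch of how such a ratio could be computed for a single cell. It is an illustration only and is not the authors' analysis pipeline, which is not described here. The function name `nuclear_cytosolic_ratio` and its inputs (a YAP intensity image, a nuclear mask such as one derived from a DNA stain, and a whole-cell mask) are assumptions made for the example.

```python
def nuclear_cytosolic_ratio(yap, nuclear_mask, cell_mask):
    """Mean YAP intensity inside the nucleus divided by the mean intensity
    in the rest of the cell (hypothetical helper, for illustration only).

    yap          -- 2D list of pixel intensities for the YAP channel
    nuclear_mask -- 2D list of 0/1 flags marking nuclear pixels
    cell_mask    -- 2D list of 0/1 flags marking pixels that belong to the cell
    """
    nuclear, cytosolic = [], []
    for yap_row, nuc_row, cell_row in zip(yap, nuclear_mask, cell_mask):
        for value, in_nucleus, in_cell in zip(yap_row, nuc_row, cell_row):
            if not in_cell:
                continue  # background pixels are ignored entirely
            (nuclear if in_nucleus else cytosolic).append(value)

    if not nuclear or not cytosolic:
        raise ValueError("both nuclear and cytosolic pixels are required")

    mean_nuclear = sum(nuclear) / len(nuclear)
    mean_cytosolic = sum(cytosolic) / len(cytosolic)
    if mean_cytosolic == 0:
        raise ValueError("cytosolic signal is zero; ratio is undefined")
    return mean_nuclear / mean_cytosolic
```

      A ratio above 1 would indicate nuclear enrichment of the signal. In practice the ratio would be computed for many cells per condition and compared statistically, which is the kind of quantification the reviewers request.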

      The binding between ARHGAP18 and merlin is interesting, but a key limitation is the use of expressed proteins. Can the binding be shown for the endogenous proteins (IP, colocalization)? Another important unaddressed question is the relevance of this binding, and the relation of this to altered YAP nuclear localization.

      Our data in Fig. 6G show binding of resin-bound human ARHGAP18 to endogenous YAP from human cells, as suggested by the reviewer. In Fig. 6A, we chose to use GFP-Merlin because Merlin shares approximately 60% sequence identity with Ezrin, Radixin, and Moesin (ERMs). Their similarity is such that Merlin was named for Moesin-Ezrin-Radixin-Like Protein. In our experience, nearly all Merlin or ERM antibodies have some cross-contaminating signal. Thus, a major concern is that if we were to blot for endogenous Merlin in the pull-down experiment, we may see a band that could in fact be ERMs. To avoid this, we tagged Merlin with GFP to ensure that the product pulled down by ARHGAP18 was Merlin, not an ERM. Regarding the ARHGAP18-resin bound column, our homemade ARHGAP18 antibody is polyclonal. We have extensive experience in pulldown assays and have found that the binding of a polyclonal antibody to the bait protein can produce less accurate results, as the binding site for the antibody is unknown and can sterically hinder attachment of target proteins like Merlin. In our experience, attachment to a flag-tag, which is expressed after a flexible linker at the N- or C-terminus, allows us to overcome this limitation, and this is the approach we've used in this manuscript.

      Minor comments:

      Introduction line 99: "When localized to the nucleus, YAP/TAZ promotes the activation of cytoskeletal transcription factors associated with cell proliferation and actin polymerization" Please clarify what you mean by this statement, which is inaccurate in its present form. Did you mean effects on transcription factors that control cytoskeletal proteins, or do you mean that Yap/Taz affect these proteins? Please also provide a reference for this.

      We've altered the sentence as suggested by the reviewer, which now reads the following:

      "When localized to the nucleus, YAP/TAZ promotes transcriptional changes associated with cell proliferation and actin polymerization."

      The full mechanism by which YAP/TAZ promotes proliferation and actin polymerization is currently debated. We do not think introducing the various proposed models is required for this manuscript, and we simply intend to convey that when in the nucleus, YAP/TAZ promotes transcriptional changes that drive actin polymerization and cell proliferation.

      -What is the cell confluence in these experiments? For epithelial cells confluence affects actin structure. Please comment on similarity of confluency across experimental conditions?

      All cellular experiments are paired, with WT and ARHGAP18 KO cells plated at the same time under identical conditions. For imaging, we plate all cells onto glass coverslips in a 6-well dish so that each condition is literally in the same cell culture plate and gets identical treatment. In our prior eLife paper studying ARHGAP18, we showed that ARHGAP18 KO cells and WT cells divide at a similar rate and have similar proliferation characteristics. The epithelial cell cultures are maintained for experiments at around 70-80% confluency. For the focal adhesion staining experiments, the confluency is slightly lower, between 50-60%, to capture the focal adhesions towards the leading edge. We have added the following new text to further describe these methods: "Cell cultures for experiments were maintained at 70%-80% confluency. For focal adhesion experiments, the cell cultures were maintained at 50%-60% confluency."

      -Fig 2 legend: please indicate that the protein detected was non-muscle myosin heavy chain (distinct from the light chain detected in Fig 1).

      We have altered the legend of the original Figure 2 (new Figure 3).

      -Line 339-340: please check the syntax of this sentence

      -Western blot quantification: the comparison of experiments with samples run on different gels/blots requires careful normalization and experimental consistency. Please describe how this was achieved.

      We have added the following new text to further describe these methods:

      "For blots which required quantification of antibodies that were only rabbit primaries (e.g., pMLC/MLC antibodies listed above), samples were loaded onto a single gel and transferred onto a single membrane at the same time. After transfer, the membrane was cut in half and subsequent steps were done in parallel. All quantified blots were checked for equal loading using either anti-tubulin as a housekeeping protein or total protein as detected by Coomassie staining"
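      As an illustration of the kind of normalization discussed in this exchange, the sketch below expresses a phospho-protein signal relative to its total-protein signal for each lane and then scales each condition to the control. It is a hedged example only: the function names, the data layout, and the choice of WT as the reference condition are assumptions for illustration, and the band intensities a user would supply are their own measurements, not values from the manuscript.

```python
from statistics import mean


def phospho_ratio(phospho, total):
    """Phospho-protein band intensity divided by total-protein band intensity
    for a single lane (for example pMLC over MLC)."""
    if total <= 0:
        raise ValueError("total-protein signal must be positive")
    return phospho / total


def normalize_to_control(lanes, control="WT"):
    """Average the phospho/total ratio per condition and scale to the control.

    lanes   -- dict mapping condition name to a list of (phospho, total)
               band intensities, one pair per replicate lane
    control -- name of the reference condition (assumed to be "WT" here)
    """
    ratios = {
        condition: mean(phospho_ratio(p, t) for p, t in replicates)
        for condition, replicates in lanes.items()
    }
    reference = ratios[control]
    return {condition: ratio / reference for condition, ratio in ratios.items()}
```

      Because every lane is divided by its own total-protein signal before conditions are compared, lane-to-lane loading differences largely cancel, which is one reason phospho and total protein are ideally measured on the same membrane.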

      Reviewer #3 (Significance (Required)):

      Rho signalling is a central regulator of an array of normal and pathological cell functions, and our understanding of the context dependent regulation of this key pathway remains very incomplete. Therefore, new knowledge on the role of specific regulators, such as ARHGAP18, is of interest to a very broad range of researchers. A further exciting aspect of this protein is that, despite indications by many studies that it acts as a GAP (inhibitor) for Rho proteins, there are findings in the literature that suggest that its manipulation can affect actin in an unexpected (opposite) manner. These point to possible Rho-independent roles, and warrant further in-depth exploration.

      One of the strengths of the study is that it explores possible roles of ARHGAP18 beyond RhoA and describes some new and interesting observations, which advance our knowledge. The authors use some excellent tools (e.g. ARHGAP KO cells and re-expression) and approaches (e.g. super resolution microscopy to analyze actin changes, RNAseq and bioinformatics to find genes that may be downstream from ARHGAP18). A key limitation of the study, however, is that it is not clear whether the observed findings are indeed independent from RhoA. A further limitation is that potential causal relationships between the described findings are not studied, and therefore the findings are in some cases overinterpreted, and limited mechanistic insights are provided. In some cases the exclusive use of expressed proteins is also a limitation. Finally, some of the experiments also need improvement.

      Reviewer expertise: RhoA signalling, guanine nucleotide exchange factors, epithelial biology, cell migration, intercellular junctions.

      In the above comments, we detail the new experimental data addressing reviewer 3's listed key limitations. We've added new data using the Rho GAP-deficient ARHGAP18(R365A) variant, which allows for the direct characterization of ARHGAP18's Rho-independent activity. We have introduced new data in WT cells studying endogenous proteins to address the limitations from expressed proteins. Finally, we have moderated our language to address overinterpretation. Collectively, we believe that our revised manuscript addresses the reviewer's constructive comments.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The study by Murray et al explores the effects of ARHGAP18 on the actin cytoskeleton, Rho effector kinases, non-muscle myosin, and transcription. Using super resolution microscopy, they show that in ARHGAP18 KO cells there is a mixed and unexpected cytoskeleton phenotype where myosin phosphorylation appears to be increased, but actin is disorganised with reduced stress fibres, diminished focal adhesions and augmented invasiveness. They conclude that the underlying mechanisms are likely independent from RhoA. Next, they perform RNAseq using the KO cells and identify an array of dysregulated genes, including those that play crucial roles in microvilli (related to previously published findings). Analysis of the data identifies gene expression changes that are relevant for altered focal adhesion (integrins). Further analysis reveals that a large cohort of the dysregulated genes are YAP targets. They then show that in ARHGAP18 KO cells YAP nuclear localization, as detected by immunostaining, is augmented; and demonstrate that immobilized ARHGAP18 protein can bind the Hippo regulator merlin as well as YAP itself.

      Major comments:

      1. The premise of the study (that ARHGAP18 is a RhoA effector or may act independently of RhoA) remains unproven. At several places (including in the title) the authors refer to ARHGAP18 as a Rho effector, which would suggest that it is downstream from Rho, but the basis for this is not clear. In fact, their own previous study suggested that ARHGAP is a RhoA regulator, rather than an effector. In general, the connection of the described effects to RhoA remains unclear, and is not addressed in this study. The authors seem to go back and forth in their conclusions regarding the connection between ARHGAP18 and RhoA. For example, the first section of results is finished by stating (line 194): "These data support the conclusion that ARHGAP18 acts to regulate basal and junctional actin through Rho-independent mechanism". But the next section starts by stating (line 198): "We hypothesized that the invasive and cytoskeletal phenotypes observed at the basal surface of cells devoid of ARHGAP18 may be a result of changes in regulation at the transcriptional level either directly through RhoA signaling or through an additional mechanism specific to ARHGAP18". The paper would be strengthened by adding data that show whether the effects are indeed downstream from RhoA or RhoA-independent. If there is no sufficient demonstration that ARHGAP18 is downstream of RhoA and is an effector, this needs to be stated explicitly and the wording should be changed.
      2. The study is descriptive and contains a series of observations that are not connected. Because of this, the study's conclusions are not well supported, and key mechanistic insight is limited. The study feels like a set of separate observations that remain incompletely worked out and have some preliminary feel to them. The model in the last figure also seems to contain hypotheses based on the observations, several of which remain to be proven. Addressing any possible connection between key effects of ARHGAP18 KO (changes in actin, focal adhesion, integrins, Yap and merlin binding) could strengthen the manuscript. One such specific question is whether the changes in integrin expression (RNAseq) are indeed connected to the actin alterations and reduction in focal adhesions (Fig 1). Staining for these integrins to show they are indeed altered, and/or manipulating any of them to reproduce changes could provide an exciting addition.
      3. Some of the experimental findings are not convincing or lack controls.

      Fig 1: some of the western blots are not convincing or poor quality. The changes in pMLC on the western blots are very small, and for any conclusion, these studies require quantification. Further, the expression levels of Flag-ARHGAP18 need to be shown to support the statement that the protein is expressed, and indeed overexpressed under these conditions (vs just re-expressed). On the same figure, the quality of LIM kinase blots is poor. The signal is weak, and the blot does not appear to support the quantification. The last condition (expression of flag-ARHGAP18) results in a large drop in pLIMK and pcofilin on the blot, which is not reflected by the graph. Addition of a better blot and the use of a strong positive or negative control would boost confidence in these data.

      Fig 4: the differences in YAP nuclear localization under the various conditions are not well visible. Quantitation of nuclear/cytosolic signal ratio should be provided.

      4. Please provide a rationale and more context for using serum starvation and re-addition. What is the expected effect? Serum removal and addition is referred to as nutrient removal and re-addition, but this is inaccurate, as it does not equal nutrient removal, since serum contains a variety of other important components, e.g. growth factors too.
      5. The binding between ARHGAP18 and merlin is interesting, but a key limitation is the use of expressed proteins. Can the binding be shown for the endogenous proteins (IP, colocalization)? Another important unaddressed question is the relevance of this binding, and the relation of this to altered YAP nuclear localization.

      Minor comments:

      • Introduction line 99: "When localized to the nucleus, YAP/TAZ promotes the activation of cytoskeletal transcription factors associated with cell proliferation and actin polymerization" Please clarify what you mean by this statement, which is inaccurate in its present form. Did you mean effects on transcription factors that control cytoskeletal proteins, or do you mean that Yap/Taz affect these proteins? Please also provide a reference for this.
      • What is the cell confluence in these experiments? For epithelial cells confluence affects actin structure. Please comment on similarity of confluency across experimental conditions?
      • Fig 2 legend: please indicate that the protein detected was non-muscle myosin heavy chain (distinct from the light chain detected in Fig 1).
      • Line 339-340: please check the syntax of this sentence
      • Western blot quantification: the comparison of experiments with samples run on different gels/blots requires careful normalization and experimental consistency. Please describe how this was achieved.

      Significance

      Rho signalling is a central regulator of an array of normal and pathological cell functions, and our understanding of the context dependent regulation of this key pathway remains very incomplete. Therefore, new knowledge on the role of specific regulators, such as ARHGAP18, is of interest to a very broad range of researchers. A further exciting aspect of this protein is that, despite indications by many studies that it acts as a GAP (inhibitor) for Rho proteins, there are findings in the literature that suggest that its manipulation can affect actin in an unexpected (opposite) manner. These point to possible Rho-independent roles, and warrant further in-depth exploration. One of the strengths of the study is that it explores possible roles of ARHGAP18 beyond RhoA and describes some new and interesting observations, which advance our knowledge. The authors use some excellent tools (e.g. ARHGAP KO cells and re-expression) and approaches (e.g. super resolution microscopy to analyze actin changes, RNAseq and bioinformatics to find genes that may be downstream from ARHGAP18). A key limitation of the study, however, is that it is not clear whether the observed findings are indeed independent from RhoA. A further limitation is that potential causal relationships between the described findings are not studied, and therefore the findings are in some cases overinterpreted, and limited mechanistic insights are provided. In some cases the exclusive use of expressed proteins is also a limitation. Finally, some of the experiments also need improvement.

      Reviewer expertise: RhoA signalling, guanine nucleotide exchange factors, epithelial biology, cell migration, intercellular junctions.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      This manuscript describes a dual mechanism by which ARHGAP18 regulates the actin cytoskeleton. The authors propose that in addition to the known role for ARHGAP18 in regulating Rho GTPases, it also affects the cytoskeleton through regulation of the Hippo pathway transcriptional regulator YAP. ARHGAP18 knockout Jeg3 cells were generated and show a clear loss of basal stress fiber-like F-actin bundles. The authors further characterize the effects of ARHGAP18 knockout and overexpression. It is also discovered that ARHGAP18 binds to the Hippo pathway regulator Merlin and to YAP. Ultimately it is concluded that ARHGAP18 regulates the F-actin cytoskeleton through dual regulation of RHO GTPases and of YAP. While the phenotype of the ARHGAP18 knockout and the association of ARHGAP18 with Merlin and YAP are interesting, I found the authors' conclusion that these phenotypes are due to ARHGAP18 regulation of both RHO and YAP to be based on largely correlative evidence and sometimes lacking in controls or tests for significance. In addition, the authors often make overly strong conclusions based on the experimental evidence. In some instances, the rationale for how the experimental results support the conclusion is insufficiently articulated, making evaluation challenging. In general, although the authors have some interesting observations, more definitive experiments with proper controls and statistical tests for significance and reproducibility are needed to justify their overall conclusions.

      Specific Comments

      1) The authors make a big point about the effects of ARHGAP18 on myosin light chain phosphorylation. However this result is not quantified and tested for statistical significance and reproducibility.

      2) Along similar lines in Figure 2C they state that overexpression of ARHGAP18 causes cells to invade over the top of their neighbors. This might be true and interesting, but only a single cell is shown and there is no quantification or controls for simply overexpressing something in that cell. The authors also conclude from this image that the overexpression phenotype is independent of its GAP activity on Rho. It is not clear how this conclusion is made based on the data. It would seem like a more definitive experiment would be to see if a similar phenotype was induced by an ARHGAP18 mutant deficient in GAP activity.

      3) In Figure 3 the authors compare gene expression profiles of ARHGAP18 knockout cells to wild-type cells. They see lots of differences in focal adhesion and cytoskeletal proteins and conclude that this supports their conclusion that ARHGAP18 is not just acting through RHO. The rationale for this is not clear. In addition, they observe changes in expression profiles consistent with changes in YAP activity. They conclude that the effects are direct. This very well might be true. However, RHO is a potent regulator of YAP activity and the results seem quite consistent with ARHGAP18 acting through RHO to affect YAP.

      4) In Figure 4A showing Merlin binding to ARHGAP18 there is no control for the amount of Merlin sticking to the column as was done in Figure 4F for binding experiments with YAP. This makes it difficult to determine the significance of the observed binding.

      5) The images in Figure 4C show YAP being maintained in the nucleus more in ARHGAP18 knockout cells compared to wild-type. However, the images only show a few cells and YAP localization can be highly variable depending on where you look in a field. Images with more cells and some sort of quantification would bolster this result.

      Significance

      While the phenotype of the ARHGAP18 knockout and the association of ARHGAP18 with Merlin and YAP are interesting, I found the authors' conclusion that these phenotypes are due to ARHGAP18 regulation of both RHO and YAP to be based on largely correlative evidence and sometimes lacking in controls or tests for significance. In addition, the authors often make overly strong conclusions based on the experimental evidence. In some instances, the rationale for how the experimental results support the conclusion is insufficiently articulated, making evaluation challenging. In general, although the authors have some interesting observations, more definitive experiments with proper controls and statistical tests for significance and reproducibility are needed to justify their overall conclusions.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Dear editor and reviewers,

      We sincerely thank you for your thoughtful comments and constructive suggestions, which have greatly improved the quality and clarity of our manuscript. In response, we have implemented all requested changes, which are highlighted in yellow throughout the revised text, and updated several figures accordingly. Furthermore, we have performed all additional experiments recommended by the reviewers and incorporated the new data into the manuscript. To enhance clarity, we have also included a schematic representation of our proposed model in an additional figure, providing a concise visual summary of our findings.

      We hope that these revisions fully address all concerns raised by the reviewers and meet all the expectations for publication.

      Below, we answer the reviewers point by point (in blue).


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this paper, the authors address the important question of the role of centrosomes during neuronal development. They use Drosophila as an in vivo model. The field is somewhat unclear on the role and importance of centrosomes during neuronal development, although the current data would suggest they are dispensable for axon specification and growth. Early studies in cultured mammalian neurons showed that centrosomes are active and that their microtubules can be cut and transported into the neurites. But a study then showed that centrosomes in these cultured neurons are deactivated relatively early during neuronal development in vitro and that ablating centrosomes even when they are active had no obvious effect on axon specification and growth. Consistent with this, a study in Drosophila provided evidence that centrosomes were not active or necessary in different types of neurons. More recently, a study showed that centrosomal microtubules are dispensable for axon specification and growth in mice in vivo but are required for neuronal migration in the cerebral cortex. However, another study has linked the generation of acetylated microtubules at centrosomes with axon development. In this current study, the authors examine the effect of centrosome loss on various motor and sensory neurons and muscles mainly by examining mutants in essential centriole duplication genes. They associate axonal routing and morphology defects with centrosome loss and provide some evidence that centrosomes could still be active in the developing neurons. Overall, they conclude that centrosomes are active during at least early neuronal development and that this activity is important for proper axonal morphology and routing.

      While I think this study addressing a very interesting and important question, I think as it stands the data is not sufficient to be conclusive on a role for centrosomes during neuronal development. My biggest concern is that most phenotypes have not yet been shown to be cell autonomous, as whole animal mutants have been analysed rather than analysing the effect of cell-specific depletion, and the evidence for active centrosomes needs to be strengthened. If the authors can provide stronger evidence for a role of centrosomes in axonal development then the paper will certainly be of interest to a broad readership.

      We thank the reviewer for the clear and concise summary and fully agree that our study addresses a critical gap in understanding. Centrosomes have long been implicated in morphogenesis, yet their precise contribution to nervous system development has remained unclear. Our findings provide compelling evidence that centrosomes are indispensable for proper nervous system formation and that their absence also triggers muscular defects, highlighting their broader role in tissue organization.

      We acknowledge that the original manuscript lacked some key details; therefore, we have now strengthened our conclusions with additional experiments. Specifically, we demonstrate that these effects are cell-autonomous by using two independent RNAi lines targeted to a subset of motor neurons. Furthermore, we present new data showing that neuronal centrosomes remain active during the early stages of axonal development, emphasising their functional relevance in morphogenesis. All new experiments, figures, and corresponding text revisions are detailed below.

      Major comments 1) The sas-6 transallelic combination shows only 17% embryonic lethality compared to 50% embryonic lethality with sas-4 mutants. Given that both mutants should result in the same degree of centrosome loss (this should be quantified in sas-6 mutants) it would suggest that either sas-4 has other roles away from centrosomes or that the sas-4 mutant chromosome used in the experiment has other mutations that affect viability. The effect of picking up "second-site lethal" mutations on mutant chromosomes is common and so I would not be surprised if this is the reason for the difference in phenotypes. This can be addressed either by "cleaning up" the sas-4 mutant chromosome by backcrossing to wild-type lines, allowing recombination to occur and replace the potential second site mutations, or by using transallelic combinations of sas-4, as they did for sas-6. The "easier" option may just be to analyse all the phenotypes with the sas-6 transallelic combination.

      We appreciate this comment, as it brought to light an issue with the CRISPR line Sas-6-Δa. Upon reanalysing all the data, we determined that this line is embryonic lethal both when homozygous and when combined with Df(3R)BSC794, the deficiency uncovering the genomic region. In contrast, Sas-6-Δb homozygotes are viable. This inconsistency raised concerns about whether the deletions in the Sas-6-Δa and Sas-6-Δb mutants are confined to the Sas-6 coding region. Although this would not hinder our cell biology analysis, it could be a problem in viability tests. To address this, we repeated all analyses using Sas-6-Δb homozygotes and Sas-6-Δb combined with Df(3R)BSC794. These new results are more consistent and indicate that approximately 50% of Sas-6/Df individuals survive to adulthood. Fig. 3 has been redone and the manuscript text changed accordingly.

      2) Using "whole animal" mutants for assessing neuronal morphology is risky due to non-cell-autonomous effects. The authors have carried out some phenotypic analysis of neurons depleted of Sas-4 by cell-specific RNAi, but I feel they need to do this for all of their analysis. This includes embryonic lethality measures, quantification of centrosome numbers, and all axonal phenotypes in Sas-4 RNAi neurons. It would also be prudent to use 2 distinct RNAi lines to help ensure any phenotypes are not off-target effects (and this may help clarify why the authors see some additional phenotypes with RNAi). Indeed, there are relatively weak phenotypes in muscles when using RNAi compared to the mutants and these potential non-cell-autonomous effects could then have a knock-on effect on neuronal morphology. If the authors were concerned that RNAi is not very efficient (explaining any potential weaker phenotypes than in mutants) the authors could examine the effectiveness of RNAi lines by analysing protein depletion by western blotting or mRNA depletion by rt-qPCR (although this has to be done in a different cell type due to the difficulty in obtaining a neuronal extract).

      We have now added a new panel to supplementary Figure 1, showing how the expression of a different Sas-4 RNAi line (2) induces similar nervous system phenotypes when expressed only in aCC, pCC and RP2 pioneer neurons (Sup. Fig. 1 M-O).

      3) When analysing centriole presence or absence it is a good idea to stain with two different centriole markers e.g. Asl and Plp. This helps rule out unspecific staining. It is clear from the images that similar sized foci can be observed outside of the cells (see Figure 5A for example), so clearly some of the foci that appear to be within the cells may also be unspecific staining.

      In a new supplementary figure (Suppl. Fig. 3), we now show that Asl and Plp colocalize, and we quantify how often this colocalization occurs in neurons. We also apologise for the confusion regarding the foci outside the marked cells: these are whole-mount embryonic stainings, and the anti-Plp antibody marks the centrosomes of all cells in the embryo.

      4) The evidence for active centrosomes is not that convincing. Acetylated tubulin is associated with stable MTs, which are not normally organised by "active" centrosomes that nucleate dynamic microtubules. Moreover, it is plausible that centriole foci happen to overlap with the acetylated tubulin staining by chance. This would explain why not all centrosomes colocalise with acetylated tubulin signal. The authors could better test centrosome activity by performing live imaging with EB1-GFP. If centrosomes are active, it is very easy to observe the many comets produced by the centrosomes.

      We appreciate the reviewer’s comment and agree that acetylated tubulin alone is not an ideal marker for centrosome activity. To address this, we performed live imaging of aCC neurons expressing EB1-GFP together with Asl-Tomato. This was technically challenging because we were imaging only two neurons per segment in live embryos, under significant limitations in fluorescence detection and timing. Despite these constraints, we were able to clearly observe EB1 comets emerging from the centrosome and moving toward the cell periphery, providing direct evidence of microtubule nucleation from centrosomes in neurons.

      Importantly, we complemented this with a microtubule depolymerization/polymerization assay, which provides unequivocal evidence that polymerization initiates at the centrosome. After depolymerization, we observed microtubule regrowth from the centrosome, confirming its role as an active microtubule-organizing centre in these neurons. Together, we hope that these results are enough to demonstrate that neuronal centrosomes are functionally active during early axonal development. These experiments are presented in Figure 6 and corresponding text in the manuscript.

      5) If the authors believe that centrosomes have a role in axon pathfinding in sensory neurons, they should show that these centrosomes are active, at least during early stages (again using EB1-GFP imaging).

      We appreciate the reviewer’s suggestion and agree that EB1-GFP imaging would be the most direct way to assess centrosome activity in sensory neurons. However, performing time-lapse imaging in these neurons is technically very demanding due to their location and accessibility in live embryos, and we did not attempt this approach. Instead, we now provide new evidence showing that sensory neuron centrosomes colocalize with both α-tubulin and γ-tubulin. This strongly supports that these centrosomes are associated with microtubule nucleation machinery and are as likely as motor neuron centrosomes to be active during early stages of axon development. These new data have been included in the revised manuscript (see Figure 5 and corresponding text).

      6) The authors mention in the discussion that "increased JNK activity, can result in axonal wiggliness (Karkali et al, 2023)". I therefore wonder whether centrosome loss may induce JNK activation (the stress response), as this would then indicate an indirect effect of centrosome loss on axonal structure rather than a direct influence of centrosome-generated microtubules. The authors could assess whether the DNK-JNK pathway is activated in neurons lacking centrosomes by expression UAS-Puc-GFP and quantifying the nuclear signal.

      In a new supplementary figure (Suppl. Fig. 4), we now show, using a reporter for JNK signalling as requested, that Sas-4 neurons do not activate the JNK pathway.

      7) In Figure 5, the authors claim that they find "a correlation between axonal guidance phenotypes and the numbers of centrioles per embryo". I don't think this is a strong correlation. The difference in centriole number between embryos with no defects and those with defects is very small. In contrast, the difference between centriole numbers in control (no defects) and mutant (no defects) is very large. So, there does not appear to be a strong correlation between centrosome number and phenotype.

      We agree and we have corrected this sentence to better explain the results.

      Minor comments

      1) I don't understand Figure 3C - why do the % of surviving homozygotes and heterozygotes add up to 100%? Should the grey boxes not relate to dead and the white to surviving?

      Thank you for pointing this out. Figures 1B and 3C represent only the surviving individuals. The grey boxes correspond to surviving homozygotes, and the white boxes correspond to surviving heterozygotes. The percentages add up to 100% only at embryonic stages because all embryos reach late embryonic stages. The grey and white boxes reflect the proportion of these two genotypes among the survivors, not the total number of embryos including those that died. We have changed the text to convey this.
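      As a rough illustration of the arithmetic described above, the following sketch uses hypothetical counts (not data from the manuscript). It assumes one possible reading of the figure: each genotype's survivors are expressed as a percentage of all embryos scored at the start. Under that assumption the two percentages sum to 100% at the embryonic stage, when every embryo is still alive, and to less than 100% at later stages. The function name and all numbers are illustrative only.

```python
def survivor_percentages(homozygous_alive, heterozygous_alive, total_embryos):
    """Return (homozygote %, heterozygote %) relative to the embryos scored at the start.

    Assumption: percentages are normalised to the starting number of embryos,
    so they only sum to 100% while every embryo is still alive.
    """
    if total_embryos <= 0:
        raise ValueError("total_embryos must be positive")
    homo_pct = 100 * homozygous_alive / total_embryos
    het_pct = 100 * heterozygous_alive / total_embryos
    return homo_pct, het_pct


# Hypothetical cohort of 200 embryos (50 homozygous, 150 heterozygous).
embryonic = survivor_percentages(50, 150, 200)  # all embryos alive
adult = survivor_percentages(25, 140, 200)      # some individuals lost later

print(embryonic, sum(embryonic))
print(adult, sum(adult))
```

      If the boxes were instead normalised to the survivors at each stage, the two values would sum to 100% at every stage; the sketch only shows why the two normalisations differ.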

      2) "In mouse fibroblasts, myoblasts and endothelial cells, centrosome orientation is important for nuclear positioning and cell migration(Chang et al, 2015; Gomes et al, 2005; Kushner et al, 2014)." Do you mean "centrosome position"?

      Yes, text changed, thank you for spotting it.

      3) In the introduction, the authors mention Meka et al. when saying the centrosomal microtubules are important for axonal development, but they should also discuss the counter argument from Vinopal et al., 2023 (Neuron) that showed how centrosomes were required for neuronal migration but not axon growth, which was instead mediated by Golgi-derived microtubules.

      Done, thank you very much.

      4) Lines 228-230 - repeated sentence

      Corrected, thank you very much.

      5) Additionally, we did not detect centrioles in the quadrant opposite the axon exit point (Fig. 2B n=75) - this data is not in Fig 2B

      Correct, it is in figure 4B, thank you very much.

      6) "This significant decrease in the humber of centrioles further supports the critical role of Sas-4 in pioneer neurons of the ventral nerve cord (VNC) during Drosophila embryogenesis". It rather highlights that Sas-4 is required for centriole formation in these neurons. Also, humber = number.

      We agree, and have changed the text, thank you very much.

      7) Result title: Non-ciliated sensory neurons have centrioles. This is kind of obvious. A better title may be "axon phenotypes correlate with centriole numbers in sensory neurons" but unfortunately i don't think there is good evidence for this (See major point above).

      We agree and have changed the title. We now believe we have strong evidence to support it, and we hope the additional data presented in the revision convincingly demonstrate this point.

      Reviewer #1 (Significance (Required)):

      As mentioned above, the advance will be important if more evidence is provided. In this case, the paper will be interesting to a broad readership. But currently the paper is limited by the lack of evidence for centrosome function and activity in the neurons.

      We hope that Reviewer 1 now considers that the manuscript is no longer limited and that it shows convincing evidence for centrosome function and activity in embryonic neurons.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, Gonzalez et al. examine the potential function of centrosomes in the neurons and muscle cells of Drosophila embryos. By studying various mutant and RNAi lines in which centriole duplication has been disrupted, they conclude that the loss of centrioles disrupts axonal pathfinding and muscle integrity.

      Major points: 1. Throughout the manuscript, the phenotypes presented are often quite subtle. For this reason, I would really recommend that these experiments are scored blind. Perhaps the authors did this, but I didn't see any mention of this.

      All our phenotypic analyses are performed blind. We apologize for not having originally included this information in the Methods section; it has now been added. Embryos are stained using colorimetric methods (DAB) to label the nervous system, while balancer chromosomes are marked with a fluorescent antibody. This approach allows us to assess and quantify phenotypes using white light without knowing whether the embryos are homozygous mutants or heterozygous, which can only be detected by changing the channels to fluorescence.

      2. The authors conclude that neurons have active centrioles that function as centrosomes (Figure 6), but the data here is confusing. The authors state that in these cells they observe acetylated MTs extending from the centrosomes and these colocalised with g-tubulin. But the authors don't show the overlap between centrosomes, g-tubulin and MTs, as they stain for these separately. This is problematic, as it was not clear from these images that the majority of the MTs really are extending from the centrosome: the centrosome may just associate or be close by to these MT cables (Figure 6A,B). Moreover, the authors show that only a fraction of the centrosomes in these cells associate with g-tubulin, so presumably in cells where the centrosomes lack g-tubulin they would not expect the centrosomes to be associated with the MTs-but they do not show that this is the case. Perhaps the authors can't test this, but an alternative would be to show that these MT arrays are absent in Sas-4 mutants. This would give more confidence that these MTs arise from the centrosomes.

      We agree that the initial data based on acetylated microtubules and γ-tubulin colocalization were not sufficient to conclude that microtubules originate from the centrosome, as these markers can only suggest association. To address this, we have now included additional experiments that provide direct evidence of centrosome activity.

      First, we performed live imaging of aCC neurons expressing EB1-GFP together with Asl-Tomato. Despite the technical challenges of imaging only two neurons per segment in live embryos under strict fluorescence and timing constraints, we were able to clearly observe EB1 comets emerging from the centrosome and moving toward the cell periphery. This demonstrates active microtubule nucleation from centrosomes rather than mere proximity to microtubule bundles.

      Second, we carried out a microtubule depolymerization/polymerization assay, which provides unequivocal evidence that polymerization initiates at the centrosome. After depolymerization, microtubules regrew from the centrosome, confirming its role as an active microtubule-organizing center. These experiments go beyond colocalization and directly address the concern that centrosomes might simply be adjacent to microtubule cables.

      Regarding the suggestion to use Sas-4 mutants, while we did not perform this experiment, the regrowth assay combined with EB1 imaging strongly supports that these microtubules originate from the centrosome. All new data are presented in Figure 6 and the corresponding text in the revised manuscript.

      3. The authors show that muscle cell integrity is compromised by centriole-loss (Figure 2). This is very surprising as it is widely believed that centrosomes are non-functional in muscle cells, and the MTs are instead organised around the nuclear envelope. I'm not aware of the situation in Drosophila muscle cells, but the authors should ideally try to examine if the centrioles are functioning as centrosomes in these cells. At the very least they should discuss how they think centriole-loss is influencing the muscle integrity when it is widely believed they are inactive in these cells.

      We do not claim that centrosomes are active in muscle cells at these developmental stages. The observed muscle defects could result from earlier processes such as cell division, migration, or muscle fusion. We agree that this is an intriguing observation; however, pursuing this question further would go beyond the scope of the current manuscript. As requested by the reviewer, we have now expanded the discussion to consider how centriole loss might impact muscle integrity.

      4. Regardless of the strength of the supporting data, I think the authors should tone down their conclusions. The title and abstract led me to believe that centriole loss would cause significant problems in axonal pathfinding and muscle integrity. In all the mutant specimens examined (and certainly the low magnification views shown in Figure 1D'-F', Figure 1I'-K' and Figure 2D'-F') the mutants look very similar to the WT. Many readers may not get past the title and abstract, so the authors should make it clearer that these defects are very subtle.

      We have changed the text to convey this idea.

      Minor points: 1. In Figures 4 and 5, CP309 staining is relied on to identify centrioles, but there is quite a background of non-specific dots, making it hard to be certain what is a centriole and what isn't. For example, in Figure 5D' there are lots of dots within some of the cells - are any of these centrioles? How can the authors be certain which dot is a centriole in some of the cells shown in Figure 5C'? Is it possible to use a second marker and only count as centrioles dots that are recognised by both antibodies?

      We thank the reviewer for this suggestion and agree that using a second marker improves confidence in centriole identification. In a new supplementary figure (Supplementary Fig. 3), we now show that Asl and Plp colocalize in neurons and provide a quantification of the frequency of this colocalization. This dual labelling confirms the identity of centrioles and addresses the concern about non-specific background.

      We also apologize for any confusion regarding the presence of foci outside the marked cells. These images are whole-mount embryonic stainings, and the anti-Plp antibody labels all centrosomes in all cells of the embryo, which explains the additional foci observed.

      2. In the abstract that authors state that traditionally centrosomes have been considered to be non-essential in terminally differentiated cells. I don't think this is correct. In the standard "textbook" view of a cell, the centrosome is normally positioned in the centre of the cell organising an extensive array of MTs that are thought play an important role in organising intracellular transport, the positioning and movement of organelles and the maintenance and establishment of cell polarity. I don't think it is only recent evidence that suggests they play vital roles in terminally differentiated cells.

      We thank the reviewer for this correction and we have changed the text accordingly.

      3. Line 162 the authors state that in the RNAi knockdown lines they observe several additional phenotypes, but then in the same sentence (Line 164) they say that these defects were also observed in the original mutant and mutant/Df lines.

      We apologise for this confusion; we have rearranged the sentence for clarity.

      4. The sentences in Line281-287 don't reference any of the Figures, so it seems the authors are just stating these results without presenting any data (e.g. "Significantly, we also found a correlation between axonal guidance phenotypes and the numbers of centrioles per embryo"). If they've tested this correlation, they should show it.

      We have rearranged the sentences for better understanding.

      5. In Figure 7 I did not understand how the authors measured tortuosity (wiggliness) and could see no description in the methods. This is important as, again the defect seems quite subtle, but perhaps I am not understanding which bits of the axon are being measured. Is it just the small bit of the axons close to the asterisks that is being measured, or the whole FasII track?

      We have now added another quantification and additional descriptions in the methods section.
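      For readers unfamiliar with the term, a common way to quantify tortuosity is the ratio between the traced path length of a curve and the straight-line distance between its endpoints. The sketch below illustrates that generic definition only; it is an assumption for illustration and not necessarily the quantification used in the manuscript. The function names and coordinates are hypothetical.

```python
import math


def path_length(points):
    """Sum of straight segment lengths along an ordered list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def tortuosity(points):
    """Arc-chord ratio: traced length divided by end-to-end distance.

    A perfectly straight trace gives 1.0; wigglier traces give larger values.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("start and end points coincide")
    return path_length(points) / chord


# Hypothetical traces in arbitrary units.
straight_trace = [(0, 0), (1, 0), (2, 0)]
wiggly_trace = [(0, 0), (1, 1), (2, 0)]

print(tortuosity(straight_trace))
print(tortuosity(wiggly_trace))
```

      With this definition, the value depends on which stretch of the axon is traced, which is why the measured segment needs to be stated explicitly.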

      Reviewer #2 (Significance (Required)):

      The potential function of centrosomes in axonal outgrowth is quite controversial, so this study is potentially of considerable interest.

      However, several aspects of the data presented here were confusing or not terribly convincing. In its present state, I don't think the main conclusions are strongly enough supported by the data.

      We hope that Reviewer 2 now considers that the manuscript is no longer confusing and that it shows convincing evidence for centrosome function and activity in embryonic neurons.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of González et al. entitled "Centriole Loss in Embryonic Development Disrupts Axonal Pathfinding and Muscle Integrity" deals with the role of centrosomes in shaping axonal morphology. To this aim the AA analysed Drosophila Sas-4 mutants that are reported to develop until adult stage without centrioles. Remarkably, the AA observe that 50% of the homozygous mutant embryos fail to hatch as larvae. The present observations suggest that centrosome loss results in axonemal shaping defects and muscle developmental abnormalities. Finally, the AA show the presence of functional centrosomes in neurons. In my opinion, the manuscript is interesting because shows unexpected findings. However, to justify these new findings the AA are required to improve some experimental observations.

      We thank the reviewer for their summary of our work and for considering it interesting. We have taken all the comments into account and believe that they have helped improve our manuscript.

      Major: Abstract- It is unclear in which phenotypic condition the observations of centrosome loss or centrosome presence have been found. Please better explain. l.36. embryos, larvae, adult, from Sas4 or controls? If mutants, the observations are very interesting since Sas4 would be without centrioles. Indeed, Basto et al., show that chemosensory neurons do not develop an axoneme in the absence of centrioles, but extend dendrites toward the sensory bristle.

      We have made clear which observations refer to wild-type and which to centriole loss (CL) conditions. CL conditions refer to mutant and downregulation conditions, whereas targeted downregulation refers to RNAi downregulation only in neurons.

      I do not think appropriate the use of "centriole" in the main title since the centrioles would be localized by true centriolar antigens rather than by centrosomal antigens. This problem occurs throughout the text and some figures where the AA image centrioles by centrosomal material. In Fig. 5A only the AA properly look at Asl localization. The other pictures of presumptive centrioles or centriole quantification report CP309 dots. This localization does not unequivocally reveal centrioles, since CP309 is essentially required for centrosome-mediated Mt nucleation. There are differentiated Drosophila tissues in which centrioles are present, but inactivated, and unable to recruit pericentriolar material. Mt are nucleated by ncMTOCs that contain centrosomal material and gamma-tubulin. Thus, the centrosomal antigens do not colocalize with centrioles.

      We have changed centrioles to centrosomes in the title and most sections in the manuscript. We have also included an extra control showing that Asl and Plp colocalize, and we quantify how often this colocalization occurs in neurons (Suppl. Fig. 3). Asl is a reliable and widely used marker for centrioles, as it localizes specifically to the centriole structure (Varmark H, Llamazares S, Rebollo E, Lange B, Reina J, Schwarz H, Gonzalez C. Asterless is a centriolar protein required for centrosome function and embryo development in Drosophila. Curr Biol. 2007 Oct 23;17(20):1735-45. doi: 10.1016/j.cub.2007.09.031. PMID: 17935995.)

      Minor: l. 58. The early arrest is mainly due to a checkpoint control. In double mutant for Sas4 and P53 the embryos survive longer, even if their further development is arrested.

      We thank the reviewer for this comment, and we have changed the text accordingly.

      1. Previous works, also quoted by the AA, reported that in mature neurons the centrosome are inactivated, whereas the present manuscript describes functional centrosomes in Drosophila motor and peripheral nervous system. This is an intriguing observations that needs a better explanation in Discussion section.

      We thank the reviewer for this comment, and we have changed the discussion accordingly.

      l.143-145. I understand that 50% of the Sas4 embryos that reach the adult stage have centrioles. Is it correct? But if it is so, how the AA explain the absence of centrioles in sensory neurons of adult flies as reported by Basto et al. ?

      According to our results, they already have fewer centrioles than controls at embryonic stages. In addition, as reported in Basto et al., they continue losing centrioles during larval stages and metamorphosis, which explains why centrioles are not detected at adult stages.

      l.215. It is unclear for me why the AA analyse Sas6 flies, unless explain the mutant phenotype.

      We analysed Sas-6 mutants to strengthen our conclusions with Sas-4 and to exclude the possibility that the observed phenotypes arise from a centrosome-independent function of Sas-4. Using Sas-6 mutants is one of the additional steps we have taken to confirm that the effects are specifically due to centrosome loss.

      l.221. How the centrioles have been quantified? What antibody, the AA used.

      We have quantified centrosomes using antibodies against Plp (CP309) and Asl-YFP expression.

      l.244. and Fig 4C,D. I see high background with CP309. As reported previously I think better to use antibodies against centriolar proteins, such as Sas6, Ana1, Asl, or Sas4 ( if centrioles are present in 50% of mutants as the AA claim, the antibody could be also useful). In addition, I can see some CP309 spots in Fig 4E,F. Are they centrioles?

      Indeed, as we report, Sas-4 mutant embryos are not totally devoid of centrosomes. We also apologise for the confusion regarding the foci outside the marked cells in control embryos: these are whole-mount embryonic stainings, and the anti-Plp antibody marks the centrosomes of all cells in the embryo, not just those of the neurons.

      l.270 and Fig. 5A and Fig.5 C-E. Why the AA localize Cp309 and not Asl (Fig. 5A) to detect centrioles?

      In a new supplementary figure (Suppl. Fig. 3), we now show that Asl and Plp colocalize, and we quantify how often this colocalization occurs in neurons. We can therefore use CP309 in neurons to the same effect as Asl.

      L295-296. I cannot see Mts, but only a diffuse staining. I am expecting to see distinct Mt bundles.

      The MT bundles are now easier to see in the new experiment in Fig. 5F-I, where we performed MT depolymerisation/repolymerisation. Nevertheless, we must stress that these analyses are done in whole-mount embryonic stainings.

      326-327. How the AA explain this different lethality, even if both the proteins are involved in centriole assembly?

      We have now redone all the viability and mutant phenotype analyses using the Sas-6 CRISPR mutant over the deficiency, which is a better way to assess the phenotype.

      335-337. In my opinion the quoted publications are not relevant.

      We believe that these references support our hypothesis because:

      • Metzger et al. (2012) stress the importance of nuclear position in muscle development in Drosophila.
      • Loh et al. (2023) relate centrosomes to nuclear migration in Drosophila.
      • Tillery et al. (2018) is a review describing MTs in muscle development in Drosophila.

      358-359. Does maternal contribution persist after gastrulation?

      While bulk degradation occurs by midblastula transition, some stable maternal products persist beyond gastrulation. In our case, if centrioles are formed due to the maternal contribution, they will only be diluted by cell division, which explains why we can detect centrioles at late embryonic stages.

      l.366. This is an intriguing point, but as previously observed I have some problem with centriole localization. References. Please uniform Journal abbreviations and control page numbers.

      We hope we have clarified this problem with the new experiments showing MT repolymerization from the centrosomes in neurons.

      Reviewer #3 (Significance (Required)):

      The manuscript is potentially interesting for peoples working of cell and molecular biology, and development. However, the paper needs an additional working to be suitable for publication.

      We hope that Reviewer 3 considers that the additional work and revision make this manuscript suitable for publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: In this manuscript, Gonzalez et al. examine the potential function of centrosomes in the neurons and muscle cells of Drosophila embryos. By studying various mutant and RNAi lines in which centriole duplication has been disrupted, they conclude that the loss of centrioles disrupts axonal pathfinding and muscle integrity.

      Major points:

      1. Throughout the manuscript, the phenotypes presented are often quite subtle. For this reason, I would really recommend that these experiments are scored blind. Perhaps the authors did this, but I didn't see any mention of this.
      2. The authors conclude that neurons have active centrioles that function as centrosomes (Figure 6), but the data here is confusing. The authors state that in these cells they observe acetylated MTs extending from the centrosomes and these colocalised with g-tubulin. But the authors don't show the overlap between centrosomes, g-tubulin and MTs, as they stain for these separately. This is problematic, as it was not clear from these images that the majority of the MTs really are extending from the centrosome: the centrosome may just associate or be close by to these MT cables (Figure 6A,B). Moreover, the authors show that only a fraction of the centrosomes in these cells associate with g-tubulin, so presumably in cells where the centrosomes lack g-tubulin they would not expect the centrosomes to be associated with the MTs-but they do not show that this is the case. Perhaps the authors can't test this, but an alternative would be to show that these MT arrays are absent in Sas-4 mutants. This would give more confidence that these MTs arise from the centrosomes.
      3. The authors show that muscle cell integrity is compromised by centriole-loss (Figure 2). This is very surprising as it is widely believed that centrosomes are non-functional in muscle cells, and the MTs are instead organised around the nuclear envelope. I'm not aware of the situation in Drosophila muscle cells, but the authors should ideally try to examine if the centrioles are functioning as centrosomes in these cells. At the very least they should discuss how they think centriole-loss is influencing the muscle integrity when it is widely believed they are inactive in these cells.
      4. Regardless of the strength of the supporting data, I think the authors should tone down their conclusions. The title and abstract led me to believe that centriole loss would cause significant problems in axonal pathfinding and muscle integrity. In all the mutant specimens examined (and certainly the low magnification views shown in Figure 1D'-F', Figure 1I'-K' and Figure 2D'-F') the mutants look very similar to the WT. Many readers may not get past the title and abstract, so the authors should make it clearer that these defects are very subtle.

      Minor points:

      1. In Figures 4 and 5, CP309 staining is relied on to identify centrioles, but there is quite a background of non-specific dots, making it hard to be certain what is a centriole and what isn't. For example, in Figure 5D' there are lots of dots within some of the cells - are any of these centrioles? How can the authors be certain which dot is a centriole in some of the cells shown in Figure 5C'? Is it possible to use a second marker and only count as centrioles dots that are recognised by both antibodies?
      2. In the abstract the authors state that traditionally centrosomes have been considered to be non-essential in terminally differentiated cells. I don't think this is correct. In the standard "textbook" view of a cell, the centrosome is normally positioned in the centre of the cell organising an extensive array of MTs that are thought to play an important role in organising intracellular transport, the positioning and movement of organelles and the maintenance and establishment of cell polarity. I don't think it is only recent evidence that suggests they play vital roles in terminally differentiated cells.
      3. In Line 162 the authors state that in the RNAi knockdown lines they observe several additional phenotypes, but then in the same sentence (Line 164) they say that these defects were also observed in the original mutant and mutant/Df lines.
      4. The sentences in Lines 281-287 don't reference any of the Figures, so it seems the authors are just stating these results without presenting any data (e.g. "Significantly, we also found a correlation between axonal guidance phenotypes and the numbers of centrioles per embryo"). If they've tested this correlation, they should show it.
      5. In Figure 7 I did not understand how the authors measured tortuosity (wiggliness) and could see no description in the methods. This is important as, again, the defect seems quite subtle, but perhaps I am not understanding which bits of the axon are being measured. Is it just the small bit of the axons close to the asterisks that is being measured, or the whole FasII track?
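
      For context on the measurement in question: tortuosity is often quantified as the ratio of a curve's path length to the straight-line distance between its endpoints. The sketch below illustrates that common definition only. It is not the manuscript's method, which is not described; the function name and the traced coordinates are hypothetical.

```python
import math


def tortuosity(points):
    """Path length divided by end-to-end (chord) distance.

    `points` is a hypothetical list of (x, y) coordinates traced along
    an axon segment. A perfectly straight trace gives 1.0; wigglier
    traces give larger values.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Sum the lengths of consecutive segments along the trace.
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Straight-line distance between the first and last point.
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("start and end points coincide")
    return path_length / chord


straight = [(0, 0), (1, 0), (2, 0)]
zigzag = [(0, 0), (1, 1), (2, 0)]
print(tortuosity(straight), tortuosity(zigzag))
```

      Under this definition the result depends strongly on which stretch of the axon is traced, which is why the measured region needs to be stated.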

      Significance

      The potential function of centrosomes in axonal outgrowth is quite controversial, so this study is potentially of considerable interest.

      However, several aspects of the data presented here were confusing or not terribly convincing. In its present state, I don't think the main conclusions are strongly enough supported by the data.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In this paper, the authors address the important question of the role of centrosomes during neuronal development. They use Drosophila as an in vivo model. The field is somewhat unclear on the role and importance of centrosomes during neuronal development, although the current data would suggest they are dispensable for axon specification and growth. Early studies in cultured mammalian neurons showed that centrosomes are active and that their microtubules can be cut and transported into the neurites. But a study then showed that centrosomes in these cultured neurons are deactivated relatively early during neuronal development in vitro and that ablating centrosomes even when they are active had no obvious effect on axon specification and growth. Consistent with this, a study in Drosophila provided evidence that centrosomes were not active or necessary in different types of neurons. More recently, a study showed that centrosomal microtubules are dispensable for axon specification and growth in mice in vivo but are required for neuronal migration in the cerebral cortex. However, another study has linked the generation of acetylated microtubules at centrosomes with axon development. In this current study, the authors examine the effect of centrosome loss on various motor and sensory neurons and muscles mainly by examining mutants in essential centriole duplication genes. They associate axonal routing and morphology defects with centrosome loss and provide some evidence that centrosomes could still be active in the developing neurons. Overall, they conclude that centrosomes are active during at least early neuronal development and that this activity is important for proper axonal morphology and routing.

      While I think this study addresses a very interesting and important question, I think as it stands the data is not sufficient to be conclusive on a role for centrosomes during neuronal development. My biggest concern is that most phenotypes have not yet been shown to be cell autonomous, as whole animal mutants have been analysed rather than analysing the effect of cell-specific depletion, and the evidence for active centrosomes needs to be strengthened. If the authors can provide stronger evidence for a role of centrosomes in axonal development then the paper will certainly be of interest to a broad readership.

      Major comments

      1. The sas-6 transallelic combination shows only 17% embryonic lethality compared to 50% embryonic lethality with sas-4 mutants. Given that both mutants should result in the same degree of centrosome loss (this should be quantified in sas-6 mutants) it would suggest that either sas-4 has other roles away from centrosomes or that the sas-4 mutant chromosome used in the experiment has other mutations that affect viability. The effect of picking up "second-site lethal" mutations on mutant chromosomes is common and so I would not be surprised if this is the reason for the difference in phenotypes. This can be addressed either by "cleaning up" the sas-4 mutant chromosome by backcrossing to wild-type lines, allowing recombination to occur and replace the potential second site mutations, or by using transallelic combinations of sas-4, as they did for sas-6. The "easier" option may just be to analyse all the phenotypes with the sas-6 transallelic combination.
      2. Using "whole animal" mutants for assessing neuronal morphology is risky due to non-cell-autonomous effects. The authors have carried out some phenotypic analysis of neurons depleted of Sas-4 by cell-specific RNAi, but I feel they need to do this for all of their analysis. This includes embryonic lethality measures, quantification of centrosome numbers, and all axonal phenotypes in Sas-4 RNAi neurons. It would also be prudent to use 2 distinct RNAi lines to help ensure any phenotypes are not off-target effects (and this may help clarify why the authors see some additional phenotypes with RNAi). Indeed, there are relatively weak phenotypes in muscles when using RNAi compared to the mutants and these potential non-cell-autonomous effects could then have a knock-on effect on neuronal morphology. If the authors were concerned that RNAi is not very efficient (explaining any potential weaker phenotypes than in mutants) the authors could examine the effectiveness of RNAi lines by analysing protein depletion by western blotting or mRNA depletion by rt-qPCR (although this has to be done in a different cell type due to the difficulty in obtaining a neuronal extract).
      3. When analysing centriole presence or absence it is a good idea to stain with two different centriole markers e.g. Asl and Plp. This helps rule out unspecific staining. It is clear from the images that similar sized foci can be observed outside of the cells (see Figure 5A for example), so clearly some of the foci that appear to be within the cells may also be unspecific staining.
      4. The evidence for active centrosomes is not that convincing. Acetylated tubulin is associated with stable MTs, which are not normally organised by "active" centrosomes that nucleate dynamic microtubules. Moreover, it is plausible that centriole foci happen to overlap with the acetylated tubulin staining by chance. This would explain why not all centrosomes colocalise with acetylated tubulin signal. The authors could better test centrosome activity by performing live imaging with EB1-GFP. If centrosomes are active, it is very easy to observe the many comets produced by the centrosomes.
      5. If the authors believe that centrosomes have a role in axon pathfinding in sensory neurons, they should show that these centrosomes are active, at least during early stages (again using EB1-GFP imaging).
      6. The authors mention in the discussion that "increased JNK activity, can result in axonal wiggliness (Karkali et al, 2023)". I therefore wonder whether centrosome loss may induce JNK activation (the stress response), as this would then indicate an indirect effect of centrosome loss on axonal structure rather than a direct influence of centrosome-generated microtubules. The authors could assess whether the DNK-JNK pathway is activated in neurons lacking centrosomes by expressing UAS-Puc-GFP and quantifying the nuclear signal.
      7. In Figure 5, the authors claim that they find "a correlation between axonal guidance phenotypes and the numbers of centrioles per embryo". I don't think this is a strong correlation. The difference in centriole number between embryos with no defects and those with defects is very small. In contrast, the difference between centriole numbers in control (no defects) and mutant (no defects) is very large. So, there does not appear to be a strong correlation between centrosome number and phenotype.

      Minor comments

      1. I don't understand Figure 3C - why do the % of surviving homozygotes and heterozygotes add up to 100%? Should the grey boxes not relate to dead and the white to surviving?
      2. "In mouse fibroblasts, myoblasts and endothelial cells, centrosome orientation is important for nuclear positioning and cell migration(Chang et al, 2015; Gomes et al, 2005; Kushner et al, 2014)." Do you mean "centrosome position"?
      3. In the introduction, the authors mention Meka et al. when saying the centrosomal microtubules are important for axonal development, but they should also discuss the counter argument from Vinopal et al., 2023 (Neuron) that showed how centrosomes were required for neuronal migration but not axon growth, which was instead mediated by Golgi-derived microtubules.
      4. Lines 228-230 - repeated sentence
      5. Additionally, we did not detect centrioles in the quadrant opposite the axon exit point (Fig. 2B n=75) - this data is not in Fig 2B
      6. "This significant decrease in the humber of centrioles further supports the critical role of Sas-4 in pioneer neurons of the ventral nerve cord (VNC) during Drosophila embryogenesis". It rather highlights that Sas-4 is required for centriole formation in these neurons. Also, humber = number.
      7. Result title: Non-ciliated sensory neurons have centrioles. This is kind of obvious. A better title may be "axon phenotypes correlate with centriole numbers in sensory neurons" but unfortunately I don't think there is good evidence for this (see major point above).

      Significance

      As mentioned above, the advance will be important if more evidence is provided. In this case, the paper will be interesting to a broad readership. But currently the paper is limited by the lack of evidence for centrosome function and activity in the neurons.

    1. Reviewer #1 (Public Review):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains in the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. The authors attribute this divergence to differences in short-term synaptic plasticity, particularly within somatostatin-expressing (SST⁺) interneurons. This interpretation is supported by 1) the increased density of SST+ neurons in L4 of the septa compared to barrel domain, 2) the stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation and 3) the reduced functional difference in single- versus multi-whisker response ratios across barrel and septal domains in Elfn1 KO mice, which lack a synaptic protein that confers characteristic short-term plasticity, notably in SST+ neurons. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain, suggesting that septal and barrel circuits may differentially route information about single vs multi-whisker stimulation downstream of S1.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that these two domains contribute distinctly to the processing of single- versus multi-whisker inputs and highlights the role of SST+ neurons and their short-term plasticity. Together, this study suggests that the barrel cortex multiplexes whisker-derived sensory information across its domains, enabling parallel processing within S1.

      Weaknesses:

      Although the divergence in responses to repeated single- versus multi-whisker stimulation between barrel and septal domains is consistent with a role for SST⁺ neuron short-term plasticity, the evidence presented does not conclusively demonstrate that this mechanism is the critical driver of the difference. The lack of targeted recordings and manipulations limits the strength of this conclusion: SST⁺ neuron activity is not measured in L4, nor is it assessed in a domain-specific manner. The Elfn1 knockout manipulation does not appear to selectively affect either stimulus condition, domain or interneuron subtype. Finally, all experiments were performed under anesthesia, which raises concerns about how well the reported dynamics generalize to awake cortical processing.

    2. Reviewer #2 (Public review):

      Summary:

      Argunsah and colleagues demonstrate that SST expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1) Careful physiological and imaging studies.

      (2) Novel result showing the role of SST+ neurons in shaping responses.

      (3) Good use of a knockout animal to further the main hypothesis.

      (4) Clear analytical techniques.

      Comments on revisions:

      The authors have effectively responded to my initial critiques - I have no further concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Reviews):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains of the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. This difference is attributed to the short-term plasticity properties of interneurons, particularly somatostatin-expressing (SST+) neurons. This claim is supported by the increased density of SST+ neurons found in L4 of the septa compared to barrels, along with a stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation. The role of the synaptic protein Elfn1 is then examined. Elfn1 KO mice exhibited little to no functional domain separation between barrel and septa, with no significant difference in single- versus multi-whisker response ratios across barrel and septal domains. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that barrel and septal domains contribute differently to processing single versus multi-whisker inputs, suggesting that the barrel cortex multiplexes sensory information coming from the whiskers in different domains.

      We thank the reviewer for the very neat summary of our findings that barrel cortex multiplexes converging information in separate domains.

      Weaknesses:

      While the observed divergence in responses to repeated SWS vs MWS between the barrel and septal domains is intriguing, the presented evidence falls short of demonstrating that short-term plasticity in SST+ neurons critically underpins this difference. The absence of a mechanistic explanation for this observation limits the work’s significance. The measurement of SST neurons’ response is not specific to a particular domain, and the Elfn1 manipulation does not seem to be specific to either stimulus type or a particular domain.

      We appreciate the reviewer’s perspective. Although further research is needed to understand the circuit mechanisms underlying the observed phenomenon, we believe our data suggest that altering the short-term dynamics of excitatory inputs onto SST neurons reduces the divergent spiking dynamics in barrels versus septa during repetitive single- and multi-whisker stimulation. Future work could examine how SST neurons, whose somata reside in barrels and septa, respond to different whisker stimuli and the circuits in which they are embedded. At this time, however, the authors believe there is no alternative way to test how the short-term dynamics of excitatory inputs onto SST neurons, as a whole, contribute to the temporal aspects of barrel versus septa spiking.

      The study's reach is further constrained by the fact that results were obtained in anesthetized animals, which may not generalize to awake states.

      We appreciate the reviewer’s concern regarding the generalizability of our findings from anesthetized animals to awake states. Anesthesia was employed to ensure precise individual whisker stimulation (and multi-whisker stimulation in the same animal), which is challenging in awake rodents due to active whisking. While anesthesia may alter higher-order processing, core mechanisms, such as short- and long-term plasticity in the barrel cortex, are preserved under anesthesia (Martin-Cortecero et al., 2014; Mégevand et al., 2009).

      The statistical analysis appears inappropriate, with the use of repeated independent tests, dramatically boosting the false positive error rate.

      Thank you for your feedback on our analysis using independent rank-based tests for each time point in wild-type (WT) animals. To address concerns regarding multiple comparisons and temporal dependencies (for Figures 1F and 4D for now; we will add more in our revision), we performed a repeated measures ANOVA for WT animals (13 Barrel, 8 Septa, 20 time points), which revealed a significant main effect of Condition (F(1,19) = 16.33, p < 0.001) and a significant Condition-Time interaction (F(19,361) = 2.37, p = 0.001). Post hoc tests confirmed significant differences between Barrel and Septa at multiple time points (e.g., p < 0.0025 at times 3, 4, 6, 7, 8, 10, 11, 12, 16, 19 after Bonferroni post hoc correction), supporting a differential multi-whisker vs. single-whisker ratio response in WT animals. In contrast, a repeated measures ANOVA for knock-out (KO) animals (11 Barrel, 7 Septa, 20 time points) showed no significant main effect of Condition (F(1,14) = 0.17, p = 0.684) or Condition-Time interaction (F(19,266) = 0.73, p = 0.791), indicating that the Barrel-Septa difference observed in WT animals is absent in KO animals.
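
      As an illustration of the correction referred to above: with 20 time points, a Bonferroni correction divides the family-wise alpha by 20. Assuming alpha = 0.05 (a conventional value that is not stated in the response), the per-test threshold is 0.05 / 20 = 0.0025, which matches the reported cutoff. The following is a minimal Python sketch with hypothetical function names and made-up p-values, not the analysis code used for the figures.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed family-wise alpha of 0.05 across 20 time points.
print(bonferroni_threshold(0.05, 20))

# Made-up p-values for illustration only: one clear effect, one that
# would pass an uncorrected 0.05 test but not the corrected one, and
# 18 null-like values.
example_p = [0.001, 0.01] + [0.5] * 18
print(significant_after_bonferroni(example_p))
```

      The second made-up value (0.01) shows the point of the correction: it would count as significant in an uncorrected per-time-point test but not after dividing alpha by the number of comparisons.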

      Furthermore, the manuscript suffers from imprecision; its conclusions are occasionally vague or overstated. The authors suggest a role for SST+ neurons in the observed divergence in SWS/MWS responses between barrel and septal domains. However, this remains speculative, and some findings appear inconsistent. For instance, the increased response of SST+ neurons to MWS versus SWS is not confined to a specific domain. Why, then, would preferential recruitment of SST+ neurons lead to divergent dynamics between barrel and septal regions? The higher density of SST+ neurons in septal versus barrel L4 is not a sufficient explanation, particularly since the SWS/MWS response divergence is also observed in layers 2/3, where no difference in SST+ neuron density is found.

      Moreover, SST+ neuron-mediated inhibition is not necessarily restricted to the layer in which the cell body resides. It remains unclear through which differential microcircuits (barrel vs septum) the enhanced recruitment of SST+ neurons could account for the divergent responses to repeated SWS versus MWS stimulation.

      We fully appreciate the reviewer’s comment. We currently do not provide any evidence on the contribution of SST neurons in the barrels versus septa in layer 4 to the response divergence of spiking observed in SWS versus MWS. We only show that these neurons differentially distribute in the two domains in this layer. It is certainly known that there is molecular and circuit-based diversity of SST-positive neurons in different layers of the cortex, so it is plausible that this includes cells located in the two domains of vS1, something which has not been examined so far. Our data on their distribution are one piece of information suggesting that SST neurons may have a differential role in inhibiting barrel stellate cells versus septa ones. Morphological reconstructions of SST neurons in L4 of the somatosensory barrel cortex have shown that their dendrites and axons project locally and may be confined to individual domains, even though this was not specifically examined (Fig. 3 of Scala F et al., 2019). The same study also showed that L4 SST cells receive excitatory input from local stellate cells, and it is known that they are also directly excited by thalamocortical fibers (Beierlein et al., 2003; Tan et al., 2008), both of which facilitate.

      As shown in our supplementary figure, the divergence is also observed in L2/3 where, as the reviewer also points out, we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.

      Regardless of the mechanism, the Elfn1 knock-out mouse line almost exclusively affects the incoming excitability onto SST neurons (see also reply to comment below), hence what can be supported by our data is that changing the incoming short-term synaptic plasticity onto these neurons brings the spiking dynamics between barrels and septa closer together.

      The Elfn1 KO mouse model seems too unspecific to suggest the role of the short-term plasticity in SST+ neurons in the differential response to repeated SWS vs MWS stimulation across domains. Why would Elfn1-dependent short-term plasticity in SST+ neurons be specific to a pathway, or a stimulation type (SWS vs MWS)? Moreover, the authors report that Elfn1 knockout alters synapses onto VIP+ as well as SST+ neurons (Stachniak et al., 2021; previous version of this paper), so why attribute the phenotype solely to SST+ circuitry? In fact, the functional distinctions between barrel and septal domains appear largely abolished in the Elfn1 KO.

      Previous work by others and us has shown that globally removing Elfn1 selectively removes a synaptic process from the brain without altering brain anatomy or structure. This allows us to study how the temporal dynamics of inhibition shape activity, as opposed to inhibition from particular cell types. We will nevertheless update the text to discuss more global implications for SST interneuron dynamics and include a reference to VIP interneurons that contain Elfn1.

      When comparing SWS to MWS, we find that MWS replaces the neighboring excitation which would normally be preferentially removed by short-term plasticity in SST interneurons, thus providing a stable control comparison across animals and genotypes. On average, VIP interneurons failed to show modulation by MWS. We were unable to measure a substantial contribution of VIP cells to this process and also note that the Elfn1 expressing multipolar neurons comprise only ~5% of VIP neurons (Connor and Peters, 1984; Stachniak et al., 2021), a fraction that may be lost when averaging from 138 VIP cells. Moreover, the effect of Elfn1 loss on VIP neurons is quite different and marginal compared to that of SST cells, suggesting that the primary impact of Elfn1 knockout is mediated through SST+ interneuron circuitry. Therefore, even if we cannot rule out that these 5% of VIP neurons contribute to barrel domain segregation, we are of the opinion that their influence would be very limited if any.

      Reviewer #2 (Public Reviews):

      Summary:

      Argunsah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1)  Careful physiological and imaging studies.

      (2)  Novel result showing the role of SST+ neurons in shaping responses.

      (3)  Good use of a knockout animal to further the main hypothesis.

      (4)  Clear analytical techniques.

      We thank the reviewer for their appreciation of the study.

      Weaknesses:

      No major weaknesses were identified by this reviewer. Overall, I appreciated the paper but feel it overlooked a few issues and had some recommendations on how additional clarifications could strengthen the paper. These include:

      (1) Significant work from Jerry Chen on how S1 neurons that project to M1 versus S2 respond in a variety of behavioral tasks should be included (e.g. PMID: 26098757). Similarly, work from Barry Connors’ lab on intracortical versus thalamocortical inputs to SST neurons, as well as excitatory inputs onto these neurons (e.g. PMID: 12815025) should be included.

      We thank the reviewer for these valuable resources that we overlooked. We will include Chen et al. (2015), Cruikshank et al. (2007) and Gibson et al. (1999) to contextualize S1 projections and SST+ inputs, strengthening the study’s foundation, as well as Beierlein et al. (2003), which nicely shows both local and thalamocortical facilitation of excitatory inputs onto L4 SST neurons, in contrast to PV cells. The paper also shows the gradual recruitment of SST neurons by thalamocortical inputs to provide feed-forward inhibition onto stellate cells (regular spiking) of the barrel cortex L4 in rat.

      (2) Using Layer 2/3 as a proxy for what is happening in layer 4 (~line 234). Given that layer 2/3 cells integrate information from multiple barrels, as well as receiving direct VPm thalamocortical input, and given that the time window being looked at can receive input from other cortical locations, it is not clear that layer 2/3 is a proxy for what is happening in layer 4.

      We agree with the reviewer that what we observe in L2/3 is not necessarily what is taking place in L4 SST-positive cells. The data on L2/3 were included to show that these cells, as a population, can show divergent responses when it comes to SWS vs MWS, which is not seen in L2/3 VIP neurons. Regardless of the mechanisms underlying it, our overall data support the view that SST-positive neurons can change their activation based on the type of whisker stimulus and that, when the excitatory input dynamics onto these neurons change due to the removal of Elfn1, the recruitment of barrels vs septa spiking changes in the temporal domain. Having said that, the data shown in Supplementary Figure 3 on the response properties of L2/3 neurons above the septa vs above the barrels (one would say in the respective columns) do show the same divergence as in L4. This suggests that a circuit motif may exist that is common to both layers, involving SST neurons that sit in L4, L5 or even L2/3. This implies that despite the differences in the distribution of SST neurons in septa vs barrels of L4 there is an unidentified input-output spatial connectivity motif that engages in both L2/3 and L4. Please also see our response to a similar point raised by reviewer 1.

      (3) Line 267, when discussing distinct temporal response, it is not well defined what this is referring to. Are the neurons no longer showing peaks to whisker stimulation, or are the responses lasting a longer time? It is unclear why PV+ interneurons, which may not be impacted by the Elfn1 KO and receive strong thalamocortical inputs, are not constraining activity.

      We thank the reviewer for their comment and will clarify the statement.

      This convergence of response profiles was also evident in stimulus-aligned stacked images, where the emergent differences between barrels and septa under SWS were largely abolished in the KO (Figure 4B). A distinction between directly stimulated barrels and neighboring barrels persisted in the KO. In addition, the initial response continued to differ between barrel and septa and also between septa and neighbor (Figure 4B). This initial stimulus selectivity potentially represents distinct feedforward thalamocortical activity, which includes PV+ interneuron recruitment that is not directly impacted by the Elfn1 KO (Sun et al., 2006; Tan et al., 2008). PV+ cells are strongly excited by thalamocortical inputs, but these exhibit short-term depression, as does their output, contrasting with the sustained facilitation observed in SST+ neurons. These findings suggest that in WT animals, activity spillover from principal barrels is normally constrained by the progressive engagement of SST+ interneurons in septal regions, driven by Elfn1-dependent facilitation at their excitatory synapses. In the absence of Elfn1, this local inhibitory mechanism is disrupted, leading to longer responses in barrels, delayed but stronger responses in septa, and persistently stronger responses in unstimulated neighbors, resulting in a loss of distinction between the responses of barrel and septa domains that normally diverge over time (see Author response image 1 below).

      Author response image 1.

      (A) Barrel responses are longer following whisker stimulation in KO. (B) Septal responses are slightly delayed but stronger in KO. (C) Unstimulated neighbors show longer persistent responses in KO.

       

      (4) Line 585 “the earliest CSD sink was identified as layer 4…” were post-hoc measurements made to determine where the different shank leads were based on the post-hoc histology?

      Post hoc histology was performed on plane-aligned brain sections, which allowed us to detect barrels and septa and so confirm the insertion domain of each recorded shank. The layer specificity of each electrode could therefore not be confirmed by histology, as we did not have coronal sections in which to measure electrode depth.

      (5) For the retrograde tracing studies, how were the M1 and S2 injections targeted (stereotaxically or physiologically)? How was it determined that the injections were in the whisker region (or not)?

      During the retrograde virus injection, the location of M1 and S2 injections was determined by stereotaxic coordinates (Yamashita et al., 2018). After acquiring the light-sheet images, we were able to post hoc examine the injection site in 3D and confirm that the injections were successful in targeting the regions intended. Although it would have been informative to do so, we did not functionally determine the whisker-related M1 and whisker-related S2 region in this experiment.

      (6) Were there any baseline differences in spontaneous activity in the septa versus barrel regions, and did this change in the KO animals?

      Thank you for this interesting question. Our previous study found that there was a reduction in baseline activity in L4 barrel cortex of KO animals at postnatal day (P)12, but no differences were found at P21 (Stachniak et al., 2023).

      Reviewer #3 (Public Reviews):

      Summary:

      This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics, particularly involving Elfn1-expressing SST⁺ interneurons, may mediate temporal integration of multiwhisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose that septa integrate MW input in an Elfn1-dependent manner, enabling functional segregation from barrel columns.

      Strengths:

      The core hypothesis is interesting and potentially impactful. While barrels have been extensively characterized, septa remain less understood, especially in mice, and this study's focus on septal integration of MW stimuli offers valuable insights into this underexplored area. If septa indeed act as selective integrators of distributed sensory input, this would add a novel computational role to cortical microcircuits beyond what is currently attributed to barrels alone. The narrative of this paper is intellectually stimulating.

      We thank the reviewer for finding the study intellectually stimulating.

      Weaknesses:

      The methods used in the current study lack the spatial and cellular resolution needed to conclusively support the central claims. The main physiological findings are based on unsorted multi-unit activity (MUA) recorded via low-channel-count silicon probes. MUA inherently pools signals from multiple neurons across different distances and cell types, making it difficult to assign activity to specific columns (barrel vs. septa) or neuron classes (e.g., SST⁺ vs. excitatory).

      The recording radius (~50-100 µm or more) and the narrow width of septa (~50-100 µm or less) make it likely that MUA from "septal" electrodes includes spikes from adjacent barrel neurons.

      The authors do not provide spike sorting, unit isolation, or anatomical validation that would strengthen spatial attribution. Calcium imaging is restricted to SST⁺ and VIP⁺ interneurons in superficial layers (L2/3), while the main MUA recordings are from layer 4, creating a mismatch in laminar relevance.

      We thank the reviewer for pointing out the possibility of contamination in septal electrodes. Although it is reported in the methods (line 583), we may not have sufficiently highlighted that we used an extremely high threshold (7.5 std) for spike detection precisely to overcome the issue raised here, which limits such spatial contamination. Since spike amplitude decays rapidly with distance, at high thresholds only nearby neurons, potentially one or two, contribute to our analysis. We believe that this approach provides a very close approximation of single unit activity (SUA) in our reported data. We will include a sentence earlier in the manuscript to make this explicit and prevent further confusion.
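
      The thresholding idea in this response can be illustrated with a minimal sketch. It is not the authors' detection pipeline: the function name, the negative-going spike polarity, the use of the whole-trace standard deviation as the noise estimate, and the toy trace are all assumptions made for illustration. Only the 7.5 std threshold comes from the text.

```python
import statistics


def detect_spikes(trace, n_std=7.5):
    """Return sample indices where the trace first drops below mean - n_std * SD.

    Assumptions (not from the manuscript): spikes are negative-going, the
    noise level is estimated as the SD of the whole trace, and each downward
    threshold crossing counts as a single event.
    """
    mean = statistics.fmean(trace)
    sd = statistics.pstdev(trace)
    threshold = mean - n_std * sd
    spikes = []
    below = False
    for i, value in enumerate(trace):
        crossed = value < threshold
        if crossed and not below:
            spikes.append(i)  # first sample of a new crossing
        below = crossed
    return spikes


# Hypothetical toy trace: unit-amplitude "noise", one large nearby unit
# (sample 300) and one smaller, more distant unit (sample 600).
toy_trace = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
toy_trace[300] = -50.0
toy_trace[600] = -10.0

strict = detect_spikes(toy_trace, n_std=7.5)   # only the large event survives
lenient = detect_spikes(toy_trace, n_std=3.0)  # the small event is also picked up
```

      In this toy case the strict threshold keeps only the large-amplitude event, while a lower threshold also admits the smaller one, which is the sense in which a high threshold restricts the analysis to nearby neurons.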

      Regarding the point that calcium imaging was performed on L2/3 SST and VIP cells instead of L4, reviewers 1 and 2 raised the same issue, and we responded as follows. As shown in our supplementary figure, the divergence is also observed in L2/3, where we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” and that would need to be examined in future studies. One straightforward explanation is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST neurons act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something that would need to be examined by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.

      Furthermore, while the role of Elfn1 in mediating short-term facilitation is supported by prior studies, no new evidence is presented in this paper to confirm that this synaptic mechanism is indeed disrupted in the knockout mice used here.

      We thank Reviewer #3 for noting the absence of new evidence confirming that short-term facilitation is disrupted in our knockout mice. We acknowledge that our study relies on strong previously published data demonstrating that Elfn1 mediates short-term synaptic facilitation of excitatory inputs onto SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023). These studies consistently show that Elfn1 knockout abolishes facilitation in SST+ synapses, leading to altered temporal dynamics, which we hypothesize underlies the observed loss of barrel-septa response divergence in our Elfn1 KO mice (Figure 4). Nevertheless, to address the point raised, we will clarify in the revised manuscript (around lines 245-247 and 271-272) that our conclusions are based on these established findings, stating: “Building on prior evidence that Elfn1 knockout disrupts short-term facilitation in SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023), we attribute the abolished barrel-septa divergence in Elfn1 KO mice to altered SST+ synaptic dynamics, though direct synaptic measurements were not performed here.”

      Additionally, since Elfn1 is constitutively knocked out from development, the possibility of altered circuit formation, including changes in barrel structure and interneuron distribution, cannot be excluded and is not addressed.

      We thank Reviewer #3 for raising the valid concern that constitutive Elfn1 knockout could potentially alter circuit formation, including barrel structure and interneuron distribution. To address this, we will clarify in the revised manuscript (around line 271 and in the Discussion) that in our previous studies, which included both whole-cell patch-clamp in acute brain slices ranging from postnatal day 11 to 22 (P11 - P21) and in vivo recordings from barrel cortex at P12 and P21, we saw no gross abnormalities in barrel structure, with Layer 4 barrels maintaining their characteristic size and organization, consistent with wildtype (WT) mice (Stachniak et al., 2019, 2023). While we cannot fully exclude subtle developmental changes, prior studies indicate that Elfn1 primarily modulates synaptic function rather than cortical cytoarchitecture (Tomioka et al., 2014). Elfn1 KO mice show no gross morphological or connectivity differences, and the pattern and abundance of Elfn1-expressing cells (assessed by LacZ knock-in) appear normal (Dolan and Mitchell, 2013).

      We will add the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013).

      Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without the use of a time-dependent conditional knockout of the gene.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) My biggest concern is regarding statistics. Did the authors repeatedly apply independent tests (Mann-Whitney) without any correction for multiple comparisons (Figures 1 and 4)? In that case, the chances of a spurious "significant" result rise dramatically. 

      In response to the reviewer’s comment, we now present new statistical results using ANOVA and have incorporated them into the manuscript between lines 172 and 192 for WT data and between lines 282 and 298 for Elfn1 KO data. This new statistical approach shows the same differences as we had previously reported, hence consolidating the statements made.
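
      The reviewer's concern can be made concrete with a short calculation. This sketch is not the authors' ANOVA analysis; it only shows how the chance of at least one false positive grows when independent tests are repeated without correction. The count of 20 tests (for example, one per pulse in a 20-pulse train) and the 0.05 significance level are assumptions, and the Bonferroni threshold is shown as a generic point of comparison, not as the method used in the manuscript.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Assumed example: 20 uncorrected tests at alpha = 0.05.
uncorrected = familywise_error_rate(0.05, 20)
per_test = bonferroni_alpha(0.05, 20)
corrected = familywise_error_rate(per_test, 20)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni per-test alpha: {per_test:.4f}")
print(f"family-wise error rate after correction: {corrected:.3f}")
```

      Under these assumptions the uncorrected family-wise error rate is roughly 64%, which is why a single omnibus test or an explicit correction is preferable to repeated pairwise tests.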

      (2) The findings only hint at a mechanism involving SST+ neurons for how SWS and MWS are processed differently in the barrel vs septal domains. As a direct test of SST+ neuron involvement in the divergence of barrel and septal responses, the authors might consider SST-specific manipulations - for example, inhibitory chemo- or optogenetics during SWS and MWS stimulation.

      We thank the reviewer for this comment and agree that a direct manipulation of SST+ neurons via inhibitory chemo- or optogenetics could provide further supporting evidence for the main claims in our study. We have opted not to perform these experiments for this manuscript, as we feel they can be part of a future study. At the same time, it is conceivable that such manipulations, depending on how they are performed, may lead to larger and non-specific effects on cortical activity, since SST neurons will likely be completely shut down. So even though we certainly appreciate and value the strengths of such approaches, our experiments have addressed a more nuanced hypothesis, namely that the synaptic dynamics onto SST+ neurons matter for response divergence of septa versus barrels, which could not have been easily and concretely addressed by manipulating SST+ cell firing activity.

      (3) In general, it is hard to comprehend what microcircuit could lead to the observed divergence in the MWS/SWS ratio in the barrel vs septal domain. The preferential recruitment of SST+ neurons during MWS is not specific to a particular domain, and the higher density of SST+ neurons specifically in L4 septa cannot per se explain the diverging MWS/SWS ratio in L4 septal neurons, since a similar ratio divergence is observed across domains in L2/3 neurons without increased SST+ neuron density in L2/3. This view would also assume that SST+ inhibition remains contained to its own layer and domain. Is this the case? Is it that different microcircuits between barrels and septa differently shape the response to repeated MWS? This is partially discussed in the paper; can the authors develop on that? What would the proposed mechanism be? Can the short-term plasticity of the thalamic inputs (VPM vs POm) be part of the picture?

      We thank the reviewer for raising this important point. We propose that the divergence in MWS/SWS ratios across barrel and septal domains arises from dynamic microcircuit interactions rather than from static anatomical features such as SST+ density, which we describe and which can only provide a hint. In L2/3, where SST+ density is uniform, divergence persists, suggesting that trans-laminar and trans-domain interactions are key. Barrel domains, primarily receiving VPM inputs, exhibit short-term depression onto excitatory cells and engage PV+ and SST+ neurons to stabilize the MWS/SWS ratio, with Elfn1-dependent facilitation of SST+ neurons gradually increasing inhibition during repetitive SWS. Septal domains, in contrast, are targeted by facilitating POm inputs, combined with higher L4 SST+ density and Elfn1-mediated facilitation, producing progressive inhibitory buildup that amplifies the MWS/SWS ratio. SST+ projections in septa may extend trans-laminarly and laterally, influencing L2/3 and neighboring barrels, thereby explaining L2/3 divergence despite uniform SST+ density in L2/3. In this regard, direct laminar-dependent manipulations will be required to confirm whether L2/3 divergence is inherited from L4 dynamics. In Elfn1 KO mice, the loss of facilitation in SST+ neurons likely flattens these dynamics, disrupting functional segregation. Future experiments using VPM/POm-specific optogenetic activation and SST+ silencing will be critical to directly test this model.

      We expanded the discussion accordingly.

      (4) Can the decoder generalize between SWS and MWS? In this condition, if the decoder accuracy is higher for barrels than septa, it would support the idea that septa are processing the two stimuli differently. 

      Our results show that septal decoding accuracy is generally higher than barrel accuracy when generalizing from multi-whisker stimulation (MWS) to single-whisker stimulation (SWS), indicating distinct information processing in septa compared to barrels.

      In wild-type (WT) mice, septal accuracy exceeds barrel accuracy across all time windows (1-50ms, 51-95ms, 1-95ms), with the largest difference in the 51-95ms window (0.9944 vs. 0.9214 at pulse 20, 10Hz stimulation). This septal advantage grows with successive pulses, reflecting robust, separable neural responses, likely driven by the posterior medial nucleus (POm)’s strong MWS integration contrasting with minimal SWS activation. Barrel responses, driven by consistent ventral posteromedial nucleus (VPM) input for both stimuli, are less distinguishable, leading to lower accuracy.

      In Elfn1 knockout (KO) mice, in which the excitatory drive to somatostatin-positive (SST+) interneurons is disrupted, barrel accuracy is higher initially in the 1-50ms window (0.8045 vs. 0.7500 at pulse 1), suggesting reduced early septal distinctiveness. However, septal accuracy surpasses barrels in later pulses and time windows (e.g., 0.9714 vs. 0.9227 in 51-95ms at pulse 20), indicating restored septal processing. This supports the role of SST+ interneurons in shaping distinct MWS responses in septa, particularly in late-phase responses (51-95ms), where inhibitory modulation is prominent, as confirmed by calcium imaging showing stronger SST+ activation during MWS.

      These findings demonstrate that septa process SWS and MWS differently, with higher decoding accuracy reflecting structured, POm- and SST+-driven response patterns. In Elfn1 KO mice, early deficits in septal processing highlight the importance of SST+ interneurons, with later recovery suggesting compensatory mechanisms. 

      We have added Supplementary Figure 4 and included this interpretation between lines 338-353.

      We thank the reviewer for suggesting this analysis.

      (5) It is not clear to me how the authors achieve SWS. How is it that the pipette tip "placed in contact with the principal whisker" does not detach from the principal whisker or stimulate other whiskers? Please clarify the methods. 

      Targeting the specific principal whisker is performed under the stereoscope.  

      Specifically, we have added this statement in line 628:

      “We trimmed the whiskers where necessary, to avoid them touching each other and to avoid stimulating other whiskers. By putting the pipette tip very close (almost touching) to the principal whisker, the movement of the tip (limited to 1mm) would reliably move the targeted whisker. The specificity of the stimulation of the selected principal whisker was observed under the stereoscope.”

      (6) The method for calculating decoder accuracy is not clearly described: how can accuracy exceed 1? The authors should clarify this metric and provide measures of variability (e.g., confidence intervals or standard deviations across runs) to assess the significance of their comparisons. Additionally, using a consistent scale across all plots would improve interpretability.

      We thank the reviewer for raising this point. We have now changed the way accuracies are calculated and adopted a common scale among different plots (see updated Figure 5). We have also changed the methods section accordingly.

      (7) Figure 1: The sample size is not specified. It looks like the numbers match the description in the methods, but the sample size should be clearly stated here. 

      These are the numbers the reviewer is inquiring about. 

      WT: a 280 × 95 × 20 matrix for the stimulated barrel (14 barrels, 95ms, 20 pulses), a 180 × 95 × 20 matrix for the septa (9 septa, 95ms, 20 pulses), and a 360 × 95 × 20 matrix for the neighboring barrel (18 neighboring barrels, 95ms, 20 pulses). N=4 mice.

      KO: 11 barrel columns, 7 septal columns, and 11 unstimulated neighbors from N=4 mice.

      Panels D-F are missing axes and axis labels (firing rate, p-value). Panel D is mislabeled (left, middle, and right). I can't seem to find the yellow line. 

      Thank you for this observation. We made changes in the figures to make them easier to navigate based on the collective feedback from the reviewers.

      Why change the way of comparing the differences in the responses to repeated stimulation between SWS and MWS?

      To assess temporal accumulation of information, we compared responses to repeated single-whisker stimulation (SWS) and multi-whisker stimulation (MWS) using an accumulative decoding approach rather than simple per-pulse firing rates. This method captures domain-specific integration dynamics over successive pulses.

      The use of the term "principal whisker" is confusing, as it could refer to the whisker that corresponds to the recorded barrel. 

      When we use the term principal whisker, the intention is indeed to refer to the whisker corresponding to the recorded barrel during single whisker stimulation. The term principal whisker is removed from Figure legend 1 and legend S1C where it may have led to  ambiguity.    

      Why the statement "after the start of active whisking"? Mice are under anesthesia here; it does not appear to be relevant for the figure. 

      “After the start of active whisking” refers to the state of the barrel cortex circuitry at the time of recordings. The particular reference we use comes from the habit of assessing sensory processing also from a developmental point of view. The reviewer is correct that it has nothing to do with the status of the experiment. Nevertheless, since the reviewer found that it may create confusion, we have now taken it out.

      (8) Figure 3: The y-axis label is missing for panel C. 

      This is now fixed. (dF/F).

      (9) Figure 4: Axis labels are missing.

      Added.

      Minor: 

      (10) Line 36: "progressive increase in septal spiking activity upon multi-whisker stimulation". There is no increase in septal spiking activity upon MWS; the ratio MWS/SWS increases.

      We have changed the sentence as follows: Genetic removal of Elfn1, which regulates the incoming excitatory synaptic dynamics onto SST+ interneurons, leads to the loss of the progressive increase in septal spiking ratio (MWS/SWS) upon stimulation.

      (11) Line 105: domain-specific, rather than column-specific, for consistency.

      We have changed it.

      (12) Lines 173-174: "a divergence between barrel and septa domain activity also occurred in Layer 4 from the 2nd pulse onward (Figure 1E)". The authors only show a restricted number of comparisons. Why not show the p-values as for SWS?

      The statistics are now presented in the current Figure 1E.

      (13) Lines 151-153: "Correspondingly, when a single whisker is stimulated repeatedly, the response to the first pulse is principally bottom-up thalamic-driven responses, while the later pulses in the train are expected to also gradually engage cortico-thalamo-cortical and cortico-cortical loops." Can the authors please provide a reference?

      We have now added the following references: (Kyriazi and Simons, 1993; Middleton et al., 2010; Russo et al., 2025).

      (14) Lines 184-186: "Our electrophysiological experiments show a significant divergence of responses over time upon both SWS and MWS in L4 between barrels (principal and neighboring) and adjacent septa, with minimal initial difference". The only difference between the neighboring barrel and septa is the responses to the initial pulse. Can the author clarify? 

      We have now changed the sentence as follows: Our electrophysiological experiments show a significant divergence of responses between domains upon both SWS and MWS in L4. (Line 198 now)

      (15) Line 214: "suggest these interneurons may play a role in diverging responses between barrels and septa upon SWS". Why SWS specifically?

      We have changed the sentence as follows: These results confirmed that SST+ and VIP+ interneurons have higher densities in septa compared to barrels in L4 and suggest these interneurons may play a role in diverging responses between barrels and septa. (Line 231 now).

      (16) Line 235: "This result suggests that differential activation of SST+ interneurons is more likely to be involved in the domain-specific temporal ratio differences between barrels and septa". Why? The results here are not domain-specific.

      We have now revised this statement to: This result suggested that temporal ratio differences specific to barrels and septa might involve differential activation of SST+ interneurons rather than VIP+ interneurons.

      (17) Lines 241-243: "SST+ interneurons in the cortex are known to show distinct short-term synaptic plasticity, particularly strong facilitation of excitatory inputs, which enables them to regulate the temporal dynamics of cortical circuits." Please provide a reference.

      We have now added the following references: (Grier et al., 2023; Liguz-Lecznar et al., 2016).

      (18) Lines 245-247: "A key regulator of this plasticity is the synaptic protein Elfn1, which mediates short-term synaptic facilitation of excitation on SST+ interneurons (Stachniak et al., 2021, 2019; Tomioka et al., 2014)". Is Stachniak et al., 2021 not about the role of Elfn1 in excitatory-to-VIP+ neuron synapses?

      The reviewer correctly spotted this discrepancy. This reference has now been removed from this statement.

      (19) Lines 271-272: "Building on our findings that Elfn1-dependent facilitation in SST+ interneurons is critical for maintaining barrel-septa response divergence". The authors did not show that.

      We have now changed the statement to: Building on our findings that Elfn1 is critical for maintaining barrel-septa response divergence  

      (20) Line 280: second firing peak, not "peal".

      Thank you, it is now fixed.

      (21) Lines 304-305: "These results highlight the critical role of Elfn1 in facilitating the temporal integration of sensory inputs through its effects on SST+ interneurons". This claim is also overstated.

      We have now changed the statement to: These results highlight the contribution of Elfn1 to the temporal integration of sensory inputs. (Line 362)

      (22) Line 329: Any reason why not cite Chen et al., Nature 2013?

      We have now added this reference, as also pointed out by reviewer 1.

      (23) Line 341-342: "wS1" and "wS2" instead of S1 and S2 for consistency.

      Thanks, we have now updated the terms.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 3D - the SW conditions are labeled but not the MW conditions (two right graphs) - they should be labeled similarly (SSTMW, VIPMW). 

      The two right graphs in Figure 3D represent paired SW vs MW comparisons of the evoked responses for SST and VIP populations, respectively.

      (2) Figure 6D and E: I think it would be better if the depth measurements were on the y-axis, which is more typical of these types of plots.

      We thank the reviewer for this comment. Although we appreciate this may be the case, we feel that the current presentation may be easier for the reader to navigate, and we have hence kept it. 

      (3) Having an operational definition of septa versus barrel would be useful. As the authors point out, this is a tough distinction in a mouse, and often you read papers that use Barrel Wall versus Barrel Hollow/Center - operationally defining how these areas were distinguished would be helpful. 

      We thank the reviewer for this comment and understand the point made.

      We have now updated the methods section in line 611: 

      DiI marks contained within the vGlut2 staining were defined as barrel recordings, while DiI marks outside vGlut2 staining were septal recordings.

      Reviewer #3 (Recommendations for the authors): 

      To support the manuscript's major claims, the authors should consider the following:

      (1) Validate the septal identity of the neurons studied, either anatomically or functionally at the single-cell level (e.g., via Ca²⁺ imaging with confirmed barrel/septa mapping). 

      We thank the reviewer for this suggestion, but we feel that these extensive experiments are beyond the scope of this study. 

      (2) Provide both anatomical and physiological evidence to assess the possibility of altered cortical development in Elfn1 KO mice, including potential changes in barrel structure or SST⁺ cell distribution. 

      To address the reviewer’s point, we have now added the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013). Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without conditional knockouts.”

      (3) Examine the sensory responses of SST⁺ and VIP⁺ interneurons in deeper cortical layers, particularly layer 4, which is central to the study's main conclusions.

      We thank the reviewer for this suggestion and appreciate the value it would bring to the study. We nevertheless feel that these extensive experiments are beyond the scope of this study and hence opted not to perform them.

      Minor Comments:

      (1)  The authors used a CLARITY-based passive clearing protocol, which is known to sometimes induce tissue swelling or distortion. This may affect anatomical precision, especially when assigning neurons to narrow domains such as septa versus barrels. Please clarify whether tissue expansion was measured, corrected, or otherwise accounted for during analysis.

      Yes, tissue expansion was accounted for during the analysis of laminar specification. We excluded brains with severe distortion.

      (2) While the anatomical data are plotted as a function of "depth from the top of layer 4," the manuscript does not specify the precise depth ranges used to define individual cortical layers in the cleared tissue. Given the importance of laminar specificity in projection and cell type analyses, the criteria and boundaries used to delineate each layer should be explicitly stated.

      Thank you for pointing this out. We now include the criteria for delineating each layer in the manuscript. “Given that the depth of Layer 4 (L4) can be reliably measured due to its well-defined barrel boundaries, and that the relative widths of other layers have been previously characterized (El-Boustani et al., 2018), we estimated laminar boundaries proportionally. Specifically, Layer 2/3 was set to approximately 1.3–1.5 times the width of L4, Layer 5a to ~0.5 times, and Layer 5b to a similar width as L4. Assuming uniform tissue expansion across the cortical column, we extrapolated the remaining laminar thicknesses proportionally.”
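
      The proportional scheme quoted above can be sketched as a small calculation. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the L2/3 ratio of 1.4 is simply the midpoint of the quoted 1.3–1.5 range, the L5b ratio of 1.0 stands in for "a similar width as L4", and the 200 µm L4 thickness is an invented example value. Depths are expressed relative to the top of L4, with negative values lying above it.

```python
def laminar_boundaries(l4_thickness_um, l23_ratio=1.4, l5a_ratio=0.5, l5b_ratio=1.0):
    """Estimate layer boundaries (in um, relative to the top of L4).

    All ratios are multiples of the measured L4 thickness. The defaults are
    assumptions chosen from the ranges quoted in the text, not measured values.
    Returns a dict mapping layer name to (upper_boundary, lower_boundary).
    """
    t = float(l4_thickness_um)
    l4_top = 0.0
    l4_bottom = t
    l5a_bottom = l4_bottom + l5a_ratio * t
    l5b_bottom = l5a_bottom + l5b_ratio * t
    return {
        "L2/3": (-l23_ratio * t, l4_top),
        "L4": (l4_top, l4_bottom),
        "L5a": (l4_bottom, l5a_bottom),
        "L5b": (l5a_bottom, l5b_bottom),
    }


def assign_layer(depth_um, boundaries):
    """Return the layer whose depth range contains depth_um, or None."""
    for layer, (upper, lower) in boundaries.items():
        if upper <= depth_um < lower:
            return layer
    return None


# Hypothetical example: an L4 that measures 200 um in the cleared tissue.
bounds = laminar_boundaries(200.0)
layer_of_cell = assign_layer(250.0, bounds)  # a cell 250 um below the top of L4
```

      Because every boundary scales with the measured L4 thickness, uniform tissue expansion changes the absolute depths but not the layer assignment, which is the rationale given in the quoted methods text.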

      (3)  In several key comparisons (e.g., SST⁺ vs. VIP⁺ interneurons, or S2-projecting vs. M1-projecting neurons), it is unclear whether the same barrel columns were analyzed across conditions. Given the anatomical and functional heterogeneity across wS1 columns, failing to control for this may introduce significant confounds. We recommend analyzing matched columns across groups or, if not feasible, clearly acknowledging this limitation in the manuscript.

      We thank the reviewer for raising this important point. For the comparison of SST⁺ versus VIP⁺ interneurons, it would in principle have been possible to analyze the same barrel columns across groups. However, because some of the cleared brains did not reach the optimal level of clarity, our choice of columns was limited, and we were not always able to obtain sufficiently clear data from the same columns in both groups. Similarly, for the analysis of S2- versus M1-projecting neurons, variability in the position and spread of retrograde virus injections made it difficult to ensure measurements from identical barrel columns. We have now added a statement in the Discussion to acknowledge this limitation.

      (4) Figure 1C: Clarify what each point in the t-SNE plot represents-e.g., a single trial, a recording channel, or an averaged response. Also, describe the input features used for dimensionality reduction, including time windows and preprocessing steps.

      In response to the reviewer’s comment, we have now added the following in the methods: In summary, each point in the t-SNE plots represents an averaged response across 20 trials for a specific domain (barrel, septa, or neighbor) and genotype (WT or KO), with approximately 14 points per domain derived from the 280 trials in each dataset. The input features are preprocessed by averaging blocks of 20 trials into 1900-dimensional vectors (95ms × 20), which are then reduced to 2D using t-SNE with the specified parameters. This approach effectively highlights the segregation and clustering patterns of neural responses across cortical domains in both WT and KO conditions.
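
      The feature construction described above (blocks of 20 trials averaged into 1900-dimensional vectors) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, each trial is assumed to be already flattened to 95 × 20 = 1900 values (the flattening order is an assumption), and the toy data are invented. The resulting vectors would then be passed to a t-SNE implementation, whose parameters are not reproduced here.

```python
def block_average_features(trials, block_size=20):
    """Average consecutive blocks of trials into one feature vector per block.

    trials: list of trials, each a flat list of equal length
            (here assumed to be 95 ms x 20 pulses = 1900 values).
    Returns a list with len(trials) // block_size feature vectors.
    """
    n_blocks = len(trials) // block_size
    features = []
    for b in range(n_blocks):
        block = trials[b * block_size:(b + 1) * block_size]
        # zip(*block) walks the block one time bin at a time across trials.
        features.append([sum(column) / block_size for column in zip(*block)])
    return features


# Toy stand-in for the WT stimulated-barrel dataset: 280 trials x 1900 values.
# Trial i is filled with the constant value i so the averages are easy to check.
n_trials, n_features = 280, 95 * 20
toy_trials = [[float(i)] * n_features for i in range(n_trials)]

points = block_average_features(toy_trials)
# 280 trials / 20 per block gives 14 points, each a 1900-dimensional vector.
```

      With 280 trials this yields 14 vectors, matching the "approximately 14 points per domain" stated in the response.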

      (5) Figures 1D, E (left panels): The y-axes lack unit labeling and scale bars. Please indicate whether values are in spikes/sec, spikes/bin, or normalized units.

      We have now clarified this. 

      (6) Figures 1D, E (right panels): The color bars lack units. Specify whether the values represent raw firing rates, z-scores, or other normalized measures. Replace the vague term "Matrix representation" with a clearer label such as "Pulse-aligned firing heatmap."

      Thank you, we have now done it.

      (7) Figure 1E (bottom panel): There appears to be no legend referring to these panels. Please define labels such as "B" and "S." 

      Thank you, we have now done it.

      (8) Figure 1E legend: If it duplicates the legend from Figure 1D, this should be made explicit or integrated accordingly. 

      We have changed the structure of this figure.

      (9) Figure 1F: Define "AUC" and explain how it was computed (e.g., area under the firing rate curve over 0-50 ms). Indicate whether the plotted values represent percentages and, if so, label the y-axis accordingly. If normalization was applied, describe the procedure. Include sample sizes (n) and specify what each data point represents (e.g., animal, recording site). 

      The following paragraph has been added in the methods section:

      The Area Under the Curve (AUC) was computed as the integral of the smoothed firing rate (spikes per millisecond) over a 50 ms window following each whisker stimulation pulse, using trapezoidal integration. Firing rate data for layer 4 barrel and septal regions in wild-type (WT) and knockout (KO) mice were smoothed with a 3-point moving average and averaged across blocks of 20 trials. Plotted values represent the percentage ratio of multi-whisker (MW) to single-whisker (SW) AUC, with error bars showing the standard error of the mean. Each data point reflects the mean AUC ratio for a stimulation pulse across approximately 11 blocks (220 trials total). The y-axis indicates percentages.
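
      The AUC procedure described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the function names, the 1 ms sampling interval, the edge padding used for the moving average, and the block-by-block pairing of MW and SW data are all hypothetical choices.

```python
import numpy as np

def smooth3(rate):
    """3-point moving average; end values are repeated at the edges (an assumption)."""
    padded = np.pad(np.asarray(rate, dtype=float), 1, mode="edge")
    return np.convolve(padded, np.ones(3) / 3.0, mode="valid")

def auc_trapezoid(rate, dt_ms=1.0):
    """Trapezoidal integral of a firing-rate trace sampled every dt_ms milliseconds."""
    rate = np.asarray(rate, dtype=float)
    return float(np.sum((rate[1:] + rate[:-1]) * 0.5 * dt_ms))

def block_mean(trials, block_size=20):
    """Average consecutive blocks of trials; an incomplete trailing block is dropped."""
    trials = np.asarray(trials, dtype=float)
    n_blocks = trials.shape[0] // block_size
    used = trials[: n_blocks * block_size]
    return used.reshape(n_blocks, block_size, -1).mean(axis=1)

def mw_sw_percent(mw_trials, sw_trials, window_ms=50, dt_ms=1.0, block_size=20):
    """Mean and SEM of the MW/SW AUC ratio (in percent) across trial blocks.

    Each row is one trial of firing rate (spikes/ms) aligned to a stimulation
    pulse. MW and SW are assumed to have the same number of blocks and are
    paired block by block, which is an assumption made for this sketch.
    """
    n_samples = int(round(window_ms / dt_ms)) + 1  # samples spanning 0..window_ms

    def block_aucs(trials):
        blocks = block_mean(trials, block_size)[:, :n_samples]
        return np.array([auc_trapezoid(smooth3(b), dt_ms) for b in blocks])

    ratios = 100.0 * block_aucs(mw_trials) / block_aucs(sw_trials)
    sem = ratios.std(ddof=1) / np.sqrt(len(ratios)) if len(ratios) > 1 else 0.0
    return float(ratios.mean()), float(sem)
```

      With this layout, a value below 100% indicates that the multi-whisker response integrated over the window is smaller than the single-whisker response for that pulse.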

      (10) Figure 3C: Add units to the vertical axis.

      We have added them.

      (11) Figure 3D: Specify what each line represents (e.g., average of n cells, individual responses?). 

      Each line represents the average response of a single neuron.

      (12) Figure 4C legend: Same with what? No legend refers to the bottom panels; please revise to clarify.

      Thank you. We have now changed the figure structure and legends and fixed the missing information issue.

      (13) Supplementary Figure 1B: Indicate the physical length of the scale bar in micrometers. 

      This has been fixed. The scale bar is 250 µm.

      (14) Indicate the catalog number or product name of the 8×8 silicon probe used for recordings.

      We have added this information. It is the A8x8-Edge-5mm-100-200-177-A64.

      References

      (1) Beierlein, M., Gibson, J. R. & Connors, B. W. (2003). Two dynamically distinct inhibitory networks in layer 4 of the neocortex. J. Neurophysiol. 90, 2987–3000.

      (2) Burkhalter, A., D’Souza, R. D. & Ji, W. (2023). Integration of feedforward and feedback information streams in the modular architecture of mouse visual cortex. Annu. Rev. Neurosci. 46, 259–280.

      (3) Chen, J. L., Margolis, D. J., Stankov, A., Sumanovski, L. T., Schneider, B. L. & Helmchen, F. (2015). Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108.

      (4) Connor, J. R. & Peters, A. (1984). Vasoactive intestinal polypeptide-immunoreactive neurons in rat visual cortex. Neuroscience 12, 1027–1044.

      (5) Cruikshank, S. J., Lewis, T. J. & Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat. Neurosci. 10, 462–468.

      (6) Dolan, J. & Mitchell, K. J. (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491.

      (7) Gibson, J. R., Beierlein, M. & Connors, B. W. (1999). Two networks of electrically coupled inhibitory neurons in neocortex. Nature 402, 75–79.

      (8) Ji, W., Gămănuţ, R., Bista, P., D’Souza, R. D., Wang, Q. & Burkhalter, A. (2015). Modularity in the organization of mouse primary visual cortex. Neuron 87, 632–643.

      (9) Martin-Cortecero, J. & Nuñez, A. (2014). Tactile response adaptation to whisker stimulation in the lemniscal somatosensory pathway of rats. Brain Res. 1591, 27–37.

      (10) Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M. & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. J. Neurosci. 29, 5326–5335.

      (11) Meier, A. M., Wang, Q., Ji, W., Ganachaud, J. & Burkhalter, A. (2021). Modular network between postrhinal visual cortex, amygdala, and entorhinal cortex. J. Neurosci. 41, 4809–4825.

      (12) Meier, A. M., D’Souza, R. D., Ji, W., Han, E. B. & Burkhalter, A. (2025). Interdigitating modules for visual processing during locomotion and rest in mouse V1. bioRxiv 2025.02.21.639505.

      (13) Scala, F., Kobak, D., Shan, S., Bernaerts, Y., Laturnus, S., Cadwell, C. R., Hartmanis, L., Froudarakis, E., Castro, J. R., Tan, Z. H., et al. (2019). Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174.

      (14) Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J. & Ghosh, A. (2019). Elfn1-induced constitutive activation of mGluR7 determines frequency-dependent recruitment of somatostatin interneurons. J. Neurosci. 39, 4461–4475.

      (15) Stachniak, T. J., Kastli, R., Hanley, O., Argunsah, A. Ö., van der Valk, E. G. T., Kanatouris, G. & Karayannis, T. (2021). Postmitotic Prox1 expression controls the final specification of cortical VIP interneuron subtypes. J. Neurosci. 41, 8150–8166.

      (16) Stachniak, T. J., Argunsah, A. Ö., Yang, J. W., Cai, L. & Karayannis, T. (2023). Presynaptic kainate receptors onto somatostatin interneurons are recruited by activity throughout development and contribute to cortical sensory adaptation. J. Neurosci. 43, 7101–7118.

      (17) Sun, Q.-Q., Huguenard, J. R. & Prince, D. A. (2006). Barrel cortex microcircuits: Thalamocortical feedforward inhibition in spiny stellate cells is mediated by a small number of fast-spiking interneurons. J. Neurosci. 26, 1219–1230.

      (18) Sylwestrak, E. L. & Ghosh, A. (2012). Elfn1 regulates target-specific release probability at CA1-interneuron synapses. Science 338, 536–540.

      (19) Tan, Z., Hu, H., Huang, Z. J. & Agmon, A. (2008). Robust but delayed thalamocortical activation of dendritic-targeting inhibitory interneurons. Proc. Natl. Acad. Sci. USA 105, 2187–2192.

      (20) Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., Suzuki, T., Odagawa, M., Odaka, Y. S., Iwayama, Y., et al. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nat. Commun. 5, 4501.

      (21) Yamashita, T., Vavladeli, A., Pala, A., Galan, K., Crochet, S., Petersen, S. S. & Petersen, C. C. (2018). Diverse long-range axonal projections of excitatory layer 2/3 neurons in mouse barrel cortex. Front. Neuroanat. 12, 33.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1 and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors knock down either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by the apparently quick transition of SF1-bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall, these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data are very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression.

      Strengths:

      (1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.

      (2) The experiments and the rationale behind them are clearly and logically presented.

      (3) The experiments are carried out to a high standard and well-designed controls are included.

      (4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of QK1 competition was creative and informative.

      Weaknesses:

      Overall, the weaknesses are relatively minor and involve cases where conclusions could potentially have been strengthened with additional experimentation. For example, the pull-down of the U2 snRNP could be strengthened by detection of the snRNA, whereas the proteins may themselves interact with these factors in the absence of the snRNA. In addition, the discussion is a bit speculative given the data, but compelling nonetheless.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important manuscript provides insights into the competition between Splicing Factor 1 (SF1) and Quaking (QKI) for binding at the ACUAA branch point sequence in a model intron, regulating exon inclusion. The study employs rigorous transcriptomic, proteomic, and reporter assays, with both mammalian cell culture and yeast models. Nevertheless, while the data are convincing, broadening the analysis to additional exons and narrowing the manuscript's title to better align with the experimental scope would strengthen the work.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to show that SF1 and QKI compete for the intron branch point sequence ACUAA and provide evidence that QKI represses inclusion when bound to it.

      Major strengths of this manuscript include:

      (1) Identification of the ACUAA-like motif in exons regulated by QKI and SF1.

      (2) The use of the splicing reporter and mutant analysis to show that upstream and downstream ACUAAC elements in intron 10 of RAI14 are required for repressing splicing.

      (3) The use of proteomics to identify proteins in C2C12 nuclear extract that bind to the wild-type and mutant sequences.

      (4) The yeast studies showing lethality when ectopic Qki5 expression was induced, due to increased mis-splicing of transcripts that contain the ACUAA element.

      The authors conclusively show that the ACUAA sequence is bound by QKI and provide strong evidence that this leads to differences in exon inclusion and exclusion. In animal cells, and especially in humans, branchpoint sequences are degenerate but seem to be recognized by specific splicing factors. Although a subset of splicing factors shows tissue-specific expression patterns, most do not, suggesting that yet-to-be-identified mechanisms regulate splicing. This work suggests that an alternate mechanism could be related to the binding affinity of specific RNA-binding factors for branchpoint sequences, coupled with the level of these different splicing factors in a given cell.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1 and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors knock down either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by the apparently quick transition of SF1-bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall, these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data are very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression. Criticisms are minor and the most important of them stem from overemphasis in parts of the manuscript on the evolutionary angle when evolution itself wasn't analyzed per se.

      We thank the reviewer for the positive comments and very clear and fair critical points.

      Strengths:

      (1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.

      (2) The experiments and the rationale behind them are exceptionally clearly and logically presented. This was wonderful!

      Thank you so much. We felt the overall flow of the paper and data make for a nice “story” that conveys a relatively easy-to-understand explanation for a complex subject.

      (3) The experiments are carried out to a high standard and well-designed controls are included.

      (4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of the QK1 competition was very exciting and creative.

      We agree this is a very exciting result and finding! Thanks.

      Weaknesses:

      Overall the weaknesses are relatively minor and involve cases where clarification is necessary, some additional analysis could bolster the arguments, and suggestions for focusing the manuscript on its strengths.

      (1) The title (Ancient...evolutionary outcomes), abstract, and some parts of the discussion focus heavily on the evolutionary implications of this work. However, evolutionary analysis was not performed in these studies (e.g., when did QK1 and SF1 proteins arise and/or diverge? How does this line up with branch site motifs and evolution of U2? Any insight from recent work from Scott Roy et al?). I think this aspect either needs to be bolstered with experimental work/data or this should be tamped down in the manuscript. I suggest highlighting the idea expressed in the sentence "A nuanced implication of this model is that loss-of-function...". To me, this is better supported by the data and potentially by some analysis of mutations associated with human disease.

      We have revised the title and dampened the evolutionary aspects of the previous version of the manuscript.

      (2) One paper that I didn't see cited was that by Tanackovic and Kramer (Mol Biol Cell 2005). This paper is relevant because they KD SF1 and found it nonessential for splicing in vivo. Do their results have implications for those here? How do the results of the KD compare? Could QK1 competition have influenced their findings (or does their work influence the "nuanced implication" model referenced above?)?

      This is an interesting point, and thank you for the suggestion. We have now included a brief description of this study in the Introduction of the revised manuscript and do note that the authors measured intron retention of a beta globin reporter and SF3A1, SF3A2, and SF3A3 during SF1 knockdown, but did not detect elevated unspliced RNA in these targets.

      (3) Can the authors please provide a citation for the statement "degeneracy is observed to a higher degree in organisms with more alternative splicing"? Does recent evolutionary analysis support this?

      We have removed the statement, as it did not add much to the content and I am not sure I can state the concept I was attempting to convey in a simple manner with few citations.

      (4) For the data in Figure 3, I was left wondering if NMD was confounding this analysis. Can the authors respond to this and address this concern directly?

      We have not measured whether the reporters used in Figure 3 produce protein(s). Presumably, though, all spliced reporter RNA would be degraded equally (the included/skipped isoforms’ “reading frames” are not altered from one another). This would not be the case for unspliced nuclear reporter RNA, however. Given this difference, and that our analysis cannot resolve the subcellular localization of the different reporter species, we have removed the measurement of, and subsequent results describing, unspliced reporter RNA from Figure 3.

      (5) To me, the idea that an engaged U2 snRNP was pulled down in Figure 4F would be stronger if the snRNA was detected. Was that able to be observed by northern or primer extension? Would SF1 be enriched if the U2 snRNA was degraded by RNaseH in the NE?

      We did not measure any co-associating RNAs in this experimental approach, but we agree that detecting the U2 snRNA would strengthen the evidence for an engaged U2 snRNP.

      (6) I'm wondering how additive the effects of QK1 and SF1 are... In Figure 2, if QK1 and SF1 are both knocked down, is the splicing of exon 11 restored to "wt" levels?

      This is an interesting question that we were unfortunately unable to address experimentally here.

      (7) The first discussion section has two paragraphs that begin "How does competition between SF1..." and "Relatively little is known about how...". I found the discussion and speculation about localization, paraspeckles, and lncRNAs interesting but a bit detracting from the strengths of the manuscript. I would suggest shortening these two paragraphs into a single one.

      We have revised the Discussion.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors were trying to establish whether competition between the RNA-binding proteins SF1 and QKI controlled splicing outcomes. These two proteins have similar binding sites and protein sequences, but SF1 lacks a dimerization motif and seems to bind a single version of the binding sequence. Importantly, these binding sequences correspond to branchpoint consensus sequences, with SF1 binding leading to productive splicing, but QKI binding leading instead to association with paraspeckle proteins. They show that in human cells SF1 generally activates exons and QKI represses, and a large group of the jointly regulated exons (43% of joint targets) are reciprocally controlled by SF1 and QKI. They focus on one of these exons RAI14 that shows this reciprocal pattern of regulation, and has 2 repeats of the binding site that make it a candidate for joint regulation, and confirm regulation within a minigene context. The authors used the assembly of proteins within nuclear extracts to explain the effect of QKI versus SF1 binding. Finally, the authors show that the expression of QKI is lethal in yeast, and causes splicing defects.

      How this fits in the field: This study is interesting and offers a conceptual advance by providing a general rule on how SF1 and QKI interact in relation to binding sites, and the relative molecular fates followed, so it is very useful. Most of the analysis seems to focus on one example, although the molecular analysis and global work significantly add to the picture from the previously published paper about NUMB joint regulation by QKI and SF1 (Zong et al, cited in text as reference 50, that looked at SF1 and QKI binding in relation to a duplicated binding site/branchpoint sequence in NUMB).

      Thank you for the encouraging remarks.

      Strengths:

      The data presented are strong and clear. The ideas discussed in this paper are of wide interest, and present a simple model where two binding sites generate a potentially repressive QKI response, whereas exons that have a single upstream sequence are just regulated by SF1. The assembly of splicing complexes on RNAs derived from RAI14 in nuclear extracts, followed by mass spec gave interesting mechanistic insight into what was occurring as a result of QKI versus SF1 binding.

      Weaknesses:

      I did not think the title best summarises the take-home message and could be perhaps a bit more modest. Although the authors investigated splicing patterns in yeast and human cells, yeast do not have QKI so there is no ancient competition in that case, and the study did not really investigate physiological or evolutionary outcomes in splicing, although it provides interesting speculation on them. Also, as I understood it, the important issue was less conserved branchpoints in higher eukaryotes enabling alternative splicing, rather than competition for the conserved branchpoint sequence. So despite the data being strong and properly analysed and discussed in the paper, could the authors think whether they fit best with the take-home message provided in the title? Just as a suggestion (I am sure the authors can do a better job), maybe "molecular competition between variant branchpoint sequences predict physiological and evolutionary outcomes in splicing"?

      Thank you for this point (Reviewer 2 had a similar comment) and the suggestion. We have revised the title.

      Although the authors do provide some global data, most of the detailed analysis is of RAI14. It would have been useful to examine members of the other quadrants in Figure 1C as well for potential binding sites to give a reason why these are not co-regulated in the same way as RAI14. How many of the RAI14 quadrants had single/double sites (the motif analysis seemed to pull out just one), and could one of the non-reciprocally regulated exons be moved into a different quadrant by addition or subtraction of a binding site or changing the branchpoint (using a minigene approach, for example)?

      This is an interesting point that we have considered. Our intent with the focus on RAI14 was to use a naturally occurring intron bps with evidence of strong QKI binding that did not require a high degree of sequence manipulation or engineering.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Most of my recommendations are really centered on the figures. In their current state, they detract from the data shown and could be improved: I recommend the authors use a uniform font. For example, Figures 1E and F have at least three different fonts of varying sizes, making them very messy. In Figure 1C, the authors could bold the RAI14 ex11 or simply indicate in the legend that the blue is this exon, thus removing the text from this very busy graph. In Figure 4F, I would recommend having all the labels the same size and putting those genes of interest like Sf3a1 in bold. This could also be done in Figure 4E.

      Thank you for the suggestion; we have edited these (FYI, the fonts in Figs. 1E and 1F were from the rMAPS default output, but I agree that they give a sloppy appearance).

      (2) In Figures 4D and 4G, is there QKI binding to the downstream deletion mutant after 30 minutes? Also, in Figure 4G, are these all from the same blot? The band sizes seem to be very different between lanes. If these were not on the same blot, the original gels should be submitted.

      A small amount of Qki appears to be binding after 30 min. All lanes/blots are from the same gels/membranes; see new Supplemental Figure 4 for the original (uncropped) images of the blots.

      (3) The authors should indicate the source and concentration of the antibodies used for their WB. They should also indicate the primers used for RT-PCRs.

      We have revised the methods to include the antibody information and have uploaded a supplemental table 8 with all oligonucleotide sequences used (which I (Sam Fagg) neglected to do initially, so that’s my bad).

      Reviewer #2 (Recommendations for the authors):

      (1) This may come down to the authors' preference, but branch point and branch site are frequently two words, not a single compound word (branch point vs. branchpoint). In addition, the authors may want to use branch site with the abbreviation BS more frequently since they often don't describe the specific point of branching, and bp and bps could be confused for the more frequent abbreviations for base pair(s).

      Good suggestion; we have edited the text accordingly.

      (2) In general the addition of page numbers and line numbers to the manuscript would greatly aid reviewers!

      Point taken…

      (3) Introduction; "...under normal growth conditions they are efficiently spliced". I would say MOST introns in yeast are efficiently spliced. This is definitely not universal.

      Text edited to indicate that most are efficiently spliced.

      (4) Introduction; " recognition of the bps by SF1 (mammals) (20)". The choice of reference 20 is an odd one here. I think the Robin Reed and Michael Rosbash paper was the first to show SF1 was the human homolog of BBP.

      Got it, thanks (added #14 here and kept #20 also since it shows the structure of SF1 in complex with a UACUAAC bps.)

      (5) Results; "QK1 and SF1 co-regulate.."; it may be useful for the reader if you could explain in more detail why exon inclusion and intron retention are expected outcomes for QK1 knockdown and vice versa for SF1. The exon inclusion here is more obvious than the intron retention phenotype. (In other words, if more exons are included shouldn't it follow that more introns are removed?)

      We explain the expected results for exon inclusion in the Introduction and this paragraph of the Results. Although we have observed more intron retention under QKI loss-of-function approaches before, I am uncertain where the reviewer sees that we indicate any expected result for intron retention from either QKI or SF1 knockdown. I believe the statement you refer to might be on line 162 and starts with: “Consistent with potentially opposing functions in splicing…” ?

      Also, I agree that if SF1 is a “splicing activator,” one might expect more IR in its absence (but this is not the case; there is, in fact, less), but nonetheless, the opposite outcome is observed with QKI knockdown (more IR). It is unclear why this is the case, and we did not investigate it.

      (6) Results; "QK1 and SF1 co-regulate.."; "Thus the most highly represented set.." To me, the most highly represented set is those which are not both QK1-repressed and SF1-activated. Does this indicate that other factors are involved at most sites than simple competition between these two?

      We have revised the sentence in question to include the text “by quadrant” in order to convey our meaning more precisely.

      (7) Throughout the manuscript, 5 apostrophes and 3 apostrophes are used instead of 5 prime symbols and 3 prime symbols.

      Thank you for pointing that out. We have fixed each instance of this.

      (8) Sometimes SF1 is written as Sf1. (also Tatsf1)

      This was a mouse/human gene/protein nomenclature error that we have fixed; thank you for pointing this out.

      (9) You may want to make sure that figures are labeled consistently with the manuscript text. In Figure 1B, it is RI rather than IR. In Figure 4 it is myoblast NE rather than C2C12 nuclear extract.

      We have fixed these, checked for other examples, and where relevant, edited those too.

      (10) I think Figure 1A could be improved by also including a depiction of the domain arrangements of SF1 and QK1.

      Done.

      (11) I was a bit confused with all the lines in Figure 1E and 1F. What is the difference between the log (pVal) and upregulated plots? Can these figures be simplified or explained more thoroughly?

      Based on this comment and one from Reviewer 1, we have slightly revised the wording (and font) on the output, which we hope clarifies it. These are motif enrichment plots generated by rMAPS (Refs 61 and 62) analysis of rMATS (Ref 60) data for exons that are more included (red lines) or more skipped (blue lines) relative to control, each compared with a “background” set of exons that are detectable but unchanged. The dotted lines show the -log<sub>10</sub> P-value, which indicates the significance for exons more included in shRNA treatment vs. control shRNA (previously labeled “upregulated”) compared with the background exons; the solid lines indicate the motif score. Both are described in the references indicated.

      (12) Figure 1B, it is a bit hard to conclude that there is more AltEx or "RI/IR" in one sample vs. the other from these plots since the points overlay one another. Can you include numbers here?

      Added (and deleted Suppl Fig S1, which was simply a chart showing the numbers).

      (13) How was PSI calculated in Figure 2A?

      VAST-tools (we state this in the legend in the revised version).

      You may want to include rel protein (or the lower limit of detection) for Figure 2B to be consistent with 2C. Why is KD of SF1 so poor and variable between 2C and 2D?

      We have not investigated this, but these blots show an optimized result that we were able to obtain for the knockdown in each cell type. It may be that HEK293 cells (Fig 2B) have a stronger requirement for SF1 than C2C12 cells…? I would argue that it is not necessarily “poor” in Fig 2C, as we observe ~70% depletion of the protein.

      Why are two bands present in the gel?

      Two to three isoforms of SF1 are present in most cell types.

      A good (or bad, really) example of an SF1 western blot (and knockdown of ~35% in K562 or ~45% in HepG2) can also be seen on the ENCODE project website, for reference:

      https://www.encodeproject.org/documents/6001a414-b096-4073-94ff-3af165617eb5/@@download/attachment/SF1_BGKLV28-49.pdf

      By comparison, I think ours are much more cosmetically pleasing, and our knockdown (especially in C2C12) is much more efficient.

      (14) Figure 3, The asterisk refers to a cryptic product. Can the uaAcuuuCAG be used as a branch point? Presumably the natural 3' SS is now too close so this would result in activation of a downstream 3'SS?

      We did not pursue determining the identity of this minor and likely artefactual product, but we (and others) have observed a similar phenomenon when using splicing reporter-based mutational approaches.

      (15) For the methods. The "RNA extraction, RT -PCR,..." subheading needs to be on its own line. Please add (w/v) or (v/v) to percentages where appropriate. Please convert ug to the symbol for "micro".

      Thank you, we have made these changes.

      (16) In Figure 4B, the text here and legend are microscopic. Even with reading glasses, I couldn't make anything out!

      We have increased the font sizes for the text and scale bar…when referring to “legend” does the reviewer mean the scale bar?

      (17) As a potential discussion item, it is worth noting that SF1 could also repress splicing if it could either not engage with U2AF or be properly displaced by U2 snRNP so the snRNA could pair. I was wondering if QK1 could similarly be activating if it could engage with U2AF. I'm unsure if this could be tested by domain swaps (and is beyond the scope of this paper). It just may be worth speculating about.

      Good point and suggestion…we are looking into this.

      Reviewer #3 (Recommendations for the authors):

      (1) Is the reference in the text to Figure 5F correct for actin splicing (this is just before the discussion)?

      I see references several lines up from this, but I do not see a reference just before the discussion…?

      (2) I was not sure why the minigene experiments showed such high levels of intron retention that seemed to be impacted also by deletion of the branchpoint sequences, and suggest that the two branchpoints are not equal in strength.

      Neither were we, but Reviewer 2 has suggested that degradation of the spliced products could be rapid (NMD substrates) which could complicate the interpretation of what appears to be higher levels of intron retention. Given the possibility that this could be a non-physiological artefact, we have removed the measurement of unspliced reporter and now only show the spliced products (equally subject to degradation) and report their percent inclusion.

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      Specific comments on the original version:

      (1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.

      (2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      (3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.

      (4) Potential for pseudo replication. It's not clear whether they're performing stats tests across biological replica, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.

      (5) Does mEoS affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?

      (6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful.

      (7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections?

      (8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.

      (9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one.

      (10) For microscopy experiment methods, state power densities, not % or "nominal power".

      (11) In general, not much data presented. Any SI file with extra images etc.?

      (12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state: "What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.

      (13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.

      Significance:

      The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall, these advances are more incremental than groundbreaking.

      Comments on revised version:

      The authors have addressed the majority of the significant issues raised in the review and revised the manuscript appropriately. One issue that can be further addressed relates to the issue of pseudo-replication. The authors state in their response that "All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case).". This suggests that they're not doing their stats on biological replicates, and instead are pseudo replicating. It's not clear how they have ensured reproducibility, when the stats seem to have been done on pooled data across repeats.

    2. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors of eLife and the reviewers for their thorough evaluation of our study. As regards the final comments of reviewer 1, please note that all experimental replicates were first analyzed separately, and were then pooled, since the observed changes were comparable between experiments. This means that statistical analyses were done on pooled biological replicates.


      The following is the authors’ response to the original reviews.

      General Statements

      We thank the reviewers for their thorough and constructive evaluation of our work. We have revised the manuscript carefully and addressed all the criticisms raised, in particular the issues mentioned by several of the reviewers (see point-by-point response below). We have also added a number of explanations in the text for the sake of clarity, while trying to keep the manuscript as concise as possible.

      In our view, the novelty of our research is two-fold. From a neurobiological point of view, we provide conclusive evidence for the existence of glycine receptors (GlyRs) at inhibitory synapses in various brain regions including the hippocampus, dentate gyrus and sub-regions of the striatum. This solves several open questions and has fundamental implications for our understanding of the organisation and function of inhibitory synapses in the telencephalon. Secondly, our study makes use of the unique sensitivity of single molecule localisation microscopy (SMLM) to identify low protein copy numbers. This is a new way to think about SMLM as it goes beyond a mere structural characterisation and towards a quantitative assessment of synaptic protein assemblies.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity): 

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      The following are specific comments: 

      (1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels. 

      Following the suggestion of reviewer 1, we re-analysed CA3 images of Glrb<sup>eos/eos</sup> hippocampal slices by applying a pixel-shift type of control, in which the Sylite channel (in far red) was horizontally flipped relative to the mEos4b-GlyRβ channel (in green, see Methods). As expected, the number of mEos4b-GlyRβ detections per gephyrin cluster was markedly reduced compared to the original analysis (revised Fig. 1B), confirming that the synaptic mEos4b detections exceed chance levels (see page 5). 
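
      The logic of this control can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the circular cluster model, the field width, the toy coordinates and the function names (`count_in_clusters`, `flip_horizontal`) are all hypothetical and introduced only for the example.

```python
import math


def count_in_clusters(detections, clusters):
    """Count detections that fall inside any circular cluster.

    detections: list of (x, y) positions
    clusters:   list of (cx, cy, radius) tuples (hypothetical cluster model)
    """
    hits = 0
    for x, y in detections:
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in clusters):
            hits += 1
    return hits


def flip_horizontal(clusters, field_width):
    """Mirror the cluster channel about the vertical mid-line of the field."""
    return [(field_width - cx, cy, r) for cx, cy, r in clusters]


# Toy data, made up for illustration (arbitrary units).
field_width = 100.0
clusters = [(10.0, 10.0, 2.0), (30.0, 50.0, 2.0)]
detections = [(10.5, 10.2), (9.0, 10.0), (30.0, 51.0), (70.0, 70.0)]

# Detections coinciding with clusters in the original registration.
observed = count_in_clusters(detections, clusters)

# Same count after flipping one channel: an estimate of chance overlap.
control = count_in_clusters(detections, flip_horizontal(clusters, field_width))

print(f"observed = {observed}, flipped control = {control}")
```

      If the observed count clearly exceeds the count obtained after flipping, the overlap is unlikely to be explained by random coincidence alone.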

      (2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      The pointillist images in Fig. 3A are essentially binary (red-black). Therefore, the density of detections at synapses cannot be easily judged by eye. For clarity, the original images in Fig. 3A have been replaced with two other examples that better reflect the different detection numbers in the dorsal and ventral striatum. 

      (3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations. 

      This is an important point that was also raised by the other reviewers. We have performed additional experiments to increase the data volume for analysis. For quantification, we used two approaches. First, we counted the percentage of infected cells in which synaptic localisation of the recombinant receptor subunit was observed (Fig. 5C). We found that mEos4b-GlyRa1 consistently localises at synapses, indicating that all cells express endogenous GlyRb. When neurons were infected with mEos4b-GlyRb, fewer cells had synaptic clusters, meaning that indeed, GlyR alpha subunits are the limiting factor for synaptic targeting. In cultures infected with mEos4b-GlyRa2, only very few neurons displayed synaptic localisation (as judged by epifluorescence imaging). We think this shows that GlyRa2 is less capable of forming heteromeric complexes than GlyRa1, in line with our previous interpretation (see pp. 9-10, 13). 

      Secondly, we quantified the total intensity of each subunit at gephyrin-positive domains, both in infected neurons as well as non-infected control cultures (Fig. 5D). We observed that mEos4b-GlyRa1 intensity at gephyrin puncta was higher than that of the other subunits, again pointing to efficient synaptic targeting of GlyRa1. Gephyrin cluster intensities (Sylite labelling) were not significantly different in GlyRb and GlyRa2 expressing neurons compared to the uninfected control, indicating that the lentiviral expression of recombinant subunits does not fundamentally alter the size of mixed inhibitory synapses in hippocampal neurons. Interestingly, gephyrin levels were slightly higher in hippocampal neurons expressing mEos4b-GlyRa1. In our view, this comes from an enhanced expression and synaptic targeting of mEos4b-GlyRa1 heteromers with endogenous GlyRb, pointing to a structural role of GlyRa1/b in hippocampal synapses (pp. 10, 13).

      The new data and analyses have been described and illustrated in the relevant sections of the manuscript.

      (4) Potential for pseudo replication. It's not clear whether they're performing stats tests across biological replica, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify. 

      All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case). We have systematically given these numbers in the revised manuscript (n, N, and other experimental parameters such as the number of animals used, coverslips, images or cells). Data are generally given as mean +/- SEM or as mean +/- SD as indicated.
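
      To make the distinction between n (data points) and N (independent experiments) concrete, here is a minimal sketch with made-up numbers. The data, the variable names and the choice to summarise each experiment by its mean are assumptions for illustration only; they do not reproduce the analysis reported in the manuscript.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up detections per synapse from N = 3 hypothetical experiments.
experiments = {
    "exp1": [2, 3, 4, 3],
    "exp2": [5, 6, 5, 4],
    "exp3": [3, 3, 2, 4],
}


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(count)."""
    return stdev(values) / sqrt(len(values))


# Pooled: every synapse counts as one data point (n = 12 here).
pooled = [v for values in experiments.values() for v in values]
n = len(pooled)
pooled_mean, pooled_sem = mean(pooled), sem(pooled)

# Per replicate: each experiment contributes a single mean (N = 3 here).
replicate_means = [mean(values) for values in experiments.values()]
N = len(replicate_means)
replicate_mean, replicate_sem = mean(replicate_means), sem(replicate_means)

print(f"pooled:        {pooled_mean:.2f} +/- {pooled_sem:.2f} (n = {n})")
print(f"per replicate: {replicate_mean:.2f} +/- {replicate_sem:.2f} (N = {N})")
```

      With these toy numbers the two means coincide, while the SEM computed from the pooled data points is smaller than the SEM computed from the replicate means. This is why it matters whether a test is run on n or on N.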

      (5) Does mEoS affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec? 

      The Glrb<sup>eos/eos</sup> knock-in mouse line has been characterised previously and does not display any ultrastructural or functional deficits at inhibitory synapses (Maynard et al. 2021 eLife). GlyRβ expression and glycine-evoked responses were not significantly different to those of the wildtype. The synaptic localisation of mEos4b-GlyRb in KI animals demonstrates correct assembly of heteromeric GlyRs and synaptic targeting. Accordingly, the animals do not display any obvious phenotype. We have clarified this in the manuscript (p. 4). In the case of cultured neurons, long-term expression of fluorescent receptor subunits with lentivirus has proven ideal to achieve efficient synaptic targeting. The low and continuous supply of recombinant receptors ensures assembly with endogenous subunits to form heteropentameric receptor complexes (e.g. [Patrizio et al. 2017 Sci Rep]). In the present study, lentivirus infection did not induce any obvious differences in the number or size of inhibitory synapses compared to control neurons, as judged by Sylite labelling of synaptic gephyrin puncta (new Fig. 5D).

      (6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful. 

      We agree that absolute quantification with SMLM is challenging, since the number of detections depends on fluorophore maturation, photophysics, imaging conditions, and analysis thresholds (discussed in Patrizio & Specht 2016, Neurophotonics). For this reason, only very few datasets provide reliable copy numbers, even for well-studied proteins such as PSD-95. One notable exception is the study by Maynard et al. (eLife 2021) that quantified endogenous GlyRβ-containing receptors in spinal cord synapses using SMLM combined with correlative electron microscopy. The strength of this work was the use of a KI mouse strain, which ensures that mEos4b-GlyRβ expression follows intrinsic regional and temporal profiles. The authors reported a stereotypic density of ~2,000 GlyRs/µm² at synapses, corresponding to ~120 receptors per synapse in the dorsal horn and ~240 in the ventral horn, taking into account various parameters including receptor stoichiometry and the functionality of the fluorophore. These values are very close to our own calculations of GlyR numbers at spinal cord synapses that were obtained slightly differently in terms of sample preparation, microscope setup, imaging conditions, and data analysis, lending support to our experimental approach. Nevertheless, the obtained GlyR copy numbers at hippocampal synapses clearly have to be taken as estimates rather than precise figures, because the number of detections from a single mEos4b fluorophore can vary substantially, meaning that the fluorophores are not represented equally in pointillist images. This can affect the copy number calculation for a specific synapse, in particular when the numbers are low (e.g. in hippocampus). However, it should not alter the average number of detections (Fig. 1B) or the (median) molecule numbers of the entire population of synapses (Fig. 1C). We have discussed the limitations of our approach (p. 11).

      (7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections? 

      As discussed above (point 6), the detection of fluorophores with SMLM is influenced by many parameters, not least the noise produced by emitting molecules other than the fluorophore used for labelling. Our study is exceptional in that it attempts to identify extremely low molecule numbers (down to 1). To verify that the detections obtained with PALM correspond to mEos4b, we conducted robust control experiments (including pixel-shift as suggested by the reviewer, see point 1, revised Fig. 1B). The rationale for the nanobody-based dSTORM experiments was twofold: (1) to have an independent readout of the presence of low-copy GlyRs at inhibitory synapses and (2) to analyse the nanoscale organisation of GlyRs relative to the synaptic gephyrin scaffold using dual-colour dSTORM with spectral demixing (see p. 6). The organic fluorophores used in dSTORM (AF647, CF680) ensure high photon counts, essential for reliable co-localisation and distance analysis. PALM and dSTORM cannot be combined in dual-colour mode, as they require different buffers and imaging conditions. 

      The specificity of the anti-Eos nanobody was demonstrated by immunohistochemistry in spinal cord cultures expressing mEos4b-GlyRb and wildtype control tissue (Fig. S3). In response to the reviewer's remarks, we also performed a negative control experiment in Glrb<sup>eos/eos</sup> slices (dSTORM), in which the nanobody was omitted (new Fig. S4F,G). Under these conditions, spectral demixing produced a single peak corresponding to CF680 (gephyrin) without any AF647 contribution (Fig. S4F). The number of "false" AF647 detections at synapses was significantly lower than in the slices labelled with the nanobody. We conclude that the fluorescence signal observed in our dual-colour dSTORM experiments arises from the specific detection of mEos4b-GlyRb by the nanobody, rather than from background, cross-reactivity or wrong attribution of colour during spectral demixing. We have added these data and explanations in the results (p. 7) and in the figure legend of Fig. S4F,G.

      (8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision. 

      This is an interesting question in the context of our experiments with low-copy GlyRs, since the spatial resolution of SMLM is limited also by the density of molecules, i.e. the sampling of the structure in question (Nyquist-Shannon criterion). Accordingly, the priority of the PALM experiments was to improve the sensitivity of SMLM for the identification of mEos4b-GlyRb subunits, rather than to maximize the spatial resolution. The mean localisation precision in PALM was 33 +/- 12 nm, as calculated from the fitting parameters of each detection (Zeiss, ZEN software), which ultimately result from their signal-to-noise ratio. This is a relatively low precision for SMLM, which can be explained by the low brightness of mEos4b compared to organic fluorophores together with the elevated fluorescence background in tissue slices.

      In the case of dSTORM, the aim was to study the relative distribution of GlyRs within the synaptic scaffold, for which a higher localisation precision was required (p. 6). Therefore, detections with a precision ≥ 25 nm were filtered out during analysis with NEO software (Abbelight). The retained detections had a mean localisation precision of 12 +/- 5 nm for CF680 (Sylite) and 11 +/- 4 nm for AF647 (nanobody). These values are given in the revised manuscript (pp. 18, 22).
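
      The filtering step described here can be sketched as follows. This is an illustration only, not the NEO software workflow: the detection records, the field names (`x_nm`, `y_nm`, `precision_nm`) and the toy values are hypothetical; only the 25 nm threshold comes from the text.

```python
from statistics import mean, stdev

PRECISION_CUTOFF_NM = 25.0  # threshold quoted in the response above


def filter_by_precision(detections, cutoff_nm=PRECISION_CUTOFF_NM):
    """Keep detections whose localisation precision is below the cutoff.

    Detections with a precision >= cutoff_nm are discarded.
    """
    return [d for d in detections if d["precision_nm"] < cutoff_nm]


# Made-up detections for illustration (hypothetical field names).
detections = [
    {"x_nm": 100.0, "y_nm": 250.0, "precision_nm": 8.0},
    {"x_nm": 130.0, "y_nm": 240.0, "precision_nm": 12.0},
    {"x_nm": 900.0, "y_nm": 400.0, "precision_nm": 31.0},
    {"x_nm": 110.0, "y_nm": 260.0, "precision_nm": 16.0},
    {"x_nm": 500.0, "y_nm": 700.0, "precision_nm": 25.0},
]

retained = filter_by_precision(detections)
precisions = [d["precision_nm"] for d in retained]

# Summary reported in the same form as in the text: mean +/- SD.
mean_precision = mean(precisions)
sd_precision = stdev(precisions)

print(f"{len(retained)} of {len(detections)} detections retained, "
      f"precision {mean_precision:.1f} +/- {sd_precision:.1f} nm")
```

      Note that a detection with a precision of exactly 25 nm is discarded here, matching the "≥ 25 nm" criterion.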

      (9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one. 

      Multiple detections of the same fluorophore are intrinsic to dSTORM imaging and have not been eliminated from the analysis. Small clusters of detections likely represent individual molecules (e.g. single receptors in the extrasynaptic regions, Fig. 2A). DBSCAN is a robust clustering method that is quite insensitive to minor changes in the choice of parameters. For dSTORM of synaptic gephyrin clusters (CF680), a relatively short distance (80 nm radius) together with a high number of detections (≥ 50 neighbours) were chosen to reconstruct the postsynaptic domain with high spatial resolution (see point 8). In the case of the GlyR (nanobody-AF647), the clustering was done mostly for practical reasons, as it provided the coordinates of the centre of mass of the detections. The low stringency of this clustering (200 nm radius, ≥ 5 neighbours) effectively filters out single detections that can result from background noise or incorrect demixing. An additional reference explaining the use of DBSCAN including the choice of parameters is given on p. 22 (see also R2 point 4).
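
      For readers unfamiliar with the method, the sketch below shows how a radius and a minimum neighbour count act together in DBSCAN, and how a centre of mass can be read off a cluster. It is a small pure-Python reimplementation for illustration, not the software used in the study. The parameter names `eps` and `min_pts`, the convention that a point counts among its own neighbours, and the toy coordinates are assumptions; only the values 200 nm and 5 are taken from the text.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


def centre_of_mass(points, labels, cluster):
    """Mean x and y of all points assigned to the given cluster."""
    members = [p for p, lab in zip(points, labels) if lab == cluster]
    count = len(members)
    return (sum(x for x, _ in members) / count,
            sum(y for _, y in members) / count)


# Toy detections in nm (made up): six close together, one isolated.
points = [(0, 0), (20, 0), (0, 20), (-20, 0), (0, -20), (10, 10), (1000, 1000)]

labels = dbscan(points, eps=200.0, min_pts=5)
centre = centre_of_mass(points, labels, cluster=0)

print(labels, centre)
```

      In this toy example the isolated detection has too few neighbours within the radius and is labelled as noise, which is the filtering effect described for the low-stringency setting.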

      (10) For microscopy experiment methods, state power densities, not % or "nominal power". 

      Done. We now report the irradiance (laser power density) instead of nominal power (pp. 18, 21). 

      (11) In general, not much data presented. Any SI file with extra images etc.? 

      The original submission included four supplementary figures with additional data and representative images that should have been available to the reviewer (Figs. S1-S4). The SI file has been updated during revision (new Fig. S4E-G). 

      (12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state:"What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers. 

      We thank the reviewer for pointing this out. We are dealing with several processes: protein expression that determines subunit availability and the assembly of pentameric GlyR complexes, surface expression, membrane diffusion and accumulation of GlyRb-containing receptor complexes at inhibitory synapses. We have edited the manuscript, particularly the discussion, and tried to be as clear as possible in our wording.

      We chose not to add a schematic illustration for the time being, because any graphical representation is necessarily a simplification. Instead, we preferred to summarise the main numbers in tabular form (Table 1). We are of course open to any other suggestions.

      (13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization. 

      The dSTORM images in Fig. 2 are pointillist representations that show individual detections rather than molecules. Small clusters of detections are likely to originate from a single AF647 fluorophore (in the case of nanobody labelling) and therefore represent single GlyRb subunits. Since GlyR copy numbers are so low at hippocampal synapses (≤ 5), the notion of nanodomain is not directly applicable. Our analysis therefore focused on the integration of GlyRs within the postsynaptic scaffold, rather than attempting to define nanodomain structures (see also response to point 8 of R1). A clarification has been added in the revised manuscript (p. 6).

      Reviewer #1 (Significance): 

      The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically-tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall these advances are more incremental than groundbreaking. 

      We thank the reviewer for acknowledging both the technical and biological advances of our study. While we recognize that our work builds upon established models, we consider that it also addresses important unresolved questions, namely that GlyRs are present and specifically anchored at inhibitory synapses in telencephalic regions, such as the hippocampus and striatum. From a methodological point of view, our study demonstrates that SMLM can be applied not only for structural analysis of highly abundant proteins, but also to reliably detect proteins present at very low copy numbers. This ability to identify and quantify sparse molecule populations adds a new dimension to SMLM applications, which we believe increases the overall impact of our study beyond the field of synaptic neuroscience.

      Reviewer #2 (Evidence, reproducibility and clarity): 

      In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses. 

      However I have some minor points that the authors may address for clarification: 

      (1) In Figure 1 the authors apply PALM imaging of mEos4b-GlyRβ (knock-in), and the corresponding Sylite label seems to be recorded in widefield; the figure legend does not clearly state whether it is widefield or super-resolved. In Fig. 1A, is the scale bar 5 µm? Some Sylite spots appear to be around 1 µm in size, especially the brighter spots, but maybe this is due to the lower resolution of widefield imaging? Regarding the statistical comparison: what method was chosen to test for normality? I think this point is missing in the methods section. 

      This is correct; the apparent size of the Sylite spots does not reflect the real size of the synaptic gephyrin domain due to the limited resolution of widefield imaging, including the detection of out-of-focus light. We have clarified in the legend of Fig. 1A that Sylite labelling was imaged with classic epifluorescence microscopy. The scale bar in Fig. 1A corresponds to 5 µm. Since the data were not normally distributed, nonparametric tests (Kruskal-Wallis one-way ANOVA with Dunn’s multiple comparison test or Mann-Whitney U-test for pairwise comparisons) were used (p. 23). 

      Moreover I would appreciate a clarification and/or citation that the knockin model results in no structural and physiological changes at inhibitory synapses, I believe this model has been applied in previous studies and corresponding clarification can be provided. 

      The Glrbeos/eos mouse model has been described previously and does not exhibit any structural or physiological phenotypes (Maynard et al. 2021 eLife). The issue was also raised by reviewer R1 (point 5) and has been clarified in the revised manuscript (p. 4).

      (2) In the next set of experiments the authors switch to demixing dSTORM experiments - an explanation why this is performed is missing in the text - I guess better resolution to perform more detailed distance measurements? For these experiments: which region of the hippocampus did the authors select, I cannot find this information in legend or main text. 

      Yes, the dSTORM experiments enable dual-colour structural analysis at high spatial resolution (see response to R1 point 7). An explanation has been added (p. 6).

      (3) Regarding parameters of demixing experiments: the number of frames (10,000) seems quite low and the exposure time higher than expected for Alexa 647. Can the authors explain the reason for choosing these particular parameters (low expression profile of the target, so better separation? fewer fluorophores on the label and shorter collection time?) or is there a reference that can be cited? The laser power is given in the methods in percentage of maximal output power, but for better comparison and reproducibility I recommend providing the values measured with a power meter (kW/cm²), as lasers may change their maximum output power during their lifetime. 

      Acquisition parameters (laser power, exposure time) for dSTORM were chosen to obtain a good localisation precision (~12 nm; see R1 point 8). The number of frames is adequate to obtain well sampled gephyrin scaffolds in the CF680 channel. In the case of the GlyR (nanobody-AF647), the concept of spatial resolution does not really apply due to the low number of targets (see R1, point 13). Power density (irradiance) values have now been given (pp. 18, 21).
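
      As a rough illustration of the conversion the reviewer asks for, the sketch below turns a measured laser power into a power density in kW/cm². It assumes a uniformly illuminated circular field, and the function name, the 100 mW power and the 100 µm diameter are hypothetical placeholders, not values from the manuscript.

```python
import math

def irradiance_kw_per_cm2(power_mw, field_diameter_um):
    """Mean power density, assuming a uniformly lit circular field (an assumption)."""
    power_kw = power_mw * 1e-6                        # 1 mW = 1e-6 kW
    area_um2 = math.pi * (field_diameter_um / 2) ** 2
    area_cm2 = area_um2 * 1e-8                        # 1 µm² = 1e-8 cm²
    return power_kw / area_cm2

# Hypothetical example values, for illustration only.
example = irradiance_kw_per_cm2(power_mw=100, field_diameter_um=100)
print(f"{example:.2f} kW/cm²")
```

      With these placeholder inputs the result is about 1.27 kW/cm². A Gaussian beam profile would give a higher peak value at the centre of the field than this mean estimate.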

      (4) For analysis of subsynaptic distribution: how did the authors decide to choose the parameters in the NEO software for DBSCAN clustering? Was a series of parameters tested to find optimal conditions, and did the analysis start with an initial test of whether the data are indeed clustered (Ripley's K function), or is there a reference in the literature that can be provided? 

      DBSCAN parameters were optimised manually, by testing different values. Identification of dense and well-delimited gephyrin clusters (CF680) was achieved with a small radius and a high number of detections (80 nm, ≥ 50 neighbours), whereas filtering of low-density background in the AF647 channel (GlyRs) required less stringent parameters (200 nm, ≥ 5) due to the low number of target molecules. Similar parameters were used in a previous publication (Khayenko et al. 2022, Angewandte Chemie). The reference has been provided on p. 22 (see also R1 point 9).
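
      To make the role of the two parameter sets concrete, here is a minimal pure-Python DBSCAN sketch. It is not the NEO software implementation; the function name and the toy coordinates are hypothetical, and it assumes that a point counts as its own neighbour when the minimum number is evaluated.

```python
import math

def dbscan(points, eps, min_pts):
    """Label 2D localisations (coordinates in nm): cluster id, or -1 for noise."""
    n = len(points)
    # Brute-force neighbour lists; each point is included in its own list (assumption).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep growing the cluster
    return labels

# Toy data (hypothetical): six nearby detections and one isolated detection.
toy = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (3, 7), (1000, 1000)]
sparse_labels = dbscan(toy, eps=200, min_pts=5)   # lenient settings quoted for AF647
dense_labels = dbscan(toy, eps=80, min_pts=50)    # stringent settings quoted for CF680
```

      On the toy data, the lenient settings group the six nearby detections into one cluster and reject the isolated one, whereas the stringent settings reject everything, which illustrates why a sparse target needs less stringent parameters than a dense scaffold.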

      (5) A conclusion/discussion of the results presented in Figure 5 is missing in the text/discussion. 

      This part of the manuscript has been completely overhauled. It includes new experimental data, quantification of the data (new Fig. 5), as well as the discussion and interpretation of our findings (see also R1, point 3). In agreement with our earlier interpretation, the data confirm that low availability of GlyRα1 subunits limits the expression and synaptic targeting of GlyRα1/β heteropentamers. The observation that GlyRα1 overexpression with lentivirus increases the size of the postsynaptic gephyrin domain further points to a structural role, whereby GlyRs can enhance the stability (and size) of inhibitory synapses in hippocampal neurons, even at low copy numbers (pp. 13-14). 

      (6) In line 552 "suspension" is misleading, better use "solution" 

      Done.

      Reviewer #2 (Significance): 

      Significance: The manuscript provides new insights into the presence of low-copy GlyRs by visualizing them via SMLM. This is the first report that visualizes GlyRs optically in the brain using the knock-in model of mEos4b-tagged GlyRβ and quantifies their copy number, comparing the distribution and amount of GlyRs in hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript. 

      Field of expertise: Super-Resolution Imaging and corresponding analysis 

      Reviewer #4 (Evidence, reproducibility and clarity): 

      In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ to detect endogenous glycine receptors using single-molecule localization microscopy. The main conclusion from this study is that in the hippocampus GlyRβ molecules are barely detected, while inhibitory synapses in the ventral striatum seem to express functionally relevant GlyR numbers. 

      I have a few points that I hope help to improve the strength of this study. 

      - In the hippocampus, this study finds that the numbers of detections are very low. The authors perform adequate controls to indicate that these localizations are above noise level. Nevertheless, it remains questionable that these reflect proper GlyRs. The suggestion that in hippocampal synapses the low numbers of GlyRβ molecules "are important in assembly or maintenance of inhibitory synaptic structures in the brain" is on itself interesting, but is not at all supported. It is also difficult to envision how such low numbers could support the structure of a synapse. A functional experiment showing that knockdown of GlyRs affects inhibitory synapse structure in hippocampal neurons would be a minimal test of this. 

      It is not clear what the reviewer means by “it remains questionable that these reflect proper GlyRs”. The PALM experiments include a series of stringent controls (see R1, point 1) demonstrating the existence of low-copy GlyRs at inhibitory synapses in the hippocampus (Fig. 1) and in the striatum (Fig. 3), and are backed up by dSTORM experiments (Fig. 2). We have no reason to doubt that these receptors are fully functional (as demonstrated for the ventral striatum; Fig. 4). However, due to their low number, a role in inhibitory synaptic transmission is clearly limited, at least in the hippocampus and dorsal striatum. 

      We therefore propose a structural role, where the GlyRs could be required to stabilise the postsynaptic gephyrin domain in hippocampal neurons. This is based on the idea that the GlyR-gephyrin affinity is much higher than that of the GABAAR-gephyrin interaction (reviewed in Kasaragod & Schindelin 2018 Front Mol Neurosci). Accordingly, there is a close relationship between GlyRs and gephyrin numbers, sub-synaptic distribution, and dynamics in spinal cord synapses that are mostly glycinergic (Specht et al. 2013 Neuron; Maynard et al. 2021 eLife; Chapdelaine et al. 2021 Biophys J). It is reasonable to assume that low-copy GlyRs could play a similar structural role at hippocampal synapses. A knockdown experiment targeting these few receptors is technically very challenging and beyond the scope of this study. However, in response to the reviewer's question we have conducted new experiments in cultured hippocampal neurons (new Fig. 5). They demonstrate that overexpression of GlyRα1/β heteropentamers increases the size of the postsynaptic domain in these neurons, supporting our interpretation of a structural role of low-copy GlyRs (p. 14).

      - The endogenous tagging strategy is a very strong aspect of this study and provides confidence in the labeling of GlyRβ molecules. One caveat however, is that this labeling strategy does not discriminate whether GlyRβ molecules are on the cell membrane or in internal compartments. Can the authors provide an estimate of the ratio of surface to internal GlyRβ molecules? 

      Gephyrin is known to form a two-dimensional scaffold below the synaptic membrane to which inhibitory GlyRs and GABAARs attach (reviewed in Alvarez 2017 Brain Res). The majority of the synaptic receptors are therefore thought to be located in the synaptic membrane, which is supported by the close relationship between the sub-synaptic distribution of GlyRs and gephyrin in spinal cord neurons (e.g. Maynard et al. 2021 eLife). To demonstrate the surface expression of GlyRs at hippocampal synapses we labelled cultured hippocampal neurons expressing mEos4b-GlyRα1 with anti-Eos nanobody in non-permeabilised neurons (see Author response image 1). The close correspondence between the nanobody (AF647) and the mEos4b signal confirms that the majority of the GlyRs are indeed located in the synaptic membrane.

      Author response image 1.

      Left: Lentivirus expression of mEos4b-GlyRα1 in fixed and non-permeabilised hippocampal neurons (mEos4b signal). Right: Surface labelling of the recombinant subunit with anti-Eos nanobody (AF647). 

      - “We also estimated the absolute number of GlyRs per synapse in the hippocampus. The number of mEos4b detections was converted into copy numbers by dividing the detections at synapses by the average number of detections of individual mEos4b-GlyRβ containing receptor complexes”. In essence this is a correct method to estimate copy numbers, and the authors discuss some of the pitfalls associated with this approach (i.e., maturation of fluorophore and detection limit). Nevertheless, the authors did not subtract the number of background localizations determined in the two negative control groups. This is critical, particularly at these low-number estimations. 

      We fully agree that background subtraction can be useful with low detection numbers. In the revised manuscript, copy numbers are now reported as background-corrected values. Specifically, the mean number of detections measured in wildtype slices was used to calculate an equivalent receptor number, which was then subtracted from the copy number estimates across hippocampus, spinal cord and striatum. This procedure is described in the methods (p. 20) and results (pp. 5, 8), and mentioned in the figure legends of Fig. 1C, 3C. The background-corrected values are given in the text and Table 1.
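
      The correction described above can be sketched as follows. The function names and the numbers in the example are hypothetical and serve only to illustrate the arithmetic; whether negative values are clipped at zero is not specified here, so the sketch leaves them unclipped.

```python
def copy_number(detections, detections_per_receptor):
    """Convert mEos4b detections into receptor copies.

    detections_per_receptor is the average number of detections of an
    individual receptor complex.
    """
    return detections / detections_per_receptor

def background_corrected_copy_number(synaptic_detections,
                                     wildtype_mean_detections,
                                     detections_per_receptor):
    """Subtract the receptor-equivalent background measured in wildtype slices."""
    raw = copy_number(synaptic_detections, detections_per_receptor)
    background = copy_number(wildtype_mean_detections, detections_per_receptor)
    return raw - background  # not clipped at zero (assumption)

# Hypothetical numbers, for illustration only.
corrected = background_corrected_copy_number(
    synaptic_detections=30,
    wildtype_mean_detections=5,
    detections_per_receptor=10,
)
print(corrected)
```

      With these placeholder values, 3.0 raw copies minus 0.5 background-equivalent copies gives 2.5 corrected copies per synapse.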

      - Furthermore, the authors state that "The advantage of this estimation is that it is independent of the stoichiometry of heteropentameric GlyRs". However, if the stoichiometry is unknown, the number of counted GlyRβ subunits cannot simply be reported as the number of GlyRs. This should be discussed in more detail, and more carefully reported throughout the manuscript. 

      The reviewer is right to point this out. There is still some debate about the stoichiometry of heteropentameric GlyRs. Configurations with 2α:3β, 3α:2β and 4α:1β subunits have been advanced (e.g. Grudzinska et al. 2005 Neuron; Durisic et al. 2012 J Neurosci; Patrizio et al. 2017 Sci Rep; Zhu & Gouaux 2021 Nature). We have therefore chosen a quantification that is independent of the underlying stoichiometry. Since our quantification is based on very sparse clusters of mEos4b detections that likely originate from a single receptor complex (irrespective of its stoichiometry), the reported values actually reflect the number of GlyRs (and not GlyRβ subunits). We have clarified this in the results (p. 5) and throughout the manuscript (Table 1). 

      - The dual-color imaging provides insights into the subsynaptic distribution of GlyRβ molecules in hippocampal synapses. Why are similar studies not performed on synapses in the ventral striatum, where functionally relevant numbers of GlyRβ molecules are found? Here, insights into the subsynaptic receptor distribution would be of much more interest, as they can be tied to function. 

      This is an interesting suggestion. However, the primary aim of our study was to identify the existence of GlyRs in hippocampal regions. At low copy numbers, the concept of sub-synaptic domains (SSDs, e.g. Yang et al. 2021 EMBO Rep) becomes irrelevant (see R1 point 13). It should be pointed out that the dSTORM pointillist images (Fig. 2A) represent individual GlyR detections rather than clusters of molecules. In the striatum, our specific purpose was to solve an open question about the presence of GlyRs in different subregions (putamen, nucleus accumbens).

      - It is unclear how the experiments in Figure 5 add to this study. These results are valid, but do not seem to directly test the hypothesis that "the expression of α subunits may be limiting factor controlling the number of synaptic GlyRs". These experiments simply test if overexpressed α subunits can be detected. If the α subunits are limiting, measuring the effect of α subunit overexpression on GlyRβ surface expression would be a more direct test. 

      Both R1 and R2 have also commented on the data in Fig. 5 and their interpretation. We have substantially revised this section as described before (see R1 point 3) including additional experiments and quantification of the data (new Fig. 5). The findings lend support to our earlier hypothesis that GlyR alpha subunits (in particular GlyRα1) are the limiting factor for the expression of heteropentameric GlyRα/β in hippocampal neurons (pp. 13-14). Since the GlyRα1 subunit itself does not bind to gephyrin (Patrizio et al. 2017 Sci Rep), the synaptic localisation of the recombinant mEos4b-GlyRα1 subunits is proof that they have formed heteropentamers with endogenous GlyRβ subunits and driven their membrane trafficking, which the GlyRβ subunits are incapable of doing on their own.

      Reviewer #4 (Significance): 

      These results are based on carefully performed single-molecule localization experiments, and are well-presented and described. The knockin mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling, the combination with single-molecule localization microscopy is very strong as it provides high sensitivity and spatial resolution. 

      The conceptual innovation however seems relatively modest, these results confirm previous studies but do not seem to add novel insights. This study is entirely descriptive and does not bring new mechanistic insights. 

      This study could be of interest to a specialized audience interested in glycine receptor biology, inhibitory synapse biology and super-resolution microscopy. 

      My expertise is in super-resolution microscopy, synaptic transmission and plasticity 

      As we have stated before, the novelty of our study lies in the use of SMLM for the identification of very small numbers of molecules, which requires careful control experiments. This is something that has not been done before and that can be of interest to a wider readership, as it opens up SMLM for ultrasensitive detection of rare molecular events. Using this approach, we solve two open scientific questions: (1) the demonstration that low-copy GlyRs are present at inhibitory synapses in the hippocampus, (2) the sub-region specific expression and functional role of GlyRs in the ventral versus dorsal striatum.

      The following review was provided later under the name “Reviewer #4”. To avoid confusion with the last reviewer from above we will refer to this review as R4-2.

      Reviewer #4-2 (Evidence, reproducibility and clarity):  

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function is poorly understood. 

      Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exists in the hippocampal formation, although whether or not they have a functional relevance remains unknown.

      They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.

      Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB.

      The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.

      Major comments:

      - Are the key conclusions convincing?

      The authors are generally precise in the formulation of their conclusions.

      (1) They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.

      (2) The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.

      (3) Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.

      (4) The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics. 

      A thorough analysis and quantification of the data in Fig. 5 has been carried out as requested by all the other reviewers (e.g. R1, point 3). The new data and results have been described in the revised manuscript (pp. 9-10, 13-14).

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit’s localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.

      This question has also been raised before (R1, point 5). According to an earlier study the mEos4b-GlyRβ knock-in does not cause any obvious phenotypes, with the possible exception of minor loss of glycine potency (Maynard et al. 2021 eLife). The fact that the synaptic levels in the spinal cord in heterozygous animals are precisely half of those of homozygous animals argues against differences in receptor expression, heteropentameric assembly, forward trafficking to the plasma membrane and integration into the synaptic membrane as confirmed using quantitative super-resolution CLEM (Maynard et al. 2021 eLife). Accordingly, we did not observe any behavioural deficits in these animals, making it a powerful experimental model. We have added this information in the revised manuscript (p. 4). 

      In addition, without any quantification or statistical analysis, the authors’ claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).

      As mentioned before, we have substantially revised this part (R1, point 3). The quantification and analysis in the new Fig. 5 support our earlier interpretation.

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using dual-colour SMLM. This is a powerful approach that allows for a claim to be made on the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These small punctate clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these nonsynaptic clusters, presented the data in the manuscript, and compared them against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2, or perhaps a somatic marker, and assessing the co-localization with the non-synaptic clusters would add even more credibility to them being extrasynaptic. 

      The existence of extrasynaptic GlyRs is well attested in spinal cord neurons (e.g. Specht et al. 2013 Neuron; this study see Fig. S2). The fact that these appear as small clusters of detections in SMLM recordings results from the fact that a single fluorophore can be detected several times in consecutive image frames and because of blinking. Therefore, small clusters of detections likely represent single GlyRs (that can be counted), and not assemblies of several receptor complexes. Due to their diffusion in the neuronal membrane, they are seen as diffuse signals throughout the somatodendritic compartment in epifluorescence images (e.g. Fig. 5A). SMLM recordings of the same cells resolve this diffuse signal into discrete nanoclusters representing individual receptors (Fig. 5B). It is not clear what information co-localisation experiments with specific markers could provide, especially in hippocampal neurons, in which the copy numbers (and density) of GlyRs are next to zero.

      In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.

      Quantification of the data has been carried out (new Fig. 5C,D). The results have been described before (R1, point 3) and support our earlier interpretation of the data (pp. 13-14).

      Lastly, even though it may be outside of the scope of such a study analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute’s ISH of the beta subunit the strongest signal comes from the stratum oriens in the CA1 for example, suggesting that interneurons residing there would more likely have a higher expression of the glycine receptors. This could also be assessed by looking more carefully at the single cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good. 

      We have added the requested information from the ISH database of the Allen Institute in the discussion as suggested by the reviewer (p. 12). However, in combination with the transcriptomic data (Fig. S1) our findings strongly suggest that the expression of synaptic GlyRs depends on the availability of alpha subunits rather than on the presence of the GlyRβ transcript. This is obvious when one compares the mRNA levels in the hippocampus with those in the basal ganglia (striatum) and medulla. While the transcript concentrations of GlyRβ are elevated in all three regions and essentially the same, our data show that the GlyRβ copy numbers at synapses differ over more than 2 orders of magnitude (Fig. 1B, Table 1). 

      - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Since the labeling and some imaging has been performed already, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.

      - Are the data and the methods presented in such a way that they can be reproduced?

      Yes, for the most part.

      - Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      - Specific experimental issues that are easily addressable.

      N/A

      - Are prior studies referenced appropriately?

      Yes

      - Are the text and figures clear and accurate?

      Yes, although quantification in figure 5 is currently not present.

      A quantification has been added (see R1, point 3).

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results. 

      We agree in principle with this suggestion. However, the revised Fig. S4 is more complex and we think that it would distract from the data shown in Fig. 2. Given that Fig. S4 is mostly methodological and not essential to understand the text, we have kept it in the supplement for the time being. We leave the final decision on this point to the editor.

      Reviewer #4-2 (Significance): 

      [This review was supplied later]

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Using a novel, high-resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition, it lays out a roadmap for future studies addressing similar questions. 

      - Place the work in the context of the existing literature (provide references, where appropriate).

      This is done well by the authors in the curation of the literature. As stated above, the authors have filled a gap in the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function. 

      - State what audience might be interested in and influenced by the reported findings.

      Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims. 

      We thank the reviewer for the positive assessment of the technical and biological implications of our work, as well as the interest of our findings to a wide readership of neuroscientists and cell biologists. 

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile. 

      Strengths: 

      The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing. 

      Weaknesses: 

      The functional analysis of the characterized neurons is not as comprehensive as the anatomical description, and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically. 

      Reviewer #2 (Public review): 

      Summary: 

      Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function. 

      Strengths: 

      Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of detail in the presentation of the results. Specifically, all microscopy image figures are missing information about the number of samples (N), and in the case of colocalization experiments, quantitative analyses are not provided. Additionally, in the first behavioral section, it would be beneficial to complement the data table with figures similar to those presented later in the manuscript for consistency and clarity. 

      Wider context: 

      This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs. 

      Reviewer #3 (Public review): 

      Summary: 

      This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated. 

      Strengths: 

      This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons. 

      Weaknesses: 

      (1) Often, it is mentioned that the expression is higher or lower or regional without quantification or an indication of the number of samples analysed. 

      (2) The experiment aimed at tracking sperm in the male reproductive system is difficult to interpret when it is not assessed whether ejaculation has occurred. 

      (3) The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) While the peripheral innervations are very carefully described, it is not clear to which SGNs and OGNs (i.e., cell bodies in the central nervous system) these innervations belong. Are SV, AG, and ED innervated by branches of one neuron or by separate neurons? Multi-color flip-out experiments could provide an answer to this. 

      We agree this is important and are planning these experiments for a follow-up study.

      (2) In contrast, for the analysis of the VT19028 split line (Figure 9), only vnc and cell body images are shown. How do the arborisations of these split combinations look in the periphery? Are the same reproductive organs innervated as shown in Figure 2?

      Figure 9S3 was inadvertently omitted from the initial submission. That figure is now included and shows that the VT019028 split broadly innervates the SV, AG, and ED.

      (3) In the discussion, I think it would be helpful to offer some potential explanations for the role of octopaminergic and glutamatergic signaling. If not required for basic fertility, they probably have some other role.

      Thank you; we have included speculation in the Discussion section "Potential for adaptation to environment".

      (4) Line 543: Figure 8S4 E, (not 8E). 

      Correction made.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 213-217 

      Comment:

      The use of "significantly less expression" may be misleading, as no quantification or statistical analysis is provided to support this comparison. 

      Suggestion:

      Consider using a more neutral term, such as "markedly less" or "noticeably less," unless quantitative data and statistical analysis are included to substantiate the claim.

      Good recommendation. This suggestion has been incorporated.

      (2) Line 264-267 

      Comment:

      The observation regarding the distinct morphology of SGNs and OGNs is interesting and could strengthen the argument regarding functional differences. 

      Suggestion: 

      Consider including a quantification of morphological complexity (e.g., branching) to support the claim. A method such as Sholl analysis (Sholl, 1953), as adapted in Fernández et al., 2008, could be applied. 

      This is a good suggestion, and we will consider it as part of a follow-up study.
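
As an illustration of the kind of quantification the reviewer suggests, the sketch below counts how many traced neurite segments cross concentric circles around a reference point, which is the core of a Sholl profile. It is a minimal sketch and not the authors' method: the function name `sholl_counts`, the segment representation, and the toy arbor are all hypothetical, and a segment that enters and leaves a circle without ending inside it is not counted, which is usually acceptable only when the tracing is finely segmented.

```python
import math

def sholl_counts(segments, center, radii):
    """Count, for each radius, how many neurite segments cross that circle.

    segments: list of ((x1, y1), (x2, y2)) straight pieces of a traced arbor.
    center:   (x, y) reference point chosen by the analyst.
    radii:    iterable of circle radii in the same units as the coordinates.
    """
    cx, cy = center
    counts = {}
    for r in radii:
        n = 0
        for (x1, y1), (x2, y2) in segments:
            d1 = math.hypot(x1 - cx, y1 - cy)
            d2 = math.hypot(x2 - cx, y2 - cy)
            # A crossing is counted when one end lies inside the circle
            # and the other end lies on or outside it.
            if min(d1, d2) < r <= max(d1, d2):
                n += 1
        counts[r] = n
    return counts

# Hypothetical toy arbor (not real data): one trunk that splits into two branches.
toy_arbor = [
    ((0, 0), (10, 0)),    # trunk
    ((10, 0), (20, 5)),   # upper branch
    ((10, 0), (20, -5)),  # lower branch
]
profile = sholl_counts(toy_arbor, center=(0, 0), radii=[5, 15, 25])
```

Comparing such profiles between SGN and OGN arbors (for example, the peak number of crossings and the radius at which it occurs) would give a simple numerical handle on branching complexity.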

      (3) Line 269-271 

      Comment:

      The anatomical context of the observation is not explicitly stated. 

      Suggestion:

      Add "in the ED" for clarity: "With the TRH-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, A and E) and 6XV5-vMAT (Figure 5S1, B and F) were both present with a highly overlapping distribution (Figure 5S1, I)." 

      Suggestion has been incorporated.

      (4) Line 275-276 

      Comment:

      The claim about the reduced ability to distinguish SGNs and OGNs in the ED would benefit from quantitative support. 

      Suggestion:

      Include a morphological comparison or quantification between SGNs and OGNs in the ED and SV to reinforce this point.

      Some morphological comparisons can be inferred from the images themselves, and we will include quantitation in a follow-up study.

      (5) Line 277-279 

      Comment:

      As with line 269, the anatomical site could be specified more clearly. 

      Suggestion: 

      Rephrase as: "With the Tdc2-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, M and Q) and 6XV5-vMAT (Figure 5S1, N and R) were both observed in a highly overlapping distribution (Figure 5S1, U)." 

      Suggestion has been incorporated.

      (6) Line 348-350 

      Comment:

      The phrase "significantly higher density" implies a statistical comparison that is not shown. 

      Suggestion:

      If no quantification is provided, replace with a qualitative term such as "visibly higher" or "notably more dense." Alternatively, add a quantitative analysis with statistical testing to justify the use of "significantly." 

      Suggestion has been incorporated.

      (7) Lines 415-458 (Section comment) 

      Comment:

      There appears to be differential localization of neurotransmitter receptor expression (glutamate in muscle vs. 5-HT in epithelium or neurons), which could have functional implications. 

      Suggestion:

      Expand this section to briefly discuss the differential localization patterns of these receptors and potential implications for signal transduction in male reproductive tissues. 

      (8) Lines 638-682 (Section comment) 

      Comment:

      The table summarizing fertility phenotypes would be more informative with additional detail on experimental outcomes. 

      Suggestion:

      Add a column showing the number of fertile males over the total tested (e.g., "n fertile / n total"). Also, clarify whether the fertility assays are identical to those reported in Figure 10S2, and whether similar analyses were conducted for females. Consider including a figure summarizing fertility results for all genotypes listed in the table, similar to Figure 10S2. 

      The fertility tests reported in Table 1 were separate from those reported in Figure 10S2. For these tests, the results were clear-cut: 100% of the males and females reported as infertile exhibited the infertile phenotype, and nearly 100% of those reported as fertile showed a high level of fertility. In subsequent figures we attempted to assess degrees of fertility.

      (9) Line 724-727 

      Comment:

      There seems to be a mistake in the identification of the driver lines used to silence OA neurons. Also, figure references might be incorrect. 

      Suggestion:

      The OA neuron driver line should be corrected to "Tdc2-GAL4-DBD ∩ AbdB-AD" instead of TRH-GAL4. Additionally, the figure references should be verified; specifically, the letter "B" (in "Figure 10B, D" and "10B, E") appears to be unnecessary or misplaced.

      Thanks for catching this; the corrections have been made.

      (10) Line 872-877 

      Comment:

      The discussion on the co-release of fast-acting glutamate and slower aminergic neurotransmitters is interesting and well-articulated. However, it remains somewhat disconnected from the behavioral findings. 

      Suggestion:

      Consider linking this proposed mechanism to the results observed in the mating duration assays. For instance, the sequential action of neurotransmitters described here could potentially underlie the prolonged mating observed when specific neuromodulators are active, helping to functionally integrate molecular and behavioral data. 

      (11) Line 926-928 

      Comment:

      The interpretation of 5-HT7 receptor expression in the sphincter is compelling, suggesting a role in regulating its function. However, this anatomical observation could be further contextualized with the functional data. 

      Suggestion:

      It may strengthen the interpretation to explicitly connect this finding with the fertility assays, where SGNs - presumably acting via serotonergic signaling - are shown to be necessary for male fertility. This would support a functional role for 5-HT7 in reproductive success via sphincter regulation.

      This has been added. 

      (12) Figure 1 

      Comment:

      The figure legend is generally clear, but could benefit from more consistency and precision in the color-coded labeling. Additionally, the naming of some structures could be more explicit. 

      Suggestion: 

      Revise the figure and the legend as follows:

      Figure 1. The Drosophila male reproductive system. A) Schematic diagram showing paired testes (colour), SVs (green), AGs (purple), Sph (red), ED (gray), and EB (colour). B) Actual male reproductive system. Te - testes, SV - seminal vesicle, AG - accessory gland, Sph - singular sphincter, ED - ejaculatory duct, EB - ejaculatory bulb. Scale bar: 200 µm.

      This suggestion has been incorporated.

      (13) Figure 3S2 

      Comment:

      There appears to be a typographical error in the description of the genotypes, which may lead to confusion. 

      Suggestion:

      Correct the legend to reflect the appropriate genotypes:

      Figure 3S2. Expression of vGlut-LexA and Tdc2-GAL4 in the Drosophila male reproductive system. A, D, G, J, M, P) vGlut-LexA, LexAop-6XmCherry; B, E, H, K, N, Q) Tdc2-GAL4, UAS-6XGFP; C, F, I, L, O, R) Overlay. Scale bars: O - 50 µm; R - 10 µm.

      The corrections have been made.

      (14) Figure 3S3

      Comment:

      The genotypes for panels D and E appear to be incomplete; the DBD component of the split-GAL4 drivers is missing. 

      Suggestion:

      Update the figure legend to: 

      Figure 3S3. Fruitless and Doublesex expression in the Drosophila male reproductive system. A) fru-GAL4, UAS-6XGFP; B) vGlut-LexA, LexAop-6XmCherry; C) Overlay; D) Tdc2-AD ∩ dsx-GAL4-DBD; E) TRH-AD ∩ dsx-GAL4-DBD. Scale bar: 200 µm.

      The corrections have been made.

      (15) Figure 4S4 

      Comment: 

      There is a repeated segment in the figure legend, which makes it unclear and redundant. 

      Suggestion:

      Edit the legend to remove the duplicated lines: 

      Figure 4S4. Expression of vGlut, TβH-GFP, and 5-HT at the junction of the SV and AGs with the ED of the Drosophila male reproductive system. A) vGlut-40XV5; B) TβH-GFP; C) 5-HT; D) vGlut-40XV5, TβH-GFP overlay; E) vGlut-40XV5, 5-HT overlay; F) TβH-GFP, 5-HT overlay. Scale bar: 50 µm.

      The correction has been made.

      (16) Figure 6S5 

      Comment:

      Within this figure, the orientation and/or scale of the tissue varies noticeably between individual panels, making it difficult to directly compare the different experimental conditions. 

      Suggestion:

      For improved clarity and interpretability, consider standardizing the orientation and size of the tissue shown across all panels within the figure. Consistent presentation will facilitate direct comparisons between treatments or genotypes. 

      There is often variation in the size of the male reproductive organs. They were all acquired at the same magnification. The only point of this figure is that there is no vGAT or vAChT at these NMJs, and the result is unambiguously negative. 

      (17) Figure 10 

      Comment:

      Panel A appears redundant, as it shows the same information as the other panels but without indicating statistical significance. 

      Suggestion:

      Consider removing panel A and keeping only the remaining four graphs, which include relevant statistical comparisons and clearly show significant differences.

      We realize there is some redundancy of panel A with the other panels, but we feel there is value in having all the genotypes in a single panel for comparison.

      Reviewer #3 (Recommendations for the authors): 

      Here are some suggestions to improve the manuscript: 

      (1) Prot B GFP experiment: the authors should explain better the time chosen to look at the sperm content of the male reproductive system. At 10 minutes, it is expected that the male has already ejaculated, and therefore, a failure to ejaculate would result in more sperm in the reproductive system, not less. Since we are not certain when the male ejaculates, it would be important to do the analysis at different time points.

      In the Prot-GFP experiments, the 10-minute time point was chosen because we nearly always observe sperm in the ejaculatory duct of control males. In the experimental males, we never observed sperm in the ejaculatory duct at this time point. Also, no Prot-GFP sperm were observed in the reproductive tract of females mated to experimental males even when mating was allowed to go to completion, while abundant sperm were found in females mated to Prot-GFP controls. Figure 10S1 has been updated to include images of these female reproductive systems. The absence of Prot-GFP sperm in the reproductive tract of females mated to experimental males indicates that sperm transfer in these males is not occurring earlier during copulation than in control males and that we did not miss it by examining only the ejaculatory duct.

      (2) Discuss what may be the role of the octopamine/glutamate neurons and glutamate transmission in serotonin/glutamate neurons in the male reproductive system, given that they are not required for fertility (at least under the context in which it was tested). It is quite a striking result that deserves some attention. 

      We agree it is a surprising result and have included speculation on the role of glutamate and octopamine in male reproduction in the Discussion section "Potential for adaptation to environment".

      (3) Very important: 

      (a) Figure 3 is present in the Word document but not the PDF. 

      (b) Figure 9S3 is not present 

      (c) In Figure 5 X), the legend does not correspond to the panel.

      All of these corrections have been made. 

      (4) Other suggestions:

      (a) A summary schematic (or several) of the findings would make it an easier read.

      (b) Explain why the ejaculatory bulb was left out of the analysis.

      (c) Explain in the main text some of the tools, such as BONT-C and the conditional vGlut mutation.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.

      Strengths:

      I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal.

      I think the authors have done a very nice job addressing my original concerns. Here are those original concerns and my new comments related to how the authors address them.

      (1) While the authors state that chemical modification of AZ10606120 to produce the X7-uP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120.

      The authors now show the relative inhibition of X7-uP compared to AZ10606120 at different concentrations. This provides a nice comparison to give the reader an idea of how effectively X7-uP inhibits P2X7 receptor. This is great.
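
To make the relationship between an IC50 and "relative inhibition at different concentrations" concrete, here is a minimal sketch based on the standard Hill-type inhibition curve. It does not use any values from the manuscript: the IC50 values, the concentrations, the Hill coefficient of 1, and the function names are all hypothetical placeholders chosen only to show how a higher IC50 translates into weaker inhibition at a given concentration.

```python
def fraction_remaining(conc, ic50, hill=1.0):
    """Fraction of the uninhibited response left at antagonist concentration `conc`.

    Standard Hill-type inhibition curve; `conc` and `ic50` share the same units.
    """
    return 1.0 / (1.0 + (conc / ic50) ** hill)

def relative_inhibition(conc, ic50, hill=1.0):
    """Fraction of the response that is blocked (0 = none, 1 = complete)."""
    return 1.0 - fraction_remaining(conc, ic50, hill)

# Hypothetical IC50 values in arbitrary units (NOT measured values).
ic50_parent = 1.0   # stand-in for the parent antagonist
ic50_probe = 3.0    # stand-in for a modified probe that is somewhat less potent

concs = [0.1, 1.0, 10.0]
table = [
    (c, relative_inhibition(c, ic50_parent), relative_inhibition(c, ic50_probe))
    for c in concs
]
for c, parent, probe in table:
    print(f"conc={c:>5}: parent {parent:.2f}, probe {probe:.2f}")
```

By construction, each compound blocks half of the response at its own IC50, so side-by-side inhibition values at a few shared concentrations carry much of the same information as a fitted IC50 comparison.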

      (2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.

      I agree with the authors that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling is outside the scope of this paper but would be important for studies involving dynamic or post-labeling functional analyses such as live trafficking.

      (3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes.

      The authors have now changed the mScarlet color to orange, which solves my concern.

      (4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.

      The authors address this point very well and include nice data to show that P2X6 does not insert into the plasma membrane as a homotrimer.

      (5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.

      The authors have modified Figure 1 to make it easier to understand the NASA chemistry reaction.

      I congratulate the authors on an outstanding paper!

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      In this paper, the authors developed a chemical labeling reagent for P2X7 receptors, called X7-uP. This labeling reagent selectively labels endogenous P2X7 receptors with biotin based on ligand-directed NASA chemistry (Ref. 41). After labeling the endogenous P2X7 receptor with biotin, the receptor can be fluorescently labeled with streptavidin-AlexaFluor647. The authors carefully examined the binding properties and labeling selectivity of X7-uP to P2X7, characterized the labeling site of P2X7 receptors, and demonstrated fluorescence imaging of P2X7 receptors. The data obtained by SDS-PAGE, Western blot, and fluorescence microscopy clearly show that X7-uP labels the P2X7 receptor. Finally, the authors fluorescently labeled the endogenous P2X7 in BV2 cells, which are a murine microglia model, and used dSTORM to reveal a nanoscale P2X7 redistribution mechanism under inflammatory conditions at high resolution. 

      Strengths: 

      X7-uP selectively labels endogenous P2X7 receptors with biotin. Streptavidin-AlexaFluor647 binds to the biotin labeled to the P2X7 receptor, allowing visualization of endogenous P2X7 receptors. 

      We thank the reviewer for their positive comment.

      Weaknesses: 

      Weaknesses & Comments 

      (1) The P2X7 receptor exists in a trimeric form. If it is not a monomer under the conditions of the pull-down assay in Figure 2C, the quantitative values may not be accurate. 

      We thank the reviewer for this comment. As shown in Figure 2C, the band observed on the denaturing SDS-PAGE corresponds to the monomeric form of the P2X7 receptor. While we cannot exclude the presence of non-monomeric species under native conditions, no such higher-order forms are visible in the gel. This observation supports the conclusion that the quantitative values presented are based on the monomeric form and are therefore reliable.

      (2) In Figure 3, GFP fluorescence was observed in the cell. Are all types of P2X receptors really expressed on the cell surface? 

      We thank the reviewer for this excellent comment, which was also raised by reviewer 2. To address this concern, we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X receptors reach the plasma membrane. As expected, all P2X subtypes except P2X6 were detected at the cell surface in HEK293T cells, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.

      (3) The reviewer was not convinced of the advantages of the approach taken in this paper, because the endogenous receptor labeling in this study could also be done using conventional antibody-based labeling methods. 

      We thank the reviewer for raising this important point and would like to highlight several advantages of our approach compared to conventional antibody-based labeling.

      First, commercially available P2X7 antibodies often suffer from poor specificity and are generally not suitable for reliably detecting endogenous P2X7 receptors, as documented in previous studies (e.g., PMID: 16564580 and PMID: 15254086). While recent advances have been made using nanobodies with improved specificity for P2X7 (e.g., PMID: 30074479 and PMID: 38953020), our strategy is distinct and complementary to nanobody-based approaches.

      Second, antibodies rely on non-covalent interactions with the receptor, which can result in dissociation over time. In contrast, our X7-uP probe covalently biotinylates lysine residues on the P2X7 receptor through stable amide bond formation. This covalent labeling ensures that the biotin moiety remains permanently attached, an advantage not afforded by reversible binding strategies.

      Third, by selectively biotinylating P2X7 receptors, our method provides a versatile platform for the chemical attachment of a wide range of probes or functional moieties. Although we did not demonstrate this application in the current study, we believe this modularity represents an additional advantage of our approach.

      We have now revised the discussion to highlight these key advantages, allowing the reader to form their own opinion. We hope this addresses the reviewer’s concerns and clarifies the benefits of our approach.

      (4) Although P2X7 was successfully labeled in this paper, the chemistry itself is not new. There is a need for more attractive functional evaluation such as live trafficking analysis of endogenous P2X7. 

      We agree with the reviewer that the underlying chemistry is not novel per se. However, to our knowledge, it has not previously been applied to the P2X7 receptor, and thus constitutes a novel application with specific relevance for studying native P2X7 biology.

      We also appreciate the reviewer’s suggestion regarding live trafficking analysis of endogenous P2X7. While this is indeed a valuable and interesting direction, we believe it lies beyond the scope of the present study, as it would first require demonstrating that the labeling itself does not affect P2X7 function (see below). This important step would necessitate additional experiments, which we consider more appropriate for a follow-up investigation.

      (5) The reviewer has concerns that the use of the large-size streptavidin to label the P2X7 receptor may perturb the dynamics of the receptor. 

      We thank the reviewer for raising this important point. Although we did not directly measure receptor dynamics, it is indeed possible that tetrameric streptavidin (tStrept-A 647) could promote P2X7 clustering by cross-linking nearby receptors due to its tetravalency (see also point 7 raised by the reviewer). To address this concern, we performed additional dSTORM experiments using a monomeric form of streptavidin-Alexa 647 (mSA) (see PMID: 26979420). Owing to its reduced size and lack of tetravalency, mSA has been shown to minimize artificial crosslinking of synaptic receptors (PMID: 26979420). A drawback of using mSA, however, is that the monomeric form carries only two fluorophores (estimated degree of labeling, DOL ≈ 2, PMID: 26979420), whereas the tetrameric form, according to the manufacturer’s certificate of analysis (Invitrogen S21374), has an average DOL of three fluorophores per monomer, resulting in a total of ~12 fluorophores per streptavidin.

      We tested three conditions with mSA incubation: (i) control BV2 cells (without X7-uP), (ii) untreated X7-uP-labeled BV2 cells, and (iii) X7-uP-labeled BV2 cells treated with LPS and ATP (using the same concentrations and incubation times described in the manuscript). As shown in Author response image 1, only LPS+ATP treatment induced a clear increase in the mean cluster density compared to quiescent (untreated) BV2 cells. This effect closely matches the results obtained with tStrept-A 647, supporting the conclusion that tetrameric streptavidin does not artificially promote P2X7 clustering. It is also possible that the cellular environment of BV2 microglia differs from the confined architecture of synapses, which may further explain why cross-linking effects are less pronounced in our system.

      As expected, the overall fluorescence signal with mSA was about tenfold lower than with tStrept-A 647, consistent with the expected fluorophore stoichiometry. This lower signal may explain why the values for the untreated condition appeared slightly higher than for the control, although the difference was not statistically significant (P = 0.1455).

      We hope these additional experiments adequately address the reviewer’s concerns.

      Author response image 1.

      BV2 labeling with monomeric streptavidin–Alexa 647 (mSA). (A) Bright-field and dSTORM images of BV2 cells labeled with mSA in the presence (untreated and LPS+ATP) or absence (control) of 1 µM X7-uP. Treatment: LPS (1 µg/mL for 24 hours) and ATP (1 mM for 30 minutes). Scale bars, 10 µm. Insets: Magnified dSTORM images. Scale bars, 1 µm. (B) Quantification of the number of localizations (n = 2 independent experiments). Bars represent mean ± s.e.m. One-way ANOVA with Tukey’s multiple comparisons (P values are indicated above the graph).

      (6) It is better to directly label Alexa647 to the P2X7 receptor to avoid functional perturbation of P2X7. 

      Directly labeling the P2X7 receptor with Alexa647 would require the design and synthesis of a novel probe, which is currently not available. Implementing such a strategy would involve substantial new experimental work that lies beyond the scope of the present study.

      (7) In all imaging experiments, the addition of streptavidin, which acts as a cross-linking agent, may induce P2X7 receptor clustering. This concern would be dispelled if the receptors were labeled with a fluorescent dye instead of biotin and observed. 

      We refer the reviewer to our response in point 5, where we addressed this concern by comparing tetrameric and monomeric streptavidin conjugates. As noted above (see also point 6), directly labeling the receptor with a fluorescent dye would require the development of a new probe, which is outside the scope of the present study.

      (8) There are several mentions of microglia in this paper, even though they are not used. This can lead to misunderstanding for the reader. The author conducted functional analysis of the P2X7 receptor in BV-2 cells, which are a model cell line but not microglia themselves. The text should be reviewed again and corrected to remove the misleading parts that could lead to misunderstanding. e.g. P8. lines 361-364

      First, it combines N-cyanomethyl NASA chemistry with the high-affinity AZ10606120 ligand, enabling rapid labeling in microglia (within 10 min)

      P8. lines 372-373 

      Our results not only confirm P2X7 expression in microglia, as previously reported (6, 26-33), but also reveal its nanoscale localization at the cell surface using dSTORM. 

      We agree with the reviewer’s comment. We have now modified the text, including the title.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.

      Strengths: 

      I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal 

      We thank the reviewer for their positive comment.

      Weaknesses: 

      Overall, the manuscript is novel and interesting. However, I do have some suggestions for improvement. 

      (1) While the authors state that chemical modification of AZ10606120 to produce the X7-UP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120. 

      We thank the reviewer for this insightful comment. Unfortunately, due to the limited availability of X7-uP, we were not able to establish a complete concentration–response curve to determine its IC<sub>50</sub>, which would require testing at concentrations >1 µM. Nevertheless, to estimate the effect of the modification, we assessed current inhibition at 300 µM X7-uP and compared it with the reported IC<sub>50</sub> of AZ10606120 (10 nM). Under these conditions, both compounds produced a similar level of inhibition, indicating that while the chemical modification reduces potency relative to AZ10606120, X7-uP still functions as an effective probe for P2X7. We have now included these data in Figure 2 and revised the text accordingly.

      (2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.

      We thank the reviewer for raising this important point. At present, we have not determined whether modification of lysine residues with biotin, or subsequent labeling with Alexa647, affects the ATP sensitivity or functional properties of P2X7. However, we believe this does not impact the conclusions of the current study, as all functional assays were conducted prior to X7-uP labeling. The labeling is used here as a terminal "snapshot" to visualize the endogenous receptor without interfering with the functional characterization.

      We fully agree that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling—such as by determining the EC<sub>50</sub> for ATP—would be essential for studies involving dynamic or post-labeling functional analyses, such as live trafficking. However, as noted earlier in our response to Reviewer 1 (point 4), these experiments lie beyond the scope of the current study.

      (3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes. 

      As suggested, we changed the mScarlet color to orange for all relevant figures.

      (4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.

      We thank the reviewer for raising this important point. Indeed, it is well established that P2X6 does not form functional channels, which supports the conclusion that it does not form homotrimeric complexes. Although previous studies have shown that P2X6–GFP expression is generally lower, more diffuse, and not efficiently targeted to the cell surface compared with other P2X subtypes (see PMID: 12077178), the similar fluorescence distribution and density observed in our Figure 3 do not imply that P2X6 forms homotrimers.

      We did not directly assess the presence of endogenous P2X6 in our HEK293T cells; however, according to the Human Protein Atlas, there is no detectable P2X6 RNA expression in HEK293 cells (nTPM = 0), indicating that endogenous P2X6 is not expressed in this cell line. To further investigate surface expression (see also point 2 of reviewer 1), we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X6 reaches the plasma membrane. As expected, P2X6 was not detected at the cell surface in HEK293T cells, whereas GFP-tagged P2X1 to P2X5 were readily detected. These results further support the conclusion that P2X6 does not insert into the plasma membrane as a homotrimer, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.

      (5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.

      We thank the reviewer for this insightful suggestion. To address this, we have modified Figure 1A and updated the legend.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes the development of a covalent labeling probe (X7-uP) that selectively targets and tags native P2X7 receptors at the plasma membrane of BV2 microglial cells. Using super-resolution imaging (dSTORM), the authors demonstrate that P2X7 receptors form nanoscale clusters upon microglial activation by lipopolysaccharide (LPS) and ATP, correlating with synergistic IL-1β release. These findings advance understanding of P2X7 reorganization during inflammation and provide a generalizable labeling strategy for monitoring endogenous P2X7 in immune cells. 

      Strengths: 

      (1) The authors designed X7-uP by coupling a high-affinity, P2X7-specific antagonist (AZ10606120) with N-cyanomethyl NASA chemistry to achieve site-directed biotinylation. This approach offers high specificity, minimal off-target reactivity, and a straightforward pull-down/imaging readout. 

      (2) The results connect P2X7's nanoscale clustering directly with IL-1β secretion in microglia, reinforcing the role of P2X7 in inflammation. By localizing endogenous P2X7 at single-molecule resolution, the authors reveal how LPS priming and ATP stimulation synergistically reorganize the receptor. 

      (3) The authors systematically validate their method in recombinant systems (HEK293 cells) and in BV2 cells, showing selective inhibition, mutational confirmation of the binding site, and Western blot pulldown experiments.

      We thank the reviewer for their positive comment.

      Weaknesses: 

      (1) While the data strongly indicate that P2X7 clustering contributes to IL-1β release, the manuscript would benefit from additional experiments (if feasible) or discussion on how receptor clustering interfaces with downstream inflammasome assembly. Clarification of whether the P2X7 clusters physically colocalize with known inflammasome proteins would solidify the mechanism. 

      We thank the reviewer for this valuable suggestion. Determining the physical colocalization of P2X7 clusters with known inflammasome components would provide important insight into the molecular partners involved in inflammasome activation. However, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.

      Nevertheless, in response to the reviewer’s suggestion, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation. We also revised the text to tone down the hypothesis of physical colocalization.

      (2) The authors might expand on the scope of X7-uP in other native cells that endogenously express P2X7 (e.g., macrophages, dendritic cells). Although they mention the possibility, demonstrating the probe's applicability in at least one other primary immune cell type would strengthen its general utility. 

      We thank the reviewer for this valuable suggestion. Again, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.

      (3) The authors do include appropriate negative controls, yet providing additional details (e.g., average single-molecule on-time or blinking characteristics) in supplementary materials could help readers assess cluster calculations. 

      As suggested, we have included additional data showing single-molecule blinking events in untreated and LPS+ATP-treated BV2 cells, along with the corresponding movies. The data are now presented in Figure 5—supplement figure 3A and B and Figure 5—Videos 1 and 2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      (1) On line 96, the authors refer to the "ballast" domain of P2X7 receptor but do not cite the original article from which this nomenclature originated (McCarthy et al., 2019, Cell). This article should be cited to give appropriate credit. 

      Done.

      (2) On line 602, the authors state that they use models from PDB 1MK5 and 6U9W to generate the cartoons in Figure 6. The manuscripts from which these PDB files were generated need to be appropriately cited. 

      Done.

      (3) On line 319, the authors say "300 mM BzATP" but I think they mean 300 uM.

      Done. Thank you for catching the typo.

      Reviewer #3 (Recommendations for the authors): 

      Overall, excellent data quality. The paper would benefit from a discussion of the physiological implications of clustering. It would also be helpful to elaborate about the potential mechanisms for clustering: diffusion and/or insertion. Finally, the authors should comment on work by Mackinnon's (PMID: 39739811) and Santana lab (PMID: 31371391) on two distinct models for clustering of proteins. 

      As suggested by the reviewer, we have revised the discussion to incorporate their comments. First, we have added the following text:

      “Upon BV2 activation, we observed significant nanoscale reorganization of P2X7. Both LPS and ATP (or BzATP) trigger P2X7 upregulation and clustering, increasing the overall number of surface receptors and the number of receptors per cluster, from one to three (Figure 6). By labeling BV2 cells with X7-uP shortly after IL-1β release, we were able to correlate the nanoscale distribution of P2X7 with the functional state of BV2 cells, consistent with the two-signal, synergistic model for IL-1β secretion observed in microglia and other cell types (Ferrari et al, 1996; Perregaux et al, 2000; Ferrari et al, 2006; Di Virgilio et al, 2017; He et al, 2017; Swanson et al, 2019). In this model, LPS priming leads to intracellular accumulation of pro-IL-1β, while ATP stimulation activates P2X7, triggering NLRP3 inflammasome activation and the subsequent release of mature IL-1β.

      What is the mechanism underlying P2X7 upregulation that leads to an overall increase in surface receptors—does it result from the lateral diffusion of previously masked receptors already present at the plasma membrane, or from the insertion of newly synthesized receptors from intracellular pools in response to LPS and ATP? Although our current data do not distinguish between these possibilities, a recent study suggests that the α1 subunit of the Na<sup>+</sup>/K<sup>+</sup>-ATPase (NKAα1) forms a complex with P2X7 in microglia, including BV2 cells, and that LPS+ATP induces NKAα1 internalization (Huang et al, 2024). This internalization appears to release P2X7 from NKAα1, allowing P2X7 to exist in its free form. We speculate that the internalization of NKAα1 induced by both LPS and ATP exposes previously masked P2X7 sites, including the allosteric AZ10606120 sites, thus making them accessible for X7-uP labeling.”

      Second, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation:

      “What mechanisms underlie P2X7 clustering in response to inflammatory signals? Several models have been proposed to explain membrane protein clustering, including recruitment to structural scaffolds (Feng & Zhang, 2009), partitioning into membrane domains enriched in specific chemical components such as lipid rafts (Simons & Ikonen, 1997), and self-assembly mechanisms (Sieber et al, 2007). These self-assembly mechanisms include an irreversible stochastic model (Sato et al, 2019) and a more recent reversible self-oligomerization model which gives rise to higher-order transient structures (HOTS) (Zhang et al, 2025). Supported by cryogenic optical localization microscopy with very high resolution (~5 nm), the HOTS model has been observed in various membrane proteins, including ion channels and receptors (Zhang et al, 2025). Furthermore, HOTS are suggested to be dynamically modulated and to play a functional role in cell signaling, potentially influencing both physiological and pathological processes (Zhang & MacKinnon, 2025). While this hypothesis is compelling, our current dSTORM data lack sufficient spatial resolution to confirm whether P2X7 trimers form HOTS via self-oligomerization. Further biophysical and ultra-high-resolution imaging studies are required to test this model in the context of P2X7 clustering.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Pournejati et al investigates how BK (big potassium) channels and CaV1.3 (a subtype of voltage-gated calcium channels) become functionally coupled by exploring whether their ensembles form early, during synthesis and intracellular trafficking, rather than only after insertion into the plasma membrane. To this end, the authors use the PLA technique to assess the formation of ion channel associations in the different compartments (ER, Golgi or PM), single-molecule RNA in situ hybridization (RNAscope), and super-resolution microscopy.

      Strengths:

      The manuscript is well written and addresses an interesting question, combining a range of imaging techniques. The findings are generally well-presented and offer important insights into the spatial organization of ion channel complexes, both in heterologous and endogenous systems.

      Weaknesses:

      The authors have improved their manuscript after revisions, and some previous concerns have been addressed.

      Still, the main concern about this work is that the current experiments do not quantitatively or mechanistically link the ensembles observed intracellularly (in the endoplasmic reticulum (ER) or Golgi) to those found at the plasma membrane (PM). As a result, it is difficult to fully integrate the findings into a coherent model of trafficking. Specifically, the manuscript does not address what proportion of ensembles detected at the PM originated in the ER. Without data on the turnover or half-life of these ensembles at the PM, it remains unclear how many persist through trafficking versus forming de novo at the membrane. The authors report the percentage of PLA-positive ensembles localized to various compartments, but this only reflects the distribution of pre-formed ensembles. What remains unknown is the proportion of total BK and Ca<sub>V</sub>1.3 channels (not just those in ensembles) that are engaged in these complexes within each compartment. Without this, it is difficult to determine whether ensembles form in the ER and are then trafficked to the PM, or if independent ensemble formation also occurs at the membrane. To support the model of intracellular assembly followed by coordinated trafficking, it would be important to quantify the fraction of the total channel population that exists as ensembles in each compartment. A comparable ensemble-to-total ratio across ER and PM would strengthen the argument for directed trafficking of pre-assembled channel complexes.

      We appreciate the reviewer’s thoughtful comment and agree that quantitatively linking intracellular hetero-clusters to those at the plasma membrane is an important and unresolved question. Our current study does not determine what proportion of ensembles at the plasma membrane originated during trafficking. It also does not quantify the fraction of total BK and Ca<sub>V</sub>1.3 channels engaged in these complexes within each compartment. Addressing this requires simultaneous measurement of multiple parameters—total BK channels, total Ca<sub>V</sub>1.3 channels, hetero-cluster formation (via PLA), and compartment identity—in the same cell. This is technically challenging. The antibodies used for channel detection are also required for the proximity ligation assay, which makes these measurements incompatible within a single experiment.

      To overcome these limitations, we are developing new genetically encoded tools to enable real-time tracking of BK and Ca<sub>V</sub>1.3 dynamics in live cells. These approaches will enable us to monitor channel trafficking and the formation of hetero-clusters, as detected by colocalization. Experiments of this kind will provide insight into the origin and turnover of hetero-clusters. While these experiments are beyond the scope of the current study, the findings in our current manuscript provide the first direct evidence that BK and CaV channels can form hetero-clusters intracellularly prior to reaching the plasma membrane. This mechanistic insight reveals a previously unrecognized step in channel organization and lays the foundation for future work aimed at quantifying ensemble-to-total ratios and determining whether coordinated trafficking of pre-assembled complexes occurs.

      This limitation is acknowledged in the discussion section, page 23. It reads: “Our findings highlight the intracellular assembly of BK-Ca<sub>V</sub>1.3 hetero-clusters, though limitations in resolution and organelle-specific analysis prevent precise quantification of the proportion of intracellular complexes that ultimately persist on the cell surface.”

      Reviewer #2 (Public review):

      Summary:

      The co-localization of large conductance calcium- and voltage activated potassium (BK) channels with voltage-gated calcium channels (CaV) at the plasma membrane is important for the functional role of these channels in controlling cell excitability and physiology in a variety of systems.

      An important question in the field is where and how do BK and CaV channels assemble as 'ensembles' to allow this coordinated regulation - is this through preassembly early in the biosynthetic pathway, during trafficking to the cell surface or once channels are integrated into the plasma membrane. These questions also have broader implications for assembly of other ion channel complexes

      Using an imaging-based approach, this paper addresses the spatial distribution of BK-CaV ensembles using both overexpression strategies in tsa201 and INS-1 cells and analysis of endogenous channels in INS-1 cells using proximity ligation and super-resolution approaches. In addition, the authors analyse the spatial distribution of mRNAs encoding BK and Cav1.3.

      The key conclusion of the paper that BK and Ca<sub>V</sub>1.3 are co-localised as ensembles intracellularly in the ER and Golgi is well supported by the evidence. However, whether they are preferentially co-translated at the ER requires further work. Moreover, whether intracellular pre-assembly of BK-Ca<sub>V</sub>1.3 complexes is the major mechanism for functional complexes at the plasma membrane in these models requires more definitive evidence including both refinement of analysis of current data as well as potentially additional experiments.

      The reviewer raises the question of whether BK and Ca<sub>V</sub>1.3 channels are preferentially co-translated. In fact, we would like to propose that co-translation has not yet been clearly defined for this type of interaction between ion channels. In our current work, we 1) observed the colocalization between BK and Ca<sub>V</sub>1.3 mRNAs and 2) determined that 70% of BK mRNA in active translation also colocalizes with Ca<sub>V</sub>1.3 mRNA. We think these results favor the idea of translational complexes that can underlie the process of co-translation. However, and in total agreement with the Reviewer, the conclusion that the mRNAs for the two ion channels are co-translated would require further experimentation. For instance, mRNA coregulation is one aspect that could help to define co-translation.

      To avoid overinterpretation, we have revised the manuscript to remove references to “co-translation” in the Results section and included the word “potential” when referring to co-translation in the Discussion section. We also clarified the limitations of our evidence in the Discussion that can be found on page 25: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”

      Strengths & Weaknesses

      (1) Using proximity ligation assays of overexpressed BK and CaV1.3 in tsa201 and INS-1 cells the authors provide strong evidence that BK and CaV can exist as ensembles (ie channels within 40 nm) at both the plasma membrane and intracellular membranes, including ER and Golgi. They also provide evidence for endogenous ensemble assembly at the Golgi in INS-1 cells and it would have been useful to determine if endogenous complexes are also observed in the ER of INS-1 cells. There are some useful controls but the specificity of ensemble formation would be better determined using other transmembrane proteins rather than peripheral proteins (eg Golgi 58K).

      We thank the reviewer for their thoughtful feedback and for recognizing the strength of our proximity ligation assay data supporting BK–Ca<sub>V</sub>1.3 hetero-cluster formation at both the plasma membrane and intracellular compartments. As for specificity controls, we appreciate the suggestion to use transmembrane markers. To strengthen our conclusion, we have performed an additional experiment comparing the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 and BK channels with the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 channels and ryanodine receptors in INS-1 cells. As shown in the figure below, the number of interactions between Ca<sub>V</sub>1.3 and BK channels is significantly higher than that between Ca<sub>V</sub>1.3 and RyR<sub>2</sub>. Of note, RyR<sub>2</sub> is an ER-resident protein. These results provide additional evidence of the existence of endogenous complex formation in INS-1 cells. We have added this figure as a supplement.

      (2) Ensemble assembly was also analysed using super-resolution (dSTORM) imaging in INS-1 cells. In these cells only 7.5% of BK and CaV particles (endogenous?) co-localise that was only marginally above chance based on scrambled images. More detailed quantification and validation of potential 'ensembles' needs to be made for example by exploring nearest neighbour characteristics (but see point 4 below) to define proportion of ensembles versus clusters of BK or Cav1.3 channels alone etc. For example, it is mentioned that a distribution of distances between BK and Cav is seen but data are not shown.

      We thank the reviewer for this comment. To address the request for more detailed quantification and validation of ensembles, we performed additional analyses:

      Proportion of ensembles vs isolated clusters: We quantified clusters within 200 nm and found that 37 ± 3% of BK clusters are near one or more CaV1.3 clusters, whereas 15 ± 2% of CaV1.3 clusters are near BK clusters (Figure 8–Supplementary 1A).

      Distance distribution: As shown in Figure 8–Supplementary 1B, the nearest-neighbor distance distribution for BK-to-CaV1.3 in INS-1 cells (magenta) is shifted toward shorter distances compared to randomized controls (gray), supporting preferential localization of BK–CaV1.3 hetero-clusters.

      Together, these analyses confirm that BK–CaV1.3 ensembles occur more frequently than expected by chance and exhibit an asymmetric organization favoring BK proximity to CaV1.3 in INS-1 cells. We have included these data and figures in the revised manuscript, as well as description in the Results section. 
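
      The two measurements described above (directional proximity fractions and nearest-neighbor distances against a randomized control) can be illustrated with a minimal sketch. It assumes that each cluster is reduced to a 2D centroid in nanometers; the coordinates, the function names, and the square 2000 nm field are hypothetical and are not taken from the authors' analysis pipeline.

```python
import math
import random

def nearest_distance(point, others):
    """Distance (nm) from `point` to the closest point in `others`."""
    return min(math.dist(point, o) for o in others)

def fraction_within(sources, targets, radius_nm):
    """Fraction of `sources` with at least one `targets` point within radius_nm."""
    hits = sum(1 for s in sources if nearest_distance(s, targets) <= radius_nm)
    return hits / len(sources)

def randomized(points, field_nm, rng):
    """Control: the same number of points placed uniformly in a square field."""
    return [(rng.uniform(0, field_nm), rng.uniform(0, field_nm)) for _ in points]

# Hypothetical cluster centroids in nm (illustrative values, not data from the study).
bk = [(100, 100), (900, 900), (1500, 300)]
cav = [(150, 120), (400, 1800), (1000, 1700), (1900, 1900), (1800, 100), (300, 1000)]

# The fraction depends on the direction of the comparison when the two
# populations differ in size, so both directions are reported.
bk_near_cav = fraction_within(bk, cav, 200)   # BK -> CaV1.3
cav_near_bk = fraction_within(cav, bk, 200)   # CaV1.3 -> BK

# Nearest-neighbor distances for observed positions and for a randomized control.
rng = random.Random(0)
control_cav = randomized(cav, 2000, rng)
observed_nn = [nearest_distance(p, cav) for p in bk]
control_nn = [nearest_distance(p, control_cav) for p in bk]
```

      In this toy layout one BK cluster and one CaV1.3 cluster sit close together, so the same pair counts as one of three BK clusters but only one of six CaV1.3 clusters, which is how an asymmetry between the two directions can arise.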

      (3) The evidence that the intracellular ensemble formation is in large part driven by cotranslation, based on co-localisation of mRNAs using RNAscope, requires additional critical controls and analysis. The authors now include data of co-localised BK protein that is suggestive but does not show co-translation. Secondly, while they have improved the description of some controls mRNA co-localisation needs to be measured in both directions (eg BK - SCN9A as well as SCN9A to BK) especially if the mRNAs are expressed at very different levels. The relative expression levels need to be clearly defined in the paper. Authors also use a randomized image of BK mRNA to show specificity of co-localisation with Cav1.3 mRNA, however the mRNA distribution would not be expected to be random across the cell but constrained by ER morphology if cotranslated so using ER labelling as a mask would be useful?

      We thank the reviewer for these constructive suggestions. We measured mRNA colocalization in both directions as recommended. As shown in the figure below, colocalization between KCNMA1 and SCN9A transcripts was comparable in both directions, with no statistically significant difference, supporting the specificity of the observed associations. We decided not to add this to the original figure to keep the figure simple. 

      We agree that co-localization of BK protein with BK mRNA is not conclusive evidence of co-translation, and we do not intend to mislead readers in our conclusion. Consequently, we were careful to avoid the term co-translation in the Results section and added the word “potential” when referring to co-translation in the Discussion section. We added a sentence in the Discussion to caution our interpretation: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”

      Author response image 1.

      (4) The authors attempt to define if plasma membrane assemblies of BK and CaV occur soon after synthesis. However, because the expression of BK and CaV occur at different times after transient transfection of plasmids more definitive experiments are required. For example, using inducible constructs to allow precise and synchronised timing of transcription. This would also provide critical evidence that co-assembly occurs very early in synthesis pathways - ie detecting complexes at ER before any complexes 

      We appreciate the reviewer’s insightful suggestion regarding the use of inducible constructs to synchronize transcription timing. This is an excellent approach and would allow direct testing of whether co-assembly occurs early in the synthesis pathway, including detection of complexes at the ER prior to plasma membrane localization. These experiments are beyond the scope of the present work but represent an important direction for future studies.

      We have added the following sentence to the Discussion section (page 24) to highlight this idea: “Future experiments using inducible constructs to precisely control transcription timing will enable more precise quantification of hetero-cluster formation in the ER compartment prior to plasma membrane insertion and reduce the variability introduced by differences in expression timing after plasmid transfection.”

      (5) While the authors have improved the definition of hetero-clusters etc it is still not clear in super-resolution analysis, how they separate a BK tetramer from a cluster of BK tetramers with the monoclonal antibody employed ie each BK channel will have 4 binding sites (4 subunits in tetramer) whereas Cav1.3 has one binding site per channel. Thus, how do authors discriminate between a single BK tetramer (molecular cluster) with potentially 4 antibodies bound compared to a cluster of 4 independent BK channels.

      We appreciate the reviewer’s thoughtful comment regarding the interpretation of super-resolution data. We agree that distinguishing a single BK tetramer from a cluster of multiple BK channels is challenging when using an antibody that can bind up to four sites per channel. To clarify, our analysis does not attempt to resolve individual subunits within a tetramer; rather, it focuses on the nanoscale spatial proximity of BK and Ca<sub>V</sub>1.3 signals.

      We want to note that this limitation applies only to the super-resolution maps in Figures 8C and 9D and does not affect Airyscan-based analyses or measurements of BK–Ca<sub>V</sub>1.3 proximity.

      To address how we might distinguish between a single BK tetramer and a cluster of multiple BK channels, we considered two contrasting scenarios. In the first case, we assume that all four α-subunits within a tetramer are labeled. Based on cryoEM structures, a BK tetramer measures approximately 13 nm × 13 nm (≈169 nm²). Adding two antibody layers (primary and secondary) would increase the footprint by ~14 nm in each direction, resulting in an estimated area of ~41 nm × 41 nm (≈1681 nm²). Under this assumption, particles smaller than ~1681 nm² would likely represent individual tetramers, whereas larger particles would correspond to clusters of multiple tetramers. 
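      The arithmetic behind the first scenario can be checked with a minimal sketch. It uses only the figures quoted above (a ~13 nm channel side and ~14 nm of antibody added on each side); the function names and the two example particle areas are hypothetical and are not taken from the authors' analysis pipeline.

```python
# Back-of-the-envelope footprint of a fully labelled BK tetramer (scenario 1).
CHANNEL_SIDE_NM = 13.0     # approximate tetramer side quoted in the response
ANTIBODY_LAYERS_NM = 14.0  # primary + secondary antibody, added on each side


def labelled_footprint(channel_side_nm, antibody_nm):
    """Return (side, area) of a square footprint with antibody on both sides."""
    side = channel_side_nm + 2 * antibody_nm
    return side, side ** 2


def classify_particle(area_nm2, threshold_nm2):
    """Scenario-1 reading only: below the threshold counts as one tetramer."""
    if area_nm2 < threshold_nm2:
        return "single tetramer"
    return "multi-tetramer cluster"


side, area = labelled_footprint(CHANNEL_SIDE_NM, ANTIBODY_LAYERS_NM)
print(side, area)  # 41.0 1681.0

# Hypothetical particle areas, for illustration only.
for particle_area in (900.0, 2500.0):
    print(particle_area, classify_particle(particle_area, area))
```

      The classification step holds only under the assumption that all four α-subunits are labelled; under the second scenario discussed next, particle area alone would not separate the two cases.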

      In the second scenario, we propose that steric constraints at the S9–S10 segment, where the antibody binds, limit labeling to a single antibody per tetramer. If true, the localization precision would approximate 14 nm × 14 nm—the combined size of the antibody complex and the channel—close to the resolution limit of the microscope. To test this, we performed a control experiment using two antibodies targeting the BK C-terminal domain, raised in different species and labeled with distinct fluorophores. Super-resolution imaging revealed that only ~12% of particles were colocalized, suggesting that most channels bind a single antibody.

      If multiple antibodies could bind each tetramer, we would expect much greater colocalization.
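      One way such a colocalization fraction could be computed is sketched below. The response does not describe how colocalization was actually scored, so the nearest-neighbour criterion, the 20 nm cutoff, and the coordinates are all assumptions chosen for illustration.

```python
import math


def colocalized_fraction(points_a, points_b, max_distance_nm):
    """Fraction of channel-A particles with a channel-B particle within range.

    points_a, points_b: lists of (x, y) centroids in nm.
    max_distance_nm: hypothetical distance cutoff for calling a pair colocalized.
    """
    if not points_a:
        return 0.0
    hits = 0
    for ax, ay in points_a:
        if any(math.hypot(ax - bx, ay - by) <= max_distance_nm
               for bx, by in points_b):
            hits += 1
    return hits / len(points_a)


# Toy centroids (nm); not experimental data.
antibody_1 = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
antibody_2 = [(5.0, 0.0), (500.0, 500.0)]
print(colocalized_fraction(antibody_1, antibody_2, 20.0))  # 0.25
```

      A low fraction from a measure of this kind is what the single-antibody interpretation predicts, although the value depends on the cutoff and on labeling efficiency.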

      Although these data are not included in the manuscript, we have added the following clarification to the Results section (page 19): “It is important to note that this technique does not allow us to distinguish between labeling of four BK α-subunits within a tetramer and labeling of multiple BK channel clusters. Hence, particles smaller than ~1680 nm² may represent either a single tetramer or a cluster. This limitation applies to Figures 8C and 9D and does not affect measurements of BK–Ca<sub>V</sub>1.3 proximity.”

      Author response image 2.

      (6) The post-hoc tests used for one way ANOVA and ANOVA statistics need to be defined throughout

      We thank the reviewer for highlighting the need for clarity regarding our statistical analyses. We have now specified the post-hoc tests used for all one-way ANOVA and ANOVA comparisons throughout the manuscript, and updated figure legends.

      Reviewer #3 (Public review):

      Summary:

      The authors present a clearly written and beautifully presented piece of work demonstrating clear evidence to support the idea that BK channels and Cav1.3 channels can co-assemble prior to their insertion in the plasma membrane.

      Strengths:

      The experimental records shown back up their hypotheses and the authors are to be congratulated for the large number of control experiments shown in the ms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have sufficiently addressed the specific points previously raised and the manuscript has improved clarity in those aspects. My main concern, which still remains, is stated in the public review.

      Reviewer #3 (Recommendations for the authors):

      I am content that the authors have attempted to fully address my previous criticisms.

      I have only three suggestions:

      (1) I think the word Homo-clusters at the bottom right of Figure 1 is erroneously included.

      We thank the reviewer for bringing this to our attention. The figure has been corrected accordingly.

      (2) The authors should, for completeness, refer to the beta, gamma and LINGO subunit families in the Introduction and include appropriate references:

      Knaus, H. G., Folander, K., Garcia-Calvo, M., Garcia, M. L., Kaczorowski, G. J., Smith, M., & Swanson, R. (1994). Primary sequence and immunological characterization of beta-subunit of high conductance Ca2+-activated K+ channel from smooth muscle. The Journal of Biological Chemistry, 269(25), 17274-17278.

      Brenner, R., Jegla, T. J., Wickenden, A., Liu, Y., & Aldrich, R. W. (2000a). Cloning and functional characterization of novel large conductance calcium-activated potassium channel beta subunits, hKCNMB3 and hKCNMB4. The Journal of Biological Chemistry, 275(9), 6453-6461.

      Yan, J & R.W. Aldrich. (2010) LRRC26 auxiliary protein allows BK channel activation at resting voltage without calcium. Nature. 466(7305):513-516

      Yan, J & R.W. Aldrich. (2012) BK potassium channel modulation by leucine-rich repeat-containing proteins. Proceedings of the National Academy of Sciences 109(20):7917-22

      Dudem, S, Large RJ, Kulkarni S, McClafferty H, Tikhonova IG, Sergeant, GP, Thornbury, KD, Shipston, MJ, Perrino BA & Hollywood MA (2020). LINGO1 is a novel regulatory subunit of large conductance, Ca2+-activated potassium channels. Proceedings of the National Academy of Sciences 117 (4) 2194-2200

      Dudem, S., Boon, P. X., Mullins, N., McClafferty, H., Shipston, M. J., Wilkinson, R. D. A., Lobb, I., Sergeant, G. P., Thornbury, K. D., Tikhonova, I. G., & Hollywood, M. A. (2023). Oxidation modulates LINGO2-induced inactivation of large conductance, Ca2+-activated potassium channels. The Journal of Biological Chemistry, 299 (3) 102975.

      We agree with the reviewer’s suggestion and have revised the Introduction to include references to the beta, gamma, and LINGO subunit families. Appropriate citations have been added to ensure completeness and contextual relevance.

      Additionally, BK channels are modulated by auxiliary subunits, which fine-tune BK channel gating properties to adapt to different physiological conditions. The β, γ, and LINGO1 subunits each contribute distinct structural and regulatory features: β-subunits modulate Ca²⁺ sensitivity and can induce inactivation; γ-subunits shift voltage-dependent activation to more negative potentials; and LINGO1 reduces surface expression and promotes rapid inactivation (18-24). These interactions ensure precise control over channel activity, allowing BK channels to integrate voltage and calcium signals dynamically in various cell types.

      (3) I think it may be more appropriate to include the sentence "The probes against the mRNAs of interest and tested in this work were designed by Advanced Cell Diagnostics." (P16, right hand column, L12-14) in the appropriate section of the Methods, rather than in Results.

      We thank the reviewer for this helpful suggestion. In response, we have relocated the sentence to the appropriate section of the Methods, where it now appears with relevant context.

    1. Market justice preferences

      Suggested structure: 1. Welfare states, commodification & policy feedbacks 2. Distributive justice & market justice 3. Key studies on market justice & main findings 4. Pensions 5. Operationalization

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      A previous study by Komada et al. demonstrated that MAP7 is expressed in both Sertoli and germ cells, and that Map7 gene-trap mutant mice display disrupted microtubule bundle formation in Sertoli cells, accompanied by defects in spermatid manchettes and germ cell loss. In the current study, Kikuchi et al. investigated the role of MAP7 in the formation of the Sertoli cell apical domain during the first wave of spermatogenesis. They generated a GFP-tagged MAP7 mouse line and demonstrated that the endogenous MAP7 protein localizes to the apical microtubules in Sertoli cells and to the manchette microtubules in step 9-11 spermatids. They also generated a new Map7 knockout (KO) mouse line in a genetic background distinct from the one used in the previous study. Focusing on stages before the emergence of step 9-11 spermatids, the authors aimed to isolate defects caused by the function of MAP7 in Sertoli cells. They report that loss of MAP7 impairs Sertoli cell polarity and apical domain formation, accompanied by the microtubule remodeling defect. Using the GFP-tagged MAP7 line, they performed immunoprecipitation-mass spectrometry and identified several MAP7-interacting proteins in the testis, including MYH9. They further observed that MAP7 deletion alters the distribution of MYH9. Single-cell RNA sequencing revealed that the loss of MAP7 in Sertoli cells resulted in slight transcriptomic shifts but had no significant impact on their functional differentiation. Single-cell RNA sequencing analysis also showed delayed meiotic progression in the MAP7-deficient testis. Overall, while the study provides some interesting discoveries of early Sertoli cell defects in MAP7-deficient testes, some conclusions are premature and not fully supported by the presented data. The mechanistic investigations remain limited in depth.

      Response: We thank the reviewer for this insightful summary. We agree that some of our initial interpretations were speculative and have revised the relevant sections to more accurately reflect the limitations of the current data. We also acknowledge that further mechanistic studies will be important to strengthen our conclusions, and we have outlined these plans in the individual responses below.

      Major comments:

      Although the infertility phenotype of the Map7 gene-trap mutant mice has been reported previously, it remains essential to assess fertility in this newly generated MAP7 knockout line. While the authors present testis size and histological differences between WT and KO mice (Extended Fig. 2e and 2f), there is no corresponding description or interpretation in the main text regarding fertility outcomes.

      Response: We thank the reviewer for raising this point. Although we had presented the differences in testis size and histology between wild-type and Map7-/- mice, we agree that a description of the corresponding fertility outcomes was missing from the main text. We have now revised the relevant part of the Results section as follows: “Consistent with observations in Map7 gene-trap mice, Map7-/- males exhibited reduced testis size and spermatogenic defects (Supplemental Fig. 2E, F). Notably, the cauda epididymis of Map7-/- males contained no mature spermatozoa (Supplemental Fig. 2F), indicating male infertility.” (page 5, line 33–page 6, line 2)

      • In Figure 2C, the authors identified Sertoli cells, spermatogonia cells, and spermatocytes using SEM, based on their cell morphology and adhesion to the basement membrane. Given that the loss of MAP7 disrupts the polarity and architecture of Sertoli cells, the position of germ cells will be affected, making this identification criterion less reliable.

      Response: We appreciate the reviewer’s comment. While the reviewer notes that cell identification was based on cell morphology and adhesion to the basement membrane, we clarify that nuclear morphology was also considered, as described in the original manuscript. Specifically, germ cells have spherical nuclei, whereas Sertoli cell nuclei are irregularly shaped (representative segmentation results can be provided as an additional Supplemental Figure upon request). Round spermatids at P21 can be distinguished from spermatocytes by their smaller nuclear size. In addition, spermatogonia remain attached to the basement membrane even in Map7-/- testes, as confirmed by GFRα1-positive spermatogonial stem cells (Figure 6A). Together, these features ensure reliable identification of each cell type, independent of the altered polarity observed in Map7-deficient Sertoli cells.

      • In Figure 2e, the number of Sox9-positive Sertoli cells in MAP7 knockout mice appears higher than that in the control at P17. Quantification of total Sox9-positive cells should be done to determine whether MAP7 deletion increases Sertoli cell numbers.

      Response: As suggested by the reviewer, we will quantify the density of SOX9-positive Sertoli cells per unit area of seminiferous tubule at P10 and P17 in Map7+/- and Map7-/- testes, and include the results in the revised manuscript.

      • To determine whether MAP7's role in regulating Sertoli cell polarity relies on germ cells, the authors treated mice with busulfan at P28 to delete germ cells, a stage after Sertoli cell polarity defect has developed in MAP7 knockout mice. This data is insufficient to support the conclusion that MAP7 regulates Sertoli cell polarity independently of the presence of germ cells. Germ cell deletion should be done before the Sertoli cell defect develops to address this question.

      Response: We appreciate the reviewer’s thoughtful comment regarding the interpretation of the busulfan experiments. While depletion of germ cells at P28 enabled us to assess Sertoli cell polarity in the absence of postnatal spermatogonia, these experiments do not definitively determine whether MAP7 regulates Sertoli cell polarity independently of germ cells. Neonatal germ-cell depletion would more directly test germ cell–independent effects; however, systemic busulfan administration at early developmental stages is highly toxic, often causing bone marrow failure and multi-organ damage, which precludes survival and confounds analysis of testis-specific effects. Although germ cell ablation could, in principle, be achieved using transgenic approaches that exploit the natural resistance of mice to diphtheria toxin (DTX) (reviewed in Smith et al., Andrology, 2015), these strategies require multiple transgenes and show minor variability in efficiency, making them impractical for our current experiments. Generating the necessary genetic combinations would require considerable time. We therefore plan to pursue alternative genetic approaches in future work.

      In the revised manuscript, we have modified the relevant section to more accurately reflect the limitations of the current experiments, as follows: “Busulfan was administered at P28, and testes were analyzed 6 weeks later, after complete elimination of germ cell lineages. Following treatment, Map7+/- mice showed testis-to-body weight ratios comparable to untreated Map7-/- mice (Supplemental Fig. 3D), and hematoxylin-eosin (HE) staining confirmed germ cell depletion (Fig. 2F; Supplemental Fig. 3E). In Map7+/- testes, most Sertoli nuclei remained basally positioned, indicating that once apical–basal polarity is established, it is stably maintained even in the absence of germ cells. In contrast, Map7-/- Sertoli nuclei were frequently misoriented toward the lumen under the same conditions (Fig. 2F; Supplemental Fig. 3E), suggesting that polarity defects in Map7-deficient Sertoli cells occur independently of germ cell presence.” (page 7, lines 20–28)

      In addition, we have added the following sentences to the Discussion section to highlight the implication of these findings: “In addition, even after germ cell depletion by busulfan treatment, Map7-deficient Sertoli cells failed to reestablish basal nuclear positioning, indicating that loss of MAP7 causes an intrinsic polarity defect. These findings suggest that MAP7 acts as a cell-autonomous regulator of Sertoli cell polarity, rather than mediating effects indirectly through germ cell–Sertoli cell interactions.” (page 15, lines 17–21)

      • The resolution of the SEM images in Figure 3c is insufficient to evaluate tight and adherens junctions clearly. As such, these images do not convincingly support the claim that adherens junctions are absent in the KO testes.

      Response: We thank the reviewer for this insightful comment. Tight junctions can be reliably identified in SEM images as dense intercellular structures accompanied by endoplasmic reticulum aligned along the cell boundaries. The region immediately apical to the tight junctions likely corresponds to adherens junctions, which are also associated with the endoplasmic reticulum. Unlike tight junctions, these regions exhibit wider intercellular spaces, consistent with the looser membrane apposition characteristic of adherens junctions, although they cannot be unambiguously distinguished from gap junctions or desmosomes based on morphology alone. In the original figure, 2× binning reduced image resolution, which may have contributed to the reviewer’s concern.

      In the revised manuscript, we have re-acquired the SEM images in high-resolution mode, focusing on the relevant regions. The new high-resolution images have replaced the original panels in revised Figure 3C, providing clearer visualization of junctional structures at P10 and P21 in Map7+/- and Map7-/- testes. The original Figure 3C images have been moved to Supplemental Figure 4B for reference.

      The corresponding section in the Results has been revised as follows in the updated manuscript: “We then performed SEM to examine the effects of Map7 KO. In P21 Map7+/- testes, electron-dense regions along the basal side of Sertoli–Sertoli junctions corresponded to tight junctions closely associated with the endoplasmic reticulum, consistent with previous reports (Luaces et al. 2023) (Fig. 3C; Supplemental Fig. 4B). The region immediately apical to the tight junctions likely represents adherens junctions, which were also associated with the endoplasmic reticulum. Unlike tight junctions, these regions displayed wider intercellular spaces, reflecting the looser membrane apposition typical of adherens junctions, though they could not be definitively distinguished from gap junctions or desmosomes based on morphology alone (Fig. 3C; Supplemental Fig. 4B). At P10, both Map7+/- and Map7-/- testes lacked clearly defined tight junctions and adherens junction–like structures (Fig. 3C; Supplemental Fig. 4B). In P21 Map7-/- mice, Sertoli cells formed expanded basal tight junctions but failed to establish adherens junction–like structures (Fig. 3C; Supplemental Fig. 4B).” (page 8, line 34–page 9, line 12)

      • GFP-tagged reporter mice and HeLa cells were used for immunoprecipitation-mass spectrometry to identify proteins that interact with MAP7. Given that the authors aimed to elucidate the mechanism by which MAP7 regulates Sertoli cell cytoskeleton organization, the rationale for including HeLa cells is unclear and should be better justified or reconsidered.

      Response: We thank the reviewer for this comment. MAP7-egfpKI HeLa cells were used as a complementary system to identify MAP7-associated proteins, providing sufficient material and a controlled environment for robust detection. By comparing IP-MS results from MAP7-egfpKI HeLa cells and P17–P20 Map7-egfpKI testes, we can distinguish proteins that are specific to polarized Sertoli cells: proteins detected exclusively in P17–P20 testes may be involved in Sertoli cell polarization, whereas proteins detected in both systems likely represent general MAP7-associated factors that are not specific to Sertoli cell polarity.

      This rationale has been clarified in the revised manuscript by adding the following sentence to the Results section: “MAP7-egfpKI HeLa cells were used as a complementary system, providing sufficient material and a controlled environment for robust detection of MAP7-associated proteins. Comparison of IP-MS results between MAP7-egfpKI HeLa cells and P17–P20 Map7-egfpKI testes allows identification of MAP7-associated proteins that are specific to polarized Sertoli cells, whereas proteins detected in both systems likely represent general MAP7-associated proteins.” (page 9, lines 27–32)

      • The authors observed that MYH9, one of the MAP7-interacting proteins, does not colocalize with ectopic microtubule and F-actin structures in MAP7 KO testes and concluded that MAP7 facilitates the integration of microtubules and F-actin via interaction with NMII heavy chains. This conclusion is speculative and not adequately supported by the presented data.

      Response: We thank the reviewer for this insightful comment. We agree that our initial conclusion was speculative and have revised the relevant section to more accurately reflect the limitations of the current data. The revised text now reads as follows: “These findings indicate that MYH9 localization at the luminal interface depends on MAP7, and suggest that MAP7 helps coordinate microtubules and F-actin, potentially via its association with NMII heavy chains.” (page 10, lines 13–15)

      To further elucidate this mechanism, we will perform biochemical domain-mapping to define the MAP7 region responsible for MYH9 complex formation. We have already established a series of human MAP7 deletion mutants (as reported previously, EMBO Rep., 2018) and will conduct co-immunoprecipitation assays in HEK293 cells to identify the specific MAP7 domain required for complex formation with MYH9. Based on these results, we plan to use AlphaFold3 to predict the three-dimensional structure of the MAP7–MYH9 complex. These analyses will help clarify how MAP7 associates with the actomyosin network and provide additional mechanistic insights that complement our in vivo observations of MYH9 mislocalization in Map7-/- testes.

      • The authors used Spearman correlation coefficients to analyze six Sertoli cell clusters and generated a minimum spanning tree to infer differentiation trajectories. However, details on the method used for constructing the tree are lacking. Moreover, relying solely on Spearman correlation to define differentiation topology is oversimplified.

      Response: We appreciate the reviewer’s valuable feedback. We agree that Spearman correlation alone is insufficient to infer differentiation topology. In response, we reanalyzed the data using Monocle3, which implements branch-aware pseudotime inference to capture both cluster continuity and differentiation directionality. This reanalysis provides a more accurate reconstruction of differentiation trajectories among the six Sertoli cell clusters. Although the overall trajectories appeared different and a higher proportion of Map7-/- Sertoli cells exhibited very low pseudotime values, comparison of the control and Map7-/- trajectories revealed that the average node degree was nearly identical, indicating that the local graph structure—reflecting the connectivity among neighboring cells—was largely preserved. The numbers of branch points and the graph diameter differed slightly, likely due to differences in sample size (311 control vs. 434 Map7-/- Sertoli cells) and distribution bias rather than major topological changes. Accordingly, Figures 5C and 5D have been replaced with the updated Monocle3-based trajectory analysis, and the corresponding text in the Results section and figure legend have been revised as follows:

      “To reconstruct differentiation trajectories among the six Sertoli cell clusters, we reanalyzed the datasets using Monocle3, which incorporates branch-aware pseudotime inference. Cluster C1 was selected as the root based on shared specificity and entropy scores, consistent with its metabolically active and transcriptionally diverse profile (Fig. 5B, C; Supplemental Fig. 7). While the overall trajectories appeared altered, the proportion of Map7-/- Sertoli cells with very low pseudotime values was only modestly increased (Fig. 5D). Comparison with controls showed that the average node degree was nearly identical (Fig. 5C), indicating that the local graph structure, reflecting connectivity among neighboring cells, remained largely intact. Minor differences in branch points and graph diameter likely reflect inherent variability in the data rather than major topological changes (Supplemental Fig. 6B). Consistent with this, the relative proportions of the six clusters showed only modest shifts, suggesting that the overall architecture of Sertoli cell differentiation is largely preserved in the absence of MAP7.” (page 11, lines 7-18)

      “(C) Control and Map7-/- Sertoli cells were visualized separately using UMAPs constructed in Seurat. Using the same datasets, pseudotime trajectories were inferred with Monocle3. For root selection, shared_score (cluster overlap), specificity_score (cluster uniqueness), and entropy_score (transcriptional diversity) were computed, resulting in cluster 1 being selected as the root. The numbers of nodes, edges, branch points, average degree, and diameter of each trajectory are shown below the corresponding UMAPs. (D) Parallel comparison of pseudotime distributions between control and Map7-/- populations.” (page 30, lines 5-12)
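      The graph statistics listed in this legend (nodes, edges, branch points, average degree, diameter) can be illustrated with a minimal sketch on an undirected graph. This is not the Monocle3 implementation: the edge list is a toy example, a branch point is assumed here to be any node of degree three or more, and the diameter is taken as the longest shortest path counted in edges.

```python
from collections import deque


def graph_summary(edges):
    """Summarise an undirected graph given as an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n_nodes = len(adj)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    # Assumption: a branch point is a node with three or more neighbours.
    branch_points = sum(1 for nb in adj.values() if len(nb) >= 3)
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0

    def eccentricity(start):
        """Longest shortest-path length (in edges) from start, by BFS."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        return max(dist.values())

    diameter = max((eccentricity(n) for n in adj), default=0)
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "branch_points": branch_points,
        "average_degree": avg_degree,
        "diameter": diameter,
    }


# Toy Y-shaped trajectory with hypothetical node names.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
print(graph_summary(toy_edges))
```

      Because the average degree is 2 × edges / nodes, it can stay nearly constant between two trajectories even when branch points and diameter differ, which is the pattern the response reports.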

      Minor comments:

      • Several extended data figures are redundant with main figures and do not provide additional value (e.g., Fig. 2d vs. Extended Data Fig. 3a; Fig. 2f vs. Extended Data Fig. 3d; Fig. 2C vs. Extended Data Fig. 4b; Fig. 3d vs. Extended Data Fig. 4c). The authors should consolidate or remove duplicates.

      Response: Regarding the concerns about redundancy between main and Supplemental figures, we would like to clarify the rationale for retaining certain Supplemental figures.

      Fig. 2D vs. Supplemental Fig. 3A: Due to space limitations in the main figure, only the merged three-color image was shown. We believe that the single-color grayscale images in Supplemental Fig. 3A provide additional clarity, allowing easier visualization of SOX9-positive Sertoli cell distribution and differences in F-actin structure.

      Fig. 2F vs. Supplemental Fig. 3E: In the main figure, only the high-magnification image was shown due to space constraints. The lower-magnification image in Supplemental Fig. 3E demonstrates that the selected field was not chosen arbitrarily, providing context for the observed structures. In addition, Supplemental Fig. 3E includes both low- and high-magnification images of age-matched busulfan (-) testes as a control for the busulfan (+) condition, further supporting the validity of the comparison.

      For the above-mentioned cases (Fig. 2D vs. Supplemental Fig. 3A; Fig. 2F vs. Supplemental Fig. 3E), as well as other potentially overlapping figures (e.g., Fig. 3D vs. Supplemental Fig. 4C), we believe that the additional single-channel and lower-magnification images provide important context that cannot be fully conveyed in the main figures due to space limitations. Nevertheless, to address the reviewer’s concern, we will (i) clearly state the purpose of each Supplemental figure in the corresponding legends, and (ii) re-evaluate all figures to consolidate or remove any truly redundant panels. Our goal is to ensure that all figures collectively convey the data in the most concise and informative manner.

      • Figure citations in the main text do not consistently match figure content. For example, on page 7 (lines 5-6), the text refers to Extended Data Fig. 4a for SOX9 staining, yet it is Extended Data Fig. 3a that contains the relevant data. Similarly, the reference to Extended Data Fig. 4b and 4c on page 7 (lines 7-8) for adult defects is inaccurate.

      Response: We thank the reviewer for drawing attention to these inconsistencies. We have carefully checked all figure citations throughout the main text and corrected them so that they consistently match the figure content. The revised manuscript reflects these corrections.

      • In Figure 2e, percentages of Sertoli cells across three layers are shown. The figure legend should specify which layer(s) show statistically significant differences between WT and KO.

      Response: We are grateful to the reviewer for highlighting this point. Statistical comparisons were performed between Map7+/- and Map7-/- mice within each corresponding layer at P17. Statistical significance was assessed using Student’s t-test, and all three layers showed significant differences between Map7+/- and Map7-/- (P < 2.20 × 10⁻⁴). The figure legend has been revised accordingly as follows: “Statistical comparisons between Map7+/- and Map7-/- mice were performed for each corresponding layer at P17 using Student’s t-test. All three layers showed significant differences between Map7+/- and Map7-/- mice (*, P<2.20 × 10⁻⁴).” (page 28, lines 5-8)

      • The current color scheme for F-actin and TUBB3 in Figure 3 lacks sufficient contrast. Adjusting to more distinguishable colors would improve readability.

      Response: We thank the reviewer for this helpful suggestion. In the original merged images, four channels (DNA, TUBB3, F-actin, and β-catenin) were displayed together, which reduced contrast between cytoskeletal signals. To improve clarity, we generated new merged images showing only TUBB3 and F-actin, allowing better visual distinction between these components. In addition, β-catenin and DNA are now displayed together as a separate merged image (β-catenin in yellow and DNA in blue) in the final column, highlighting the altered localization of β-catenin in Map7-/- testes.

      • Since multiple scale bars with different units are present within the same figures, adding units directly above or beside each scale bar would improve readability.

      Response: We thank the reviewer for the suggestion. Following this recommendation, we have added units directly above each scale bar in all figures to improve readability.

      • It is recommended to directly mark Sertoli cells, spermatogonia, and spermatocytes on the SEM images in Figure 2C for clearer visualization.

      Response: We thank the reviewer for the suggestion. We will follow this recommendation by performing segmentation and directly marking Sertoli cells, spermatogonia, and spermatocytes on the SEM images in Figure 2C to improve visualization.

      • The quantification of Sertoli cell positioning shown in Fig. 2C is already described in the main text and is unnecessary in the figure.

      Response: We appreciate the reviewer’s comment regarding the quantification of Sertoli cell positioning. Although the results are described in the main text, we believe that the visual presentation in Figure 2C is essential for conveying the spatial distribution pattern in an intuitive and comparative manner. To address the concern about redundancy, we have slightly revised the figure legend (page 27, lines 28–29) to clarify that this panel provides a visual summary of the quantitative data described in the text, thereby improving clarity without unnecessary duplication.

      _Referee cross-commenting_

      I concur with Reviewer 2 that the Map7-eGFP mouse model is a valuable tool for the research community. I also agree that performing MAP7-MYH9 double immunofluorescence staining to demonstrate their colocalization would further strengthen the authors' conclusions regarding their interaction. My overall assessment of the manuscript remains unchanged: the study represents an incremental advance that extends previous findings on MAP7 function but provides limited new mechanistic insight.

      Reviewer #1 (Significance):

      This study investigates the role of the microtubule-associated protein MAP7 in Sertoli cell polarity and apical domain formation during early stages of spermatogenesis. Using GFP-tagged and MAP7 knockout mouse models, the authors show that MAP7 localizes to apical microtubules and is required for Sertoli cell cytoskeletal organization and germ cell development. While the study identifies early Sertoli cell defects and candidate MAP7-interacting proteins, the mechanistic insights remain limited, and several conclusions require stronger experimental support. Overall, the discovery represents an incremental advance that extends prior findings on MAP7 function, providing additional but modest insights into the role of MAP7 in cytoskeletal regulation in male reproduction.

      Response: We thank the reviewer for their constructive comments and thoughtful evaluation of our manuscript. We appreciate the positive feedback regarding the value of the Map7-egfpKI mouse model for the research community. We also thank the reviewer for the suggestion to perform MAP7–MYH9 double immunofluorescence staining to demonstrate colocalization, which we agree will further strengthen the mechanistic support.

      We would like to clarify that several aspects of our findings represent novel contributions within a field where the mechanisms of microtubule remodeling during apical domain formation have remained largely unresolved. In particular, our study provides evidence that MAP7 is asymmetrically enriched at the apical microtubule network in Sertoli cells and contributes to the directional organization of these microtubules—an aspect of Sertoli cell polarity that has not been previously characterized. Our results further indicate that dynamic microtubule turnover, rather than stabilization alone, is required for proper apical domain formation, addressing a gap in current understanding of how microtubules are reorganized during early polarity establishment. In addition, the data support a role for MAP7 in coordinating microtubule and actomyosin organization, suggesting a scaffolding function that links these cytoskeletal systems. We also observe that Sertoli cell polarity can be functionally separated from cell identity and that disruptions in apical domain architecture precede delays in germ cell developmental progression. Taken together, these observations provide mechanistic insight that expands upon previous studies of MAP7 function at the cellular level.

      The conclusions are supported by multiple, complementary lines of evidence, including knockout and Map7-egfpKI mouse models, high-resolution electron microscopy, immunoprecipitation–mass spectrometry, and single-cell RNA sequencing. While we agree that further experiments, such as MAP7–MYH9 double staining, will strengthen the mechanistic framework, we will also perform complementary biochemical analyses to provide additional insight. Specifically, we plan to conduct domain-mapping experiments to identify the MAP7 region required for MYH9 complex formation, coupled with co-immunoprecipitation assays in cultured cells to validate this association.

      Although generating new mutant mouse lines is not feasible within the scope of this revision, and no in vitro system fully recapitulates Sertoli cell polarization, these complementary approaches will provide further mechanistic support. We believe that these planned experiments, together with the current dataset, will clarify the underlying mechanisms and reinforce the significance of our findings, while appropriately acknowledging the current limits of experimental evidence.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript the authors evaluate the role of Microtubule Associated Protein 7 (MAP7) in postnatal Sertoli cell development. The authors build two novel transgenic mouse lines (Map7-eGFP, Map7 knockout) which will be useful tools to the community. The transgenic mouse lines are used in paired advanced sequencing experiments and advanced imaging experiments to determine how Sertoli cell MAP7 is involved in the first wave of spermatogenesis. The authors identify MAP7 as an important regulator of Sertoli cell polarity and junction formation with loss of MAP7 disrupting intracellular microtubule and F-actin arrangement and Sertoli cell morphology. These structural issues impact the first wave of spermatogenesis causing a meiotic delay that limits round spermatid numbers. The authors also identify possible binding partners for MAP7, key among those MYH9.

      The authors did a great job building a complex multi-modal project that addressed the question of MAP7 function from many angles. There is an excellent balance of using many advanced methods while still keeping the project narrowed, to use only tools to address the real questions. The lack of quality testing on the germ cells outside of TUNEL is disappointing, but the Conclusion section implies that this sort of work is being done currently so the omission in this manuscript is acceptable. However, there is an issue with the imaging portion of the work on MYH9. The conclusions from the MYH9 data are currently overstated: super-resolution imaging of Map7 knockouts with microtubule and F-actin stains, and imaging that uses MYH9 with either Map7-eGFP or anti-MAP7, are also needed to support both the MAP7-MYH9 interaction normally and the lack of interaction, with failure of MYH9 to localize to microtubules and F-actin, in knockouts. Since a Leica SP8 was used for the imaging, using either Leica LIGHTNING or just higher magnification will likely be the easiest solution.

      Response: We sincerely appreciate the reviewer’s thorough and positive evaluation of our study. We are encouraged that the reviewer recognized the overall strength of our multi-modal approach and the scientific value of the Map7-egfp knock-in and Map7 knockout genome-edited mouse models that we generated. We also thank the reviewer for highlighting the balance between methodological breadth and focused, hypothesis-driven investigation in our work.

      Regarding the reviewer’s valuable comments on the imaging data, we have addressed them as follows. We improved the cytoskeletal imaging data as described in response to the reviewer’s minor comments. Specifically, in the revised Figure 3B, we replaced the original images with higher-resolution confocal images to provide a clearer view of cytoskeletal organization. In addition, following Reviewer #1’s suggestion, we modified the panel layout to enlarge each field and enhance the contrast between TUBB3 and F-actin channels, allowing better visualization of their altered localization in Map7-/- testes.

      We agree that super-resolution imaging comparing control and Map7-/- testes stained for TUBB3 and F-actin would further strengthen the analysis. If the current resolution is still considered insufficient, we plan to perform additional imaging using a Carl Zeiss Airyscan or Leica Stellaris 5 system to further improve spatial resolution and confirm the observed cytoskeletal phenotypes. Finally, we will perform co-imaging of MYH9 with MAP7 to validate their spatial relationship under normal conditions, complementing the existing data obtained from Map7-/- testes.

      This manuscript is nicely organized with almost all of the results spelled out very clearly and almost always paired with figures that make compelling and convincing support for the conclusions. There are minor revision suggestions for improving the manuscript listed below. These include synching up Figure and Supplemental Figure reference mismatches. There are also many minor, but important, details that need to be added to the Methods section including many catalog numbers and some references.

      - Some of the imaging, especially Fig4F could benefit and be more convincing with super-resolution imaging in the 150nm range (SIM, Airyscan, LIGHTNING, SoRa) possibly even just imaging with a higher magnification objective (60x or 100x)

      Response: We appreciate the reviewer’s suggestion to improve the resolution of the imaging data. In addition to revising Figure 3B as described above, we have also replaced the images in Figure 4F with higher-resolution confocal images to provide a clearer view of MYH9 localization relative to microtubules and F-actin. These revised images highlight that MYH9 specifically accumulates at apical regions where microtubules and F-actin intersect, forming the apical ES, but is not localized to the basal ES-associated F-actin structures. To retain spatial context and allow readers to appreciate the overall distribution pattern, the original lower-magnification images from Figure 4F have been moved to Supplemental Figure 5.

      - SuppFig1D: Please add context in the legend to the meaning of the Yellow Stars and "O->U" labels. The latter would seem to be to indicate the Ovarian and Uterine sides of the image

      Response: In response to this comment, we revised the figure legend to clarify the annotations. The legend now states: “O, ovary side; U, uterus side. Asterisks indicate secretory cells that lack planar cell polarity.”

      - Pg6Line7: up to P23 or up to P35?

      Response: We appreciate the reviewer’s attention to this detail. The text has been revised for clarity as follows: “To examine the temporal dynamics of Sertoli cell polarity establishment, we analyzed seminiferous tubule morphology across the first wave of spermatogenesis, from postnatal day (P)10 to P35. To specifically assess the role of MAP7 in Sertoli cells while minimizing contributions from germ cells, our analysis focused on stages up to P23, before MAP7 expression becomes detectable in step 9–11 spermatids (Fig. 1), to exclude potential secondary effects resulting from MAP7 loss in germ cells.” (page 6, lines 5-10)

      - SuppFig4B: Does SuppFig4B reference back to Fig3B or Fig3C? If the latter please update this in the legend.

      - Pg7Line21-23: Is SuppFig3D,E meant to be referenced and not SuppFig5A,B?

      - Pg8Line22-25: Is SuppFig4A meant to be referenced and not SuppFig5?

      - Pg8Line34-Pg9Line: Is SuppFig4B meant to be referenced and not SuppFig5B?

      Response: We appreciate the reviewer’s careful reading. All mismatches in Supplemental figure references have been corrected, ensuring that each reference in the text now accurately corresponds to the appropriate data.

      - Pg9Line28-33: Would the authors be willing to rework this figure to include images that more closely match the reported findings? The current version does not strongly support the idea that MYH9 fails to localize to microtubule and F-actin domains in Map7 knockout P17 seminiferous tubules. This could also just be a matter of acquiring these images at a higher magnification or with a lower-end (150nm range) super-resolution system (SIM, Airyscan, LIGHTNING, SoRa etc)

      Response: Following the reviewer’s recommendation, we replaced the images in Figure 4F with higher-resolution confocal images to better visualize MYH9 localization relative to microtubules and F-actin in Map7+/- and Map7-/- testes. These revised images demonstrate that MYH9 specifically accumulates at apical regions where microtubules and F-actin intersect, but not at the basal ES-associated F-actin structures. To preserve spatial context, the original low-magnification images have been moved to Supplemental Figure 5. If additional resolution is required, we are prepared to acquire further images using an Airyscan or Stellaris 5 system.

      - SuppFig7A: The legend notes these are P23 samples but the image label says 8W. Please update this to whichever is the correct age.

      Response: We thank the reviewer for pointing out this discrepancy. The figure legend for Supplemental Figure 7A (now revised as Supplemental Figure 8A) has been corrected to indicate that the samples are from 8-week-old mice, consistent with the image label.

      - Pg16Line4-5: Please include in the text the vendor and catalog number for the C57BL/6 mice

      Response: The text now specifies: “C57BL/6NJcl mice were purchased from CLEA Japan (Tokyo, Japan)” (page 17, line 4). CLEA Japan does not assign catalog numbers to mouse strains.

      - Pg16Line18-19: Please include in the text the catalog number for the DMEM

      - Pg16Line19-20: Please include in the text the vendor and catalog number for the FBS

      - Pg16Line20: Please include in the text the vendor and catalog number for the Pen-Strep

      Response: We have added vendor and catalog information as follows: “Wild-type and MAP7-EGFPKI HeLa cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM, 043-30085; Fujifilm Wako Pure Chemical, Osaka, Japan) supplemented with 10% fetal bovine serum (FBS, 35-015-CV; Corning, Corning, NY, USA) and penicillin–streptomycin (26253-84; Nacalai, Kyoto, Japan) at 37 °C in a humidified atmosphere containing 5% CO₂ 18.” (page 17, lines 18-22)

      - Pg17Line6-12: Thank you for including organized and detailed information about the primers, please also define the PCR protocol used including temperatures, timing, and cycles for Map7 knockout genotyping

      - Pg17Line20-27: Thank you for including organized and detailed information about the primers, please also define the PCR protocol used including temperatures, timing, and cycles for Map7-eGFP genotyping

      Response: The text has been updated to include the PCR conditions used for genotyping as follows: “Genotyping PCR was routinely performed as follows. Genomic DNA was prepared by incubating a small piece of the cut toe in 180 µL of 50 mM NaOH at 95 °C for 15 min, followed by neutralization with 20 µL of 1 M Tris-HCl (pH 8.0). After centrifugation for 20 min, 1 µL of the resulting DNA solution was used as the PCR template. Each reaction (8 µL total volume) contained 4 µL of Quick Taq HS DyeMix (DTM-101; Toyobo, Osaka, Japan) and a primer mix. PCR cycling conditions were as follows: 94 °C for 2 min; 35 cycles of 94 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min; followed by a final extension at 72 °C for 2 min and a hold at 4 °C. PCR products were analyzed using agarose gel electrophoresis. This protocol was also applied to other mouse lines and alleles generated in this study.” (page 18, lines 17–25)
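
      As a quick check of the numbers in the protocol quoted above, the following sketch computes the nominal concentrations after the Tris-HCl addition and the programmed cycling time. It assumes that volumes are simply additive and ignores thermocycler ramp times and the final 4 °C hold; the function and variable names are illustrative and do not come from the manuscript.

```python
# Worked arithmetic for the genotyping protocol quoted above.
# Assumptions (not from the manuscript): volumes are additive, and
# ramp times and the final 4 °C hold are ignored.

def diluted_mM(stock_mM, stock_uL, total_uL):
    """Nominal concentration of a component after dilution to total_uL."""
    return stock_mM * stock_uL / total_uL

def cycling_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Programmed time: initial step + repeated cycles + final extension."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s

total_uL = 180 + 20                        # NaOH lysate + Tris-HCl
naoh_mM = diluted_mM(50, 180, total_uL)    # nominal NaOH input
tris_mM = diluted_mM(1000, 20, total_uL)   # 1 M Tris-HCl diluted 10-fold

# 94 °C 2 min; 35 x (94 °C 30 s, 65 °C 30 s, 72 °C 1 min); 72 °C 2 min
run_s = cycling_seconds(2 * 60, [30, 30, 60], 35, 2 * 60)

print(f"NaOH: {naoh_mM} mM, Tris-HCl: {tris_mM} mM in {total_uL} µL")
print(f"Programmed cycling time: {run_s} s ({run_s / 60} min)")
```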

      - Pg17Line30: Please include in the text the vendor and catalog number for the Laemmli sample buffer

      Response: We clarified that the buffer was prepared in-house.

      - Pg17Line32&SuppTable1: Thank you for including an organized and detailed table for the primary antibodies used, please also make either a similar table or expand the current table to include secondary antibody information

      - Pg17Line32: Please note in the text which primary antibodies and secondary antibodies from Supp Table 1

      Response: Supplementary Table 1 has been updated to include both primary and HRP-conjugated secondary antibodies. In the Immunoblotting section of the Materials and Methods, we specified the antibodies used: “The following primary antibodies were used: mouse anti-Actin (C4, 0869100-CF; MP Biomedicals, Irvine, CA, USA), mouse anti-Clathrin heavy chain (610500; BD Biosciences, Franklin Lakes, NJ, USA), rat anti-GFP (GF090R; Nacalai, 04404-84), rabbit anti-MAP7 (SAB1408648; Sigma-Aldrich, St. Louis, MO, USA), rabbit anti-MAP7 (C2C3, GTX120907; GeneTex, Irvine, CA, USA), and mouse anti-α-tubulin (DM1A, T6199; Sigma-Aldrich). Corresponding HRP-conjugated secondary antibodies were used for detection: goat anti-mouse IgG (12-349; Sigma-Aldrich), goat anti-rabbit IgG (12-348; Sigma-Aldrich), and goat anti-rat IgG (AP136P; Sigma-Aldrich). Detailed information for all primary and secondary antibodies is provided in Supplementary Table 1.” (page 19, lines 14-22)

      - Pg18Line2: Please include in the text the vendor and catalog number for the Bouin's

      Response: The text has been updated to indicate that Bouin’s solution was prepared in-house.

      - Pg18Line3: Please include in the text the catalog number for the CREST-coated glass slides

      - Pg18Line7: Please include in the text the catalog number for the OCT compound

      - Pg18Line11: Please include in the text the vendor and catalog number for the Donkey Serum

      - Pg18Line11: Please include in the text the vendor and catalog number for the Goat Serum

      Response: The text now includes vendor and catalog information for all these reagents, including CREST-coated slides (SCRE-01; Matsunami Glass, Osaka, Japan), OCT compound (4583; Sakura Finetechnical, Tokyo, Japan), donkey serum (017-000-121; Jackson ImmunoResearch Laboratories, PA, USA), and goat serum (005-000-121; Jackson ImmunoResearch Laboratories).

      - Pg18Line13: Thank you for including an organized and detailed table for the primary antibodies used, please also make either a similar table or expand the current table to include secondary antibody information

      Response: We thank the reviewer for the suggestion. Supplementary Table 1 already includes information for the antibodies used for immunoblotting, and we have now added information for the Alexa Fluor-conjugated secondary antibodies used for immunofluorescence in this study.

      - Pg18Line18: Please include in the text the vendor and catalog number for the DAPI

      Response: The text has been updated to include the vendor and catalog number for DAPI (D9542; Sigma-Aldrich).

      - Pg18Line19: Please also include information about the objectives used including catalog numbers, detectors used (PMT vs HyD)

      Response: We thank the reviewer for the suggestion. The following information has been added to the Histological analysis section in Materials and Methods: “Objectives used were HC PL APO 40×/1.30 OIL CS2 (11506428; Leica) and HC PL APO 63×/1.40 OIL CS2 (11506350; Leica), with digital zoom applied as needed for high-magnification imaging. DAPI was detected using PMT detectors, while Alexa Fluor 488, 594, and 647 signals were captured using HyD detectors. Images were acquired in sequential mode with detector settings adjusted to prevent signal bleed-through.” (page 20, lines 13-17)

      - Pg18Line23: Please cite in the text the reference paper for Fiji (Schindelin et al. 2012 Nature Methods PMID: 22743772) and note the version of Fiji used

      - Pg18Line24: Please note the version of Aivia used

      Response: We have revised the text accordingly by citing the reference paper for Fiji (Schindelin et al., 2012, Nature Methods, PMID: 22743772) and noting the version used (v.2.16/1.54p). In addition, we have added the version of Aivia used in this study (version 14.1).

      - Pg18Line25: If possible, please use a more robust and reliable system than Microsoft Excel to do statistics (Graphpad Prism, Stata, R, etc), if this is not possible please note the version of Microsoft Excel used

      Response: We appreciate the reviewer’s suggestion. For basic statistical analyses such as the Student’s t-test, we used Microsoft Excel (Microsoft Office LTSC Professional Plus 2021), which has been sufficient for these standard calculations. For more advanced analyses, including ANOVA and single-cell RNA-seq analyses, we used R. These details have now been added to the text.
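
      For readers who want to reproduce such a basic test outside Excel, the sketch below computes the equal-variance (Student's) two-sample t statistic and its degrees of freedom using only the Python standard library. The function name and the two sample lists are hypothetical and are not data from the study; the p-value lookup is omitted because it needs a t-distribution that the standard library does not provide.

```python
# Equal-variance (Student's) two-sample t statistic.
# The sample values below are hypothetical, for illustration only.
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance

def student_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df

control = [1, 2, 3, 4, 5]   # hypothetical measurements
mutant = [2, 3, 4, 5, 6]    # hypothetical measurements
t_stat, dof = student_t(control, mutant)
print(f"t = {t_stat:.3f}, df = {dof}")
```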

      - Pg18Line25: Please cite in the text the reference paper for R (R Core Team 2021 R Foundation for Statistical Computing "R: A Language and Environment for Statistical Computing") and note the version of R used

      - Pg18Line25: Please note the specific R package with version used to do ANOVA, and cite in the text the reference for this package

      Response: We have cited the reference for R (R Core Team, 2021. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria) and noted the version used (version 4.4.0) in the text. In addition, regarding ANOVA, we have added the following description: “For ANOVA analysis, linear models were fitted using the base stats package (lm function), and analysis of variance was conducted with the anova function.” (page 20, lines 23-25)
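
      The authors ran this analysis in R with lm and anova. As an illustration of what that computes in the simplest case, a single categorical factor, the sketch below derives the one-way ANOVA F statistic in plain Python. The function name and the example groups are hypothetical and are not taken from the study; the p-value is omitted because the standard library has no F distribution.

```python
# One-way ANOVA F statistic, the quantity reported for a model with a
# single categorical factor. Example data are hypothetical.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical
f_stat, df_b, df_w = one_way_anova(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```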

      - Pg18Line25: Please clarify, was a R package called "AVNOVA" used to do ANOVA or is this a typo?

      Response: We thank the reviewer for pointing this out. It was a typographical error — the correct term is “ANOVA”. The text has been corrected accordingly.

      - Pg18Line32: Please include in the text the catalog number for the EPON 812 Resin

      - Pg19Line3: Please include the version number for Stacker Neo

      - Pg19Line5: Please include the vendor and version number for Amira 2022

      - Pg19Line5: Please include the version number for Microscopy Image Browser

      - Pg19Line5: Please include the version number for MATLAB that was used to run Microscopy Image Browser

      Response: We added the catalog number for the EPON 812 resin and the vendor and version information for the software used. The following details have been included in the revised text:

      EPON 812 resin: TAAB Embedding Resin Kit with DMP-30 (T004; TAAB Laboratory and Microscopy, Berks, UK)

      Stacker Neo: version 3.5.3.0; JEOL

      Amira 2022: version 2022.1; Thermo Fisher Scientific

      Microscopy Image Browser: version 2.91

      Note that although Microscopy Image Browser is written in MATLAB, we used the standalone version that does not require a separate MATLAB installation.

      - Pg19Line: 9-10: Please include in the text the catalog number for the complete protease inhibitor

      - Pg19Line14: Please include in the text the catalog number for the Magnetic Agarose Beads

      - Pg19Line16: Please include in the text the catalog number for the GFP-Trap Magnetic Agarose Beads

      Response: We have added the catalog numbers for the complete protease inhibitor (4693116001), control magnetic agarose beads (bmab), and GFP-Trap magnetic agarose beads (gtma).

      - Pg19Line21: Please note in the text which primary antibodies and secondary antibodies from Supp Table 1

      - Pg19Line21-22: Please include in the text the catalog number for the ECL Prime

      Response: We thank the reviewer for the helpful suggestions. The description regarding immunoblotting (“Eluted samples were separated by SDS–PAGE, transferred to PVDF membranes…”) was reorganized: overlapping content has been removed, and the necessary information has been integrated into the “Immunoblotting” section, where details of the primary and secondary antibodies (listed in Supplementary Table 1) are already provided. In addition, the information for ECL Prime has been updated to “Amersham ECL Prime (RPN2236; Cytiva, Tokyo, Japan)”.

      - Pg20Line2: Please include the version number for Xcalibur

      Response: The version of Xcalibur used in this study (version 4.0.27.19) has been added to the text.

      - Pg20Line5: Please cite in the text the reference paper for SWISS-PROT (Bairoch and Apweiler 1999 Nucleic Acid Research PMID: 9847139)

      Response: The reference paper for SWISS-PROT (Bairoch and Apweiler, 1999, Nucleic Acids Research, PMID: 9847139) has been cited in the text.

      - Pg19Line26: Please include in the text the catalog number for the NuPAGE gels

      - Pg19Line28: Please include in the text the catalog number for the SimpleBlue SafeStain

      Response: Both catalog numbers have been added in the Mass spectrometry section as follows: 4–12% NuPAGE gels (NP0321PK2; Thermo Fisher Scientific) and SimplyBlue SafeStain (LC6060; Thermo Fisher Scientific).

      - Pg20Line26: Please include in the text the catalog number for the Chromium Single Cell 3' Reagent Kits v3

      Response: The catalog number for the Chromium Single Cell 3′ Reagent Kits v3 (PN-1000075; 10x Genomics) has been added to the text.

      - Pg21Line3: Please cite in the text the reference paper for R (R Core Team 2021 R Foundation for Statistical Computing "R: A Language and Environment for Statistical Computing")

      Response: The reference for R (R Core Team, 2021. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria) has already been cited in the “Histological analysis” section, where ANOVA analysis is described.

      - Pg21Line3 Please cite in the text the reference for RStudio (Posit team (2025). RStudio: Integrated Development Environment for R. Posit Software, PBC, Boston, MA. URL http://www.posit.co/.)

      Response: The reference for RStudio (Posit team, 2025. RStudio: Integrated Development Environment for R. Posit Software, PBC, Boston, MA, USA. URL: http://www.posit.co/) has been added to the text.

      - Pg21Line23: Please include the version number for Metascape

      Response: The version of Metascape used in this study (v3.5.20250701) has been added to the text.

      - SuppFig12: please update the legend to include a description after the title and update the figure labeling to correspond to the legend. Also, this figure is currently not referenced anywhere in the text.

      Response: We have updated the legend for Supplemental Figure 12 (now Supplemental Figure 13) to include a descriptive sentence after the title and have adjusted the figure labeling to match the legend. The revised legend now reads: “Full-scan images of the agarose gels shown in Supplemental Figs. 1B and 2C are displayed in the upper and lower left panels, respectively, while the corresponding full-scan images of the immunoblots shown in Supplemental Figs. 1C and 2D are presented in the upper and lower right panels, respectively.”

      As these images serve as source data, they are not referenced directly in the main text.

      _Referee cross-commenting_

      I generally agree with Reviewer 1 and specifically concur regarding adding details about fertility assessment of the Map7 knockout line and enhancing the SEM imaging.

      Response: As noted in our response to Reviewer #1, we have re-acquired the SEM images in high-resolution mode, focusing on the relevant regions. The new high-resolution images have replaced the original panels in revised Figure 3C, providing clearer visualization of junctional structures at P10 and P21 in Map7+/- and Map7-/- testes. The original Figure 3C images have been moved to Supplemental Figure 4B for reference.

      Reviewer #2 (Significance):

      There are mouse lines, and datasets that will be useful resources to the field. This work also advances our understanding of a period in Sertoli cell development that is critical to fertility but very understudied.

      Response: We thank the reviewer for the positive comments and for recognizing the potential value of our mouse lines and datasets to the field, as well as the significance of our work in advancing the understanding of this critical but understudied period in Sertoli cell development.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript titled "Unravelling the Progression of the Zebrafish Primary Body Axis with Reconstructed Spatiotemporal Transcriptomics" presents a comprehensive analysis of the development of the primary body axis in zebrafish by integrating bulk RNA-seq, 3D images, and Stereo-Seq. The authors first clearly demonstrate the application of Palette for integrating RNA-seq and Stereo-Seq using published spatial transcriptomics data of Drosophila embryos. Subsequently, they produced serial bulk RNA-seq data for certain developmental stages of Danio rerio embryos and utilized published Stereo-Seq data. Through robust validation, the authors observe the molecular network involved in AP axis formation. While the authors show that integrating bulk RNA-seq data with Stereo-Seq improves spatial resolution, additional proof is required to demonstrate the extent of this improvement.

      Response: We thank the reviewer for the positive feedback on our Palette pipeline, zSTEP construction and analysis of primary body axis development. We appreciate the constructive suggestions provided, which we can implement to improve our manuscript. As pointed out by the reviewer, some analysis procedures were not described in sufficient detail. To address this, we have added more explanatory text and additional schematic diagrams to make the methods clearer and easier to understand. We also thank the reviewer for the meticulous reading and for reminding us to include parameters, references and essential text, which significantly improves the quality and rigor of the manuscript. Furthermore, as the reviewer noted, the extent of the improvement in spatial resolution was not clearly demonstrated in the manuscript. We have therefore provided an additional figure comparing the original expression on the stacked Stereo-seq slices and on the 3D live image with the expression from zSTEP, and the results indicate that zSTEP provides better, more continuous expression patterns. We still have two remaining tasks that are expected to be completed within the next month. We hope our responses have addressed the concerns raised by the reviewer, and we are pleased to provide any additional proof as needed.

      Major Comments:

      1. Lines 66-68: Discuss the limitations of existing tools and explicitly state the advantages of using Palette.

      Response: We thank the reviewer for the valuable suggestion. We have added the following new text after line 68 to emphasize the features and advantages of Palette.

      "Newly developed tools are committed to integrating bulk and/or scRNA-seq data with ST data to enhance spatial resolution, focusing on expression at the spot level. However, gene expression patterns are closely correlated to the biological functions and are more critical for understanding biological processes. Therefore, a tool focusing on inferring spatial gene expression patterns would be desirable."

      1. Body Pattern Genes Analysis: For both Drosophila and Danio rerio, it would be valuable to examine body pattern genes in Stereo-Seq and apply Palette to determine if the resolution of the segments improves or merges. The resolution of the A-P axis is convincing, but further evidence for other segments would be beneficial.

      Response: We thank the reviewer for the suggestions. For the Drosophila data, we only used two adjacent slices for Palette performance assessment, and thus were only able to evaluate the expression patterns within the slice.

      For the zebrafish data, although we have constructed zSTEP as a 3D transcriptomic atlas, we have to admit that the left-right (LR) and dorsal-ventral (DV) patterning is not fully satisfactory. Here we show a section from the dorsal part of 16 hpf zSTEP that displays a relatively well-defined left-right pattern (Fig. 2). Along the left-right axis, the notochord cells are centrally located, flanked by somite cells on either side, with the outermost cells being pronephros.

      One reason for the limited LR and DV patterning is that the original annotation of the ST data does not clearly distinguish all the cell types. Another likely reason is the disordered cell positions that arise when stacking ST slices. Thus, our zSTEP is most suitable for investigating the AP patterns, while its performance on LR and DV patterns may not achieve the same level of accuracy.

      See response letter for the figure.

      1. Figure 2d: Include the A-P line for which the intensity profile was plotted in the main figure, rather than just in the supplementary material. Additionally, consider simplifying the plot by not combining three lines into one, as it complicates the interpretation of observations.

      Response: We thank the reviewer for the helpful suggestions. We have updated Figure 2d and Figure S1b by adding an A-P line to each subfigure (Fig. 3). Additionally, as the reviewer suggested, we have separated the intensity plots so that each subfigure now includes a dedicated intensity plot along the A-P axis.

      See response letter for the figure.

      1. Drosophila Data Analysis: While the alignment and validation of Danio rerio sections are clearly explained, the analysis and validation of Drosophila data are insufficiently detailed. Provide a more thorough explanation of how the intensity profiles between BDGP in situ data and Stereo-Seq data are adjusted.

      Response: We thank the reviewer for raising this issue. To make the analysis procedure clearer, we have updated Figure 2a (Fig. 4) and added explanatory text to the figure legends to describe the processing procedure for the Drosophila ST data.

      See response letter for the figure.

      Additionally, the following sentences have been added into the Methods section to describe the generation of the intensity profiles.

      "The intensity plot profiles along AP axis were generated through the following steps: The expression pattern plot images or in situ hybridization images were imported into ImageJ and converted to grayscale. The colour was then inverted, and a line of a certain width (here set as 10) was drawn across from the anterior part to the posterior part (Fig. S1a). The signal intensities along the width of the line were measured and imported into R for generating intensity plots."

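      The profile-generation steps quoted above can be sketched in code. This is a minimal illustration, not the ImageJ/R workflow the authors used. It assumes the image is already grayscale (0-255, stored as a list of rows), that the AP axis runs along the image columns, and that the wide line is summarized by averaging across its width. The function name `intensity_profile` and its parameters are hypothetical.

```python
def intensity_profile(image, row_center, line_width=10, max_value=255):
    """Mean inverted intensity at each column inside a horizontal band.

    image      : grayscale image as a list of rows (values 0..max_value)
    row_center : row index around which the band is placed (assumption)
    line_width : band thickness in pixels (10 in the quoted Methods text)
    """
    half = line_width // 2
    top = max(0, row_center - half)
    bottom = min(len(image), top + line_width)
    band = image[top:bottom]
    n_rows = len(band)
    n_cols = len(image[0])

    profile = []
    for col in range(n_cols):
        # Invert so that dark staining becomes a high signal value.
        inverted = [max_value - band[r][col] for r in range(n_rows)]
        profile.append(sum(inverted) / n_rows)
    return profile


# Tiny synthetic image: dark on the left (anterior), bright on the right.
toy_image = [
    [255, 255, 255],
    [0, 100, 255],
    [0, 200, 255],
    [255, 255, 255],
]
toy_profile = intensity_profile(toy_image, row_center=2, line_width=2)
```

      The resulting list holds one value per position along the axis and can be plotted directly or exported for plotting elsewhere.
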
      1. Figure 3d: Present a plot with the expected expression profiles of the three genes if the embryo is aligned as anticipated.

      Response: We thank the reviewer for this helpful suggestion, which improves the clarity of our manuscript. We have added the following subfigure as Figure 3d (Fig. 5) to show the expected expression profiles of the three midline genes along the left-right axis.

      See response letter for the figure.

      1. Analysis Without Palette: Between lines 277-438, the outcome of using Palette with bulk RNA-seq and Stereo-Seq is convincing. However, consider the following:

      o What would be the observations if the analysis were conducted solely with Stereo-Seq data, without incorporating bulk RNA-seq data and employing Palette?

      Response: We thank the reviewer for raising this important question. Here we compare ST expression on stacked Stereo-seq slices, ST expression projected onto 3D live images, and the Palette-inferred expression (Fig. 6). The stacked ST slices do not fully reflect the zebrafish morphology, and the gene expression is sparse, which makes the pattern look messy (first row). After the ST expression is projected onto the live image, the expression patterns can be seen on the zebrafish morphology, but the expression is still sparsely distributed across spots (second row). In contrast, the expression patterns captured by Palette in zSTEP are more continuous (third row) and more similar to those observed in in situ hybridization images (fourth row). We are considering adding these analyses to a supplementary figure.

      See response letter for the figure.

      o This study uses only Stereo-Seq as the spatial transcriptomics reference. It would strengthen the argument to use at least one other spatial transcriptomics method, such as Visium or MERFISH, in conjunction with bulk RNA-seq and Palette, to demonstrate whether Palette consistently improves gene expression resolution.

      Response: We thank the reviewer for raising this question. To demonstrate the broad applicability of Palette, it is necessary to test its performance with different types of ST references. We plan to perform additional analyses to evaluate Palette using Visium and MERFISH data, respectively, as ST references. Note that the Palette pipeline uses only the overlapped genes for inference. Because MERFISH detects only hundreds of genes, Palette can infer expression patterns only for those genes. As reported by Liu et al. (2023), MERFISH can independently resolve distinct cell types and spatial structures, so we expect Palette to perform well with MERFISH as the ST reference. We have already started these analyses and expect to complete them within the next month. We will add the analyses to the GitHub repository as separate tutorials.

      Reference:

      Liu, J. et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci Alliance 6 (2023).

      1. PDAC Data Analysis: Provide a more detailed explanation of the PDAC data analysis and use appropriate colors in the tissue images to clearly distinguish cell types.

      Response: We thank the reviewer for the suggestions. We have updated the colours used in the tissue images to be consistent with the colours in the tissue clustering analysis. We have also added a subfigure to the supplementary figure (Fig. 7), with more explanatory text in the figure legend, to explain the analysis more thoroughly.

      See response letter for the figure.

      1. Comparison with Other Methods: State the limitations of not using STitch3D and Spateo for alignment and explain why these methods were not employed.

      Response: We thank the reviewer for raising this constructive comment. We fully agree that published alignment algorithms would be helpful in our analysis. Currently, the slice alignment is adjusted manually, so the main limitation of not using these tools is that manual operation may introduce bias compared with alignment generated by a computational algorithm. STitch3D and Spateo are not included in this study for two reasons. First, these two tools were posted only recently, after our analyses had largely been completed, so we only mentioned them in the Discussion section. Second, we did not want to embed too many external tools in our analysis, which could make the pipeline harder for researchers to run. Specifically, STitch3D and Spateo are configured to run in a Python environment, while Palette is based on R packages. Moreover, even without these tools, our current manual alignment achieves the desired performance. Nevertheless, we value this suggestion and plan to compare the performance of manual alignment with that of the two alignment tools. At present, we have a preliminary comparison scheme and have collected the relevant datasets. We hope to complete this analysis within the next 1 to 2 weeks.

      Minor Comments:

      1. References: Add references to the statements in lines 51-53.

      Response: We thank the reviewer for reminding us of the missing references. We have added the works of Junker et al. (2014), Liu et al. (2022), Chen et al. (2022), Wang et al. (2022), Shi et al. (2023) and Satija et al. (2015) as references in line 53 as follows.

      "Thus, great efforts are ongoing to construct gene expression maps of these models with higher resolution, depth, and comprehensiveness1-6."

      References:

      1. Junker, J.P. et al. Genome-wide RNA Tomography in the zebrafish embryo. Cell 159, 662-675 (2014).
      2. Liu, C. et al. Spatiotemporal mapping of gene expression landscapes and developmental trajectories during zebrafish embryogenesis. Dev Cell 57, 1284-1298 e1285 (2022).
      3. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777-1792 e1721 (2022).
      4. Wang, M. et al. High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Dev Cell 57, 1271-1283 e1274 (2022).
      5. Shi, H. et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature 622, 552-561 (2023).
      6. Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33, 495-502 (2015).

      1. Scientific Name Consistency: Ensure consistency in using either "Danio rerio" or "zebrafish" throughout the manuscript.

      Response: We thank the reviewer for this suggestion. We have changed "Danio rerio" to "zebrafish" to make "zebrafish" consistent throughout the manuscript.

      1. Related References: Include the following relevant references:

      o https://academic.oup.com/bib/article/25/4/bbae316/7705532

      o https://www.life-science-alliance.org/content/6/1/e202201701

      Response: We thank the reviewer for bringing these two relevant works to our attention. Baul et al. (2024) presented STGAT, which leverages Graph Attention Networks to integrate spatial transcriptomics and bulk RNA-seq, and Liu et al. (2023) demonstrated the concordance of MERFISH ST with bulk and single-cell RNA-seq. Both are excellent works and relevant to ours. We have added these two references in line 61 and line 68, respectively.

      References:

      Baul, S. et al. Integrating spatial transcriptomics and bulk RNA-seq: predicting gene expression with enhanced resolution through graph attention networks. Brief Bioinform 25 (2024).

      Liu, J. et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci Alliance 6 (2023).

      1. Figure 1a: In the Venn diagram, include the number of genes in the bulk and Stereo-Seq datasets, as well as the number of overlapping genes.

      Response: We thank the reviewer for reminding us to include these important numbers. In the current manuscript, we have added the following sentences to the Methods section to provide the gene numbers (Fig. 8). Because the Venn diagram in Figure 1a serves as a schematic representation, we did not include the gene numbers there, as they vary depending on the actual data.

      "Palette was performed on the aligned slices using the overlapped genes. For the 10 hpf embryo, there were 24,658 genes in the bulk data, 18,698 genes in the Stereo-seq data, and 16,601 overlapped genes. For the 12 hpf embryo, there were 23,018 genes in the bulk data, 18,948 genes in the Stereo-seq data, and 16,401 overlapped genes. For the 16 hpf embryo, there were 24,357 genes in the bulk data, 23,110 genes in the Stereo-seq data, and 19,539 overlapped genes."

      See response letter for the figure.
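
      The relationship between the three reported numbers can be made explicit with a short sketch. It is only an illustration: the helper names `overlapped_genes` and `venn_regions` are hypothetical, and the only data used are the 10 hpf counts quoted above.

```python
def overlapped_genes(bulk_genes, st_genes):
    """Return the sorted list of genes present in both datasets."""
    return sorted(set(bulk_genes) & set(st_genes))


def venn_regions(n_bulk, n_st, n_overlap):
    """Split reported totals into the three regions of a two-set Venn diagram."""
    return {
        "bulk_only": n_bulk - n_overlap,
        "st_only": n_st - n_overlap,
        "overlap": n_overlap,
    }


# Gene counts reported for the 10 hpf embryo in the quoted Methods text.
regions_10hpf = venn_regions(n_bulk=24658, n_st=18698, n_overlap=16601)
```

      Only the genes in the overlap region are used for inference, so this count sets the upper bound on how many genes can be inferred for a given bulk and ST pairing.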

      1. Figure 1 Improvement: Enlarge Figure 1 and reduce repetitive elements, such as parts of the deconvolution and Figure 1b.

      Response: We thank the reviewer for the helpful suggestion. We agree with the reviewer that the deconvolution sections appear repetitive. We have updated Figure 1 (Fig. 9) by replacing these repetitive elements with a clearer and simpler diagram.

      See response letter for the figure.

      1. Figure 3f: Explain the black discontinuous line in the plot.

      Response: We thank the reviewer for the reminder, and we apologize for the missing explanation. We have added an explanation of the black discontinuous line to the legend of Figure 3 (Fig. 10).

      See response letter for the figure.

      1. Line 610: State the percentage of unpaired imaging spots.

      Response: We thank the reviewer for the reminder. We apologize for not including the numbers of paired and unpaired spots. We have added the number of paired spots, together with their percentage of the total spots, to the Methods section as follows.

      "The numbers of mapped spots for the 10 hpf, 12 hpf and 16 hpf embryos are 15,379 (69.4% of the total spots), 14,697 (70.5% of the total spots) and 21,605 (77.2% of the total spots), respectively."

      1. Lines 616-618: Specify the unit for the spot diameter.

      Response: We thank the reviewer for the reminder. Again, we apologize for not including the spot diameter in the previous version of the manuscript. We have added the spot diameter to the Methods section as follows.

      "In the Stereo-seq data, each spot contained 15 × 15 DNA nanoball (DNB) spots (The diameter of each spot is near 10 μm)."

      Reviewer #1 (Significance):

      This algorithm will be useful not only for the field of developmental biology but also for wider applications in spatial omics. Although I have expertise in spatial omics technology development, my understanding of computational biology is limited, which restricts my ability to fully evaluate the Palette algorithm presented in this paper.

      Response: We thank the reviewer for recognizing our work, and we greatly appreciate the constructive suggestions. Although the reviewer acknowledged limited expertise in computational biology, the comments are highly professional and valuable. Following these suggestions, we have not only included more explanatory text and figures to make the analysis procedures clearer, but also supplied important parameters that were missing from our previous manuscript. We have also provided an extra figure to demonstrate how zSTEP improves gene expression patterns. We believe that our work is now more rigorous and easier to understand, and we will continue working to resolve the remaining issues as planned. We thank the reviewer again.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors of the study introduce the Palette method, a novel approach designed to infer spatial gene expression patterns from bulk RNA-sequencing (RNA-seq) data. This method is complemented by the development of the DreSTEP 3D spatial gene expression atlas of zebrafish embryos, establishing a comprehensive resource for visualizing gene expression and investigating spatial cell-cell interactions in developmental biology.

      Response: We sincerely appreciate the reviewer's positive feedback on our Palette pipeline and the zSTEP 3D spatial expression atlas of zebrafish embryos. We also thank the reviewer for the professional comments and constructive suggestions. The reviewer raised the concerns from the aspect of algorithm design and computational biology, which we did not address well in our previous manuscript. We agree with the reviewer that we did not clarify the selection criteria of the parameters in detail, and we are now working on the additional analyses to address this issue.

      We also agree with the reviewer that we did not provide enough discussion of the strategies used in the pipeline, the features of Palette, and the application scenarios of Palette and zSTEP. Stating these aspects clearly is important if our tools are to be widely used. In this revised version, we have added more paragraphs to the Discussion section to address this issue. Additionally, we acknowledge that we did not adequately demonstrate the computational efficiency and computational requirements, which are important for researchers. We are working on additional analyses to address this issue as well.

      Finally, we thank the reviewer again for the professional and constructive suggestions. These suggestions are addressable, and by following them, we believe our manuscript will see a significant improvement, especially in the Palette pipeline part, making the pipeline more rigorous and easier to access. We are confident that we can complete the planned additional tasks within the next 1-2 months.

      1. The efficacy of the Palette method may be compromised by its dependency on the quality of the reference spatial transcriptomics data. As highlighted in the study, variations in data quality can lead to significant challenges in reconstructing accurate spatial expression patterns from bulk data. This underscores the necessity of evaluating quality parameters, such as the number of gene detections and spatial resolution, to ensure reliable outcomes. Additional studies should rigorously assess how these quality factors influence the accuracy and efficiency of the algorithm in various data contexts, particularly under diverse conditions of gene detection.

      Response: We thank the reviewer for this valuable suggestion. We agree that the quality of the reference ST data may greatly influence the performance and efficacy of Palette, and we have added paragraphs to the Discussion section to discuss the impact of ST data quality on Palette performance. As the reviewer mentioned, gene detection and spatial resolution are two important parameters that can influence Palette performance. Low gene detection may impair the clustering process, so that the cell types of spots are not well distinguished. To evaluate Palette when the ST data show low gene detection, we plan to apply Palette using MERFISH data, which capture only hundreds of genes, as the ST reference. We will also investigate the impact of spatial resolution by merging ST spots to simulate lower-resolution scenarios, and the impact of gene detection by randomly reducing the number of detected genes. By comparing the expression patterns inferred from ST data of different spatial resolutions or different numbers of detected genes, we can better assess the performance of Palette and give researchers guidance on the ST data requirements for optimal performance. Because of the limited response time, these analyses will take another month to complete after this round of revision.
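
      The two planned degradation experiments can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes spots sit on an integer grid and carry raw counts per gene, that merged spots sum their counts, and that genes are dropped uniformly at random. The names `merge_spots` and `subsample_genes` are hypothetical.

```python
import random
from collections import defaultdict


def merge_spots(spots, factor):
    """Simulate lower resolution by summing counts in factor x factor blocks.

    spots: dict mapping (x, y) grid coordinates to {gene: count}.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for (x, y), counts in spots.items():
        key = (x // factor, y // factor)  # coordinate of the coarser spot
        for gene, count in counts.items():
            merged[key][gene] += count
    return {key: dict(counts) for key, counts in merged.items()}


def subsample_genes(spots, keep_fraction, seed=0):
    """Simulate lower gene detection by keeping a random subset of genes."""
    genes = sorted({g for counts in spots.values() for g in counts})
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    n_keep = max(1, round(len(genes) * keep_fraction))
    kept = set(rng.sample(genes, n_keep))
    return {
        pos: {g: c for g, c in counts.items() if g in kept}
        for pos, counts in spots.items()
    }


# Toy data with hypothetical gene names "a" and "b".
toy_spots = {
    (0, 0): {"a": 1},
    (1, 0): {"a": 2, "b": 1},
    (0, 1): {"b": 3},
    (2, 2): {"a": 5},
}
coarse = merge_spots(toy_spots, factor=2)
reduced = subsample_genes(toy_spots, keep_fraction=0.5)
```

      Running the inference on `coarse` or `reduced` instead of the original reference, and comparing the inferred patterns, would give the kind of sensitivity analysis described above.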

      1. The methodology raises pertinent questions regarding how the clustering results from different algorithms may affect the reconstructions by the Palette method. The authors would better provide a detailed discussion/comparison of clustering processes that optimize the reconstruction of spatial patterns, ensuring precision in the downstream analyses.

      Response: We thank the reviewer for the constructive comments. We agree that differences in clustering results would affect the inference made by Palette. In our Palette pipeline, rather than developing a new clustering methodology, we employ BayesSpace for spot clustering, which considers both the transcriptional similarity of spots and their neighbouring structure. Researchers may therefore adjust the parameters of the BayesSpace package to achieve optimal clustering results. In most cases, spot identities are obtained through UMAP analysis, which considers only transcriptional differences and ignores spatial information. This kind of clustering strategy can lead to an intricate intermingling of spots from different clusters and may result in sparse gene expression in the Palette outcome, which differs from the patterns in bona fide tissues. A suitable clustering strategy therefore helps capture local patterns.

      Moreover, our Palette pipeline can also use clustering results derived from tissue histomorphology. Histomorphology-based clustering is a good choice, as it is closer to the real tissue organization. The following figure (Fig. 11) displays the Palette performance on PDAC datasets using both spatial clustering and histomorphology clustering strategies. The result using histomorphology clustering captures the weak pattern (indicated by the red circle) that was missed when using spatial clustering (Fig. 11d).

      See response letter for the figure.

      1. The choice to utilize only highly expressed genes in the initial stages of the Palette algorithm also warrants further exploration. Addressing the criteria for determining which genes qualify as "highly expressed" and outlining robust cutoff will enhance the algorithm's rigor and applicability. Similarly, in the iterative estimation of gene expression across spatial spots, establishing optimal iteration conditions is crucial. Implementing a loss function may offer a systematic method for concluding iterations, thus refining computational efficiency.

      Response: We thank the reviewer for the professional suggestions. As the reviewer pointed out, the selection of highly expressed genes and the number of iterations are two important parameters in our pipeline. Both the definition of highly expressed genes and the number of such genes are important for achieving satisfactory clustering performance. We tested the impact of different numbers of highly expressed genes on clustering performance in our preliminary analyses, but we did not summarize these tests or specify the parameters. We therefore plan to include a supplementary figure showing the clustering performance under different definitions and different numbers of highly expressed genes. For the iteration conditions, we have tested different iteration numbers to find one at which the expression in each spot becomes stable. The following figure (Fig. 1) shows the results of running Palette with different numbers of iterations. We randomly selected 20 cells and compared their expression across these runs. The results indicate that, for an ST dataset with 819 spots, the expression in each spot becomes nearly stable after 5000 iterations. We previously did not consider computational efficiency, and the reviewer's suggestion to implement a loss function to determine the optimal number of iterations is valuable. We greatly appreciate this suggestion and plan to apply a loss function to determine the optimal number of iterations for ST datasets of different sizes. This will guide researchers in selecting the number of iterations and improve computational efficiency.

      See response letter for the figure.
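
      One way a loss-based stopping rule could work is sketched below. This is an assumption-laden illustration, not Palette's algorithm: the per-iteration update is replaced by a toy neighbour-averaging step on a 1D row of spots, the loss is taken to be the mean absolute change between successive iterations, and the names `iterate_until_stable` and `neighbour_average` are hypothetical.

```python
def iterate_until_stable(values, update, tol=1e-6, max_iter=5000):
    """Repeat `update` until the mean absolute change drops below `tol`.

    Returns the final values and the number of iterations actually used.
    """
    for iteration in range(1, max_iter + 1):
        new_values = update(values)
        # Loss: average per-spot change between successive iterations.
        loss = sum(abs(a - b) for a, b in zip(new_values, values)) / len(values)
        values = new_values
        if loss < tol:
            return values, iteration
    return values, max_iter


def neighbour_average(values):
    """Toy stand-in for one update step: move each spot halfway toward
    the mean of itself and its immediate neighbours (1D layout)."""
    updated = []
    for i in range(len(values)):
        neighbourhood = values[max(0, i - 1): i + 2]
        local_mean = sum(neighbourhood) / len(neighbourhood)
        updated.append(0.5 * values[i] + 0.5 * local_mean)
    return updated


stable, n_used = iterate_until_stable([0.0, 10.0, 0.0, 10.0], neighbour_average)
```

      With a rule of this kind, the iteration count adapts to the dataset instead of being fixed in advance, and `max_iter` remains as a safety cap.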

      1. Performance metrics relating to processing speed and computational demands remain inadequately addressed in the current framework. Understanding how the Palette method scales across varying gene counts and bulk RNA-seq datasets will be essential for potential applications in larger biological contexts. Notably, the quantitative demands of analyzing 20,000 genes when processing 10, 100, or 1,000 bulk RNA profiles must be articulated to guide researchers in planning accordingly.

      Response: We thank the reviewer for this valuable and professional suggestion. In our previous analyses, we did not consider computational efficiency, processing speed, or computational demands, all of which are important information for prospective users. To address this issue, we will first list our computer configuration. Under this configuration, we plan to run Palette on datasets with different numbers of overlapped genes and on ST references with varying spot numbers, and then summarize the running times in a metrics table. This will help researchers estimate the running time for their own datasets and plan their analyses. We will begin soon and expect to complete the analyses within the next 1 to 2 months.
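
      A running-time table of this kind could be collected with a small harness like the one below. It is a sketch only: `placeholder_workload` is a hypothetical stand-in for a Palette run, and taking the best of several repeats is one common convention, not something the response specifies.

```python
import time


def benchmark(run, sizes, repeats=3):
    """Time `run(size)` for each size and keep the fastest of `repeats` runs."""
    rows = []
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(size)
            best = min(best, time.perf_counter() - start)
        rows.append((size, best))
    return rows


def placeholder_workload(n_genes):
    """Hypothetical stand-in for one inference run over `n_genes` genes."""
    return sum(i * i for i in range(n_genes))


# One row per dataset size: (number of genes, seconds).
timings = benchmark(placeholder_workload, sizes=[1_000, 10_000, 20_000])
```

      Reporting the hardware configuration alongside such a table is what makes the timings comparable across laboratories.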

      Minor opinions:

      1. Despite the promising advances offered by the zebrafish 3D reconstruction, there is a lack of details regarding numbers of the spatial transcriptomics (ST) data utilized, and the number of bulk RNA-seq data employed in the analyses. These parameters need to be clarified.

      Response: We thank the reviewer for reminding us of these parameters. We apologize for not including them in our previous manuscript. We have now included the numbers of bulk, ST and overlapped genes in the Methods section as follows (Fig. 12).

      "Palette was performed on the aligned slices using the overlapped genes. For the 10 hpf embryo, there were 24,658 genes in the bulk data, 18,698 genes in the Stereo-seq data, and 16,601 overlapped genes. For the 12 hpf embryo, there were 23,018 genes in the bulk data, 18,948 genes in the Stereo-seq data, and 16,401 overlapped genes. For the 16 hpf embryo, there were 24,357 genes in the bulk data, 23,110 genes in the Stereo-seq data, and 19,539 overlapped genes."

      See response letter for the figure.

      1. Issues regarding spatial cell-cell communication, especially concerning interactions over longer distances, necessitate careful consideration. Introducing spatial distance constraints could help formulate more realistic models of cellular interactions, a vital aspect of embryonic development.

      Response: We thank the reviewer for this essential comment. We agree that spatial distance is an essential factor when investigating in vivo cell-cell communication during embryonic development. Therefore, in our analyses, we employed CellChat for spatial cell-cell communication analysis, which can infer and visualize spatial cell-cell communication networks for ST datasets, using spatial distance as a constraint on the computed communication probability. However, during our analyses, we observed interactions between cell types over longer distances, as mentioned by the reviewer. We then investigated how these long-distance interactions arose. Here, we show the FGF interaction between tail bud and neural crest cells from our spatial cell-cell communication analysis as an example, in which the distance between the two cell types appears quite large (Fig. 13). We labelled tail bud cells and neural crest cells on the selected midline section and observed that, although most neural crest cells are distributed anteriorly, a small number of neural crest cells are located in the tail, close to the tail bud cells. Therefore, the observed interaction between tail bud and neural crest cells is likely due to their adjacent distribution in the tail region, while the anterior placement of the neural crest spot in the spatial cell-cell communication analysis reflects the anterior positioning of most neural crest cells. As a result, the distances shown in the spatial cell-cell communication analysis are not the real distances between the two cell types.

      In most cases in our spatial cell-cell communication analyses, the observed interactions over longer distances are likely influenced by this visualization strategy. Additionally, pre-processing the dataset may enhance the performance of the analyses. Here we performed systematic analyses of the entire embryo, which can make the interactions between cell types appear massive. To investigate specific biological questions, researchers can subset cell types of interest or categorize them into different subtypes based on their positions.

      See response letter for the figure.

      1. Evaluation metrics such as the Adjusted Rand Index (ARI) and Root Mean Square Error (RMSE) represent critical tools for systematically measuring the similarity of inferred spatial patterns, yet their specific application within this context should be elaborated.

      Response: We thank the reviewer for recommending these two metrics. We have applied them to evaluate the similarity between the expression patterns (Fig. 14). Including these statistical values makes our comparisons of expression patterns more rigorous and convincing. We have also added the following text to the Methods section to describe how the two values are calculated.

      "The Adjusted Rand Index (ARI) and Root Mean Square Error (RMSE) were used to evaluate the similarity of the expression patterns. The expression patterns of in situ hybridization images were considered as the expected values, and the expression patterns of ST data and inferred expression patterns were compared to the expected values. Common positions along the AP axis within all three expression profiles were used, and the RMSE were calculated based on the scaled intensity of these positions. Values greater than the threshold were set to 1; otherwise, they were set to 0, and the ARI was then calculated based on the intensity category. Higher ARI and lower RMSE indicate greater similarity."

      See response letter for the figure.
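
      The calculation described in the quoted Methods text can be sketched as follows. Several details are assumptions, because the text does not state them: intensities are min-max scaled to the range 0 to 1, the binarization threshold is 0.5, and the ARI is computed with the standard pair-counting formula. The function names are hypothetical, and the profiles used in the usage lines are made-up toy values.

```python
from collections import Counter
from math import comb, sqrt


def min_max_scale(values):
    """Scale a profile to [0, 1]; a flat profile maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(expected, observed):
    """Root mean square error between two equally long profiles."""
    n = len(expected)
    return sqrt(sum((e - o) ** 2 for e, o in zip(expected, observed)) / n)


def binarize(values, threshold=0.5):
    """Values greater than the threshold become 1, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index from pair counts of the contingency table."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings are trivial and identical
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy profiles at common AP positions (illustrative values only).
in_situ = min_max_scale([0.0, 2.0, 8.0, 10.0])
inferred = min_max_scale([1.0, 2.0, 9.0, 11.0])
score_rmse = rmse(in_situ, inferred)
score_ari = adjusted_rand_index(binarize(in_situ), binarize(inferred))
```

      Because the ARI is computed on thresholded values, it reflects agreement on where expression is "on" or "off", while the RMSE captures differences in the scaled intensity itself.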

      1. The study's limitations surrounding ST data quality cannot be overstated. Discussing scenarios where only limited or poor-quality ST data are available will be crucial for guiding future studies. Furthermore, a clear explanation of how enhanced specificity and accuracy translate into tangible biological insights is essential for demystifying the underlying mechanisms driving developmental processes.

      Response: We thank the reviewer for raising this essential suggestion. We have realized that in our previous manuscript, our discussion on the advantages and limitations of Palette and zSTEP was neither broad nor detailed enough.

      Therefore, in our revised manuscript, we have added the following paragraphs to further discuss the advantages and limitations of Palette and zSTEP, as well as the potential application of zSTEP in developmental biology.

      In this section, we have emphasized again the impact of ST data quality on the performance of Palette and zSTEP, and then compared Palette with the strategy that uses well-established marker genes to infer spatial information. We demonstrated that, although Palette cannot achieve single-cell resolution, it captures the major expression patterns, which are closely correlated with biological functions and critical for embryonic development. We also discussed that zSTEP is not only a valuable tool for investigating gene expression patterns, but also has potential for evaluating the reaction-diffusion model to investigate the complicated and well-choreographed pattern formation during embryonic development.

      With this more comprehensive discussion of Palette and zSTEP, we think that prospective users will better understand the application scenarios of our inference pipeline and our datasets. We hope our study can assist and inspire further research in the fields of spatial transcriptomics and developmental biology.

      "Thirdly, the performance of Palette and zSTEP heavily relied on the quality of ST data. If the quality of ST data is not of sufficient quality, the low-expression genes may not be detected or only appear in very few scattered spots, and the performance of spot clustering could also be affected. Moreover, in this study, for example, the Stereo-seq data of 12 hpf zebrafish embryo had fewer slices on the right side (Fig. S3b), resulting in more blank spots in the right part of zSTEP for the 12 hpf embryo. However, with the ongoing advancements in spatial resolution and data quality, the performance of Palette is expected to be enhanced and demonstrate even greater potential for analysing spatiotemporal gene expression.

      On the other hand, compared to the brilliant strategy that infers spatial information of scRNA-seq data from well-established genes, our Palette pipeline cannot achieve single cell resolution. However, our Palette pipeline is based on the ST reference, and thus preserves the real positional relationships between spots. Furthermore, the focus of our pipeline is to infer the gene expression patterns, which are closely correlated to biological functions and critical for embryonic development, rather than the sparse expression within individual spots. In this regard, our Palette pipeline can be advantageous, as it allows for reconstruction of the major expression profiles, which are often more relevant for understanding developmental processes. Additionally, our Palette can be applied to serial sections, enabling the construction of 3D ST atlas.

      Finally, while the current analyses demonstrated that zSTEP can serve as a valuable tool for identifying genes having specific patterns at certain developmental stages, the exploration of zSTEP is still limited. During animal development, pattern formation is always one of the most important developmental issues. As demonstrated by the reaction-diffusion (RD) model, morphogen molecules are produced at specific regions of the embryo, forming morphogen gradients to guide cell specification, while interactions between different morphogens instruct more complicated and well-choreographed pattern formation. Our Palette constructed zSTEP, as a comprehensive transcriptomic expression pattern during development, could be leveraged to evaluate and prove the RD model during development, including AP patterning. Moreover, the investigation of gene expression patterns should not be limited to morphogens and TFs, and further investigation of their roles in AP patterning is desirable. Additionally, here a random forest model may be sufficient for investigating the most essential morphogens and TFs for AP axis refinement, while more sophisticated machine learning models may be required for addressing more specific biological questions."

      Reviewer #2 (Significance):

      The Palette pipeline demonstrates a marked improvement in specificity and accuracy when predicting spatial gene expression patterns. Evaluative studies on Drosophila and zebrafish datasets affirm its enhanced performance compared to existing methodologies. By effectively reconstructing spatial information from bulk transcriptomic data, the Palette method innovatively merges the philosophy of leveraging single-cell transcriptomic data for deconvolution analyses. This integration is pivotal, advancing traditional bulk RNA-seq approaches while laying the groundwork for future research.

      One of the notable achievements in this work is the construction of the DreSTEP atlas, which integrates serial bulk RNA-seq data with advanced 3D imaging techniques. This resource grants researchers unprecedented access to the visualization of gene expression patterns across the zebrafish embryo, facilitating the investigation of spatial relationships and cell-cell interactions critical for developmental processes. Such capabilities are invaluable for understanding the intricate dynamics of embryogenesis and the distinct roles of individual cell types.

      Response: We thank the reviewer for the positive evaluation of our work, including both the Palette pipeline and zSTEP. The reviewer has strong expertise in algorithm development and computational biology, and the reviewer's concerns and suggestions are highly valuable to us. We do not have extensive experience in bioinformatics tool development, and thus we did not thoroughly address the selection criteria or clarify the parameters used in the pipeline, which may affect its use by other researchers. We therefore sincerely appreciate the reviewer's professional suggestions, which we can follow to address these issues, improve our manuscript and make our work more impactful for researchers. Additionally, we did not consider computational efficiency, processing speed and computational demands, which are important factors for other researchers who wish to use Palette. We will add extra analyses to address these aspects.

      Currently, based on the reviewer's suggestions, we have added text discussing the clustering strategy in the Palette pipeline, the advantages and limitations of Palette, and the potential applications of zSTEP in developmental biology. We believe that readers will now have a clearer understanding of the performance of Palette and the application scenarios of both Palette and zSTEP. We have not yet fully addressed all of the reviewer's comments; we are working on the planned additional analyses and expect to complete them within the next 1-2 months. We sincerely thank the reviewer for the professional and valuable suggestions, which have improved our work and will make it accessible to a wide range of researchers.

      Finally, through this review process, we have learned a great deal about the important considerations and requirements when designing bioinformatics tools, and we have benefited greatly from the thoughtful guidance. We thank the reviewer again, and we will do our best to address the remaining issues to further improve our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Evidence, reproducibility and clarity

      In this study, Dong and colleagues developed a computational pipeline to use spatial transcriptomics (ST) datasets as a reference to infer the spatial patterns of gene expression from bulk RNA sequencing data. This approach aims to overcome the low read depth and limited gene detection capabilities in current ST datasets, while exploiting its ability to provide highly resolved spatial information. By combining bulk RNA-seq datasets from 3 developmental stages during early zebrafish development with previously available ST and imaging datasets, the authors build DreSTEP (Danio rerio spatiotemporal expression profiles). Using this approach, they go on to identify the morphogens and transcription factors involved in anteroposterior patterning.

      The paper is well written, and the pipeline presented in this study is likely to be useful beyond the case studies included in this study. There are a few questions that, in my view, would be important to clarify to increase the impact of this work:

      Response: We sincerely appreciate the reviewer's positive feedback on the Palette pipeline and the zebrafish spatiotemporal expression profiles (zSTEP). We thank the reviewer for the constructive suggestions, which have inspired us to think deeply about the applications and advantages of Palette and zSTEP for future studies.

      We fully agree with the reviewer that we did not sufficiently clarify the advantages and limitations of our inference pipeline in the original manuscript. The questions raised by the reviewer are very insightful. For example, while we regard close resemblance between the inferred expression patterns and the in situ hybridization observations as good performance, the reviewer pointed out that we should consider whether weak, yet real, expression may have been removed. These questions have motivated us to think more deeply about the underlying principles and assumptions of our inference pipeline. Following the reviewer's questions, we have expanded our discussion of the application of zSTEP in developmental biology and of the features of Palette compared to existing strategies.

      We believe that, with these revisions, our manuscript now demonstrates the application scenarios of Palette more clearly and suggests how zSTEP can be applied to investigate questions in developmental biology. We are grateful for the reviewer's guidance, which has helped us increase the impact of our work.

      1. The authors mention that they used a variable factor to adjust expression differences between the ST and bulk RNA-seq datasets. It would be important for the authors to comment on how much overlap in gene expression is necessary between the datasets for an accurate calculation of this variable factor? Can this be directly tested, for instance, by testing how their conclusions vary if expression is adjusted by a variable factor calculated from only a smaller set of genes?

      Response: We thank the reviewer for these questions. We apologize for not including the gene numbers in our previous manuscript. We have now provided the numbers of genes in the bulk and ST data and the numbers of overlapped genes (Fig. 15).

      "Palette was performed on the aligned slices using the overlapped genes. For the 10 hpf embryo, there were 24,658 genes in the bulk data, 18,698 genes in the Stereo-seq data, and 16,601 overlapped genes. For the 12 hpf embryo, there were 23,018 genes in the bulk data, 18,948 genes in the Stereo-seq data, and 16,401 overlapped genes. For the 16 hpf embryo, there were 24,357 genes in the bulk data, 23,110 genes in the Stereo-seq data, and 19,539 overlapped genes."

      See response letter for the figure.

      For the Palette implementation, we used all the overlapped genes. To calculate the variable factor, we aggregated the expression of each gene across the ST data and then took the ratio between its bulk expression and this aggregated ST expression. As a result, each overlapped gene was assigned its own variable factor to adjust its expression, based on the difference between the bulk and ST data. The rationale behind this approach is that, by considering the ST data as a whole, we can effectively reduce the variation among individual spots. This allows the variable factors to provide a reasonable adjustment to gene expression.
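
      As an illustration only, the per-gene calculation described above might be sketched as follows. This is not the Palette implementation: the function name `variable_factors`, the dictionary-based data layout, the toy gene names, and the direction of the ratio (bulk divided by aggregated ST) are all assumptions made for this sketch.

```python
def variable_factors(bulk, st_spots):
    """Return one adjustment factor per overlapped gene.

    bulk:     dict mapping gene -> expression in the bulk slice
    st_spots: list of dicts, one per ST spot, mapping gene -> expression

    Assumption: factor = bulk expression / ST expression summed over spots.
    """
    # Aggregate each gene over all ST spots, treating the ST data as a whole.
    st_total = {}
    for spot in st_spots:
        for gene, value in spot.items():
            st_total[gene] = st_total.get(gene, 0.0) + value

    # Only genes present in both datasets (the overlapped genes) get a factor.
    overlapped = sorted(set(bulk) & set(st_total))

    factors = {}
    for gene in overlapped:
        if st_total[gene] > 0:  # skip genes without ST signal (avoids division by zero)
            factors[gene] = bulk[gene] / st_total[gene]
    return factors


# Toy data: geneC is bulk-only and geneD is ST-only, so neither receives a factor.
bulk = {"geneA": 120.0, "geneB": 15.0, "geneC": 8.0}
st_spots = [
    {"geneA": 2.0, "geneB": 1.0, "geneD": 5.0},
    {"geneA": 4.0, "geneB": 0.0, "geneD": 1.0},
    {"geneA": 6.0, "geneB": 2.0},
]
factors = variable_factors(bulk, st_spots)
print(factors)
```

      Under these assumptions each factor depends only on that gene's own bulk and ST totals, so computing factors on a smaller gene set would leave the factors of the retained genes unchanged.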

      In summary, the variable factors can be calculated directly. Currently, Palette can only infer the expression patterns of overlapped genes. This means that when the number of overlapped genes is small, as with MERFISH, which detects only hundreds of genes, Palette can only infer the expression patterns of those genes. However, if the MERFISH data are of good quality and can resolve distinct cell types, we believe Palette will also perform well with MERFISH as the ST reference. Additionally, we plan to run Palette using MERFISH as the ST reference to further demonstrate its broad applicability across different ST references.

      2. Palette gives rise to highly spatially precise patterns, which closely match those found in ISH. However, the smoothening of the expression can also remove weak, yet real, local expression patterns, as shown for idgf6 in Fig. 2a. Can the authors test this more extensively for other genes?

      Response: We thank the reviewer for this essential question. We agree that weak, yet real, expression might be removed by our Palette inference pipeline. Weak, sparse expression may arise from the ST technique itself or from variation between samples. However, such sparse expression may not be biologically meaningful, and the focus of our pipeline is to capture the expression patterns that are closely correlated with function and crucial for embryonic development. Therefore, our algorithm considers spot characteristics and emphasizes cluster-specific expression, resulting in spatially specific expression patterns. In most cases, the main gene expression patterns can be captured, which helps in understanding gene functions and roles in embryonic development. We have updated Supplementary Figure S1a (Fig. 16) to include more gene patterns to demonstrate this point.

      See response letter for the figure.

      3. Using adjacent slices for ST and "bulk RNA-seq" may provide better results than those obtained when comparing two independent datasets. Could the authors also extend the analysis of Palette's functionalities by using separate, previously available but independent datasets, for ST and bulk RNA-seq in Drosophila as well?

      Response: We thank the reviewer for the valuable question. We agree with the reviewer that using adjacent slices may provide better results. The idea here is that the spatial expression patterns inferred from pseudo-bulk RNA-seq can be compared with the real ST expression to evaluate Palette's performance. We have updated Figure 2a (Fig. 17) to illustrate the analysis more clearly.

      See response letter for the figure.
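
      The evaluation idea described in this response can be sketched in a few lines, under strong simplifying assumptions. The helper names `pseudo_bulk` and `pearson`, the toy values, and the one-to-one pairing of spots between the inferred and real patterns are hypothetical and are not taken from the manuscript.

```python
import math


def pseudo_bulk(st_spots, genes):
    """Sum each gene over all spots of one ST slice to mimic a bulk profile."""
    return {g: sum(spot.get(g, 0.0) for spot in st_spots) for g in genes}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant pattern
    return cov / (sd_x * sd_y)


# Toy ST slice that is collapsed into a pseudo-bulk profile.
held_out_slice = [
    {"geneA": 5.0, "geneB": 0.0},
    {"geneA": 3.0, "geneB": 1.0},
    {"geneA": 0.0, "geneB": 4.0},
]
bulk_like = pseudo_bulk(held_out_slice, ["geneA", "geneB"])

# Hypothetical real vs. inferred pattern of one gene over three paired spots.
real_pattern = [6.0, 2.0, 0.0]
inferred_pattern = [5.5, 2.5, 0.0]
score = pearson(real_pattern, inferred_pattern)
print(bulk_like, round(score, 3))
```

      In practice the spots of two adjacent slices do not correspond one to one, so some alignment or binning step would be needed before such a comparison.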

      To demonstrate Palette's functionality, we used Palette to infer the spatial expression of a zebrafish bulk RNA-seq slice (Junker et al., 2014) using a Stereo-seq slice (Liu et al., 2022) as the ST reference; these two datasets are separate and independent. We agree with the reviewer that it would be good to test separate datasets in Drosophila to further demonstrate Palette's functionality. Unfortunately, we did not find Drosophila serial bulk RNA-seq data along the left-right axis for the corresponding stages, and thus we may be unable to perform the extra analyses using independent Drosophila datasets.

      References:

      Junker, J.P. et al. Genome-wide RNA Tomography in the zebrafish embryo. Cell 159, 662-675 (2014).

      Liu, C. et al. Spatiotemporal mapping of gene expression landscapes and developmental trajectories during zebrafish embryogenesis. Dev Cell 57, 1284-1298 e1285 (2022).

      4. The DreSTEP analysis in zebrafish embryos is interesting and validates well-established observations in the field. Can the authors also discuss whether and how their dataset allows them to refine our understanding of the spatial or temporal pattern of the morphogens and TFs involved in AP patterning? This would further validate their approach.

      Response: We thank the reviewer for recognizing zSTEP and for raising this valuable question, which has inspired us to think more deeply about the potential applications of zSTEP in developmental biology. As the reviewer noted, our zSTEP analyses have validated well-established observations in the field. Rather than focusing on the sparse expression detected in ST data, zSTEP emphasizes the gene expression patterns that are closely correlated with biological functions and critical for embryonic development. Therefore, zSTEP can serve as a valuable tool for identifying genes with specific patterns at certain developmental stages.

      Pattern formation is one of the most important developmental issues for all animals. The reaction-diffusion (RD) model is a widely recognized theoretical framework used to explain self-regulated pattern formation in developing animal embryos (Kondo & Miura, 2010). Morphogen molecules are produced at specific regions of the embryo, forming morphogen gradients to guide cell specification. Most importantly, interactions between different morphogens instruct more complicated and well-choreographed pattern formation. Our Palette-constructed zSTEP provides a comprehensive transcriptomic expression pattern, including all morphogens and TFs, across the whole embryo during development. These valuable resources, in our opinion, could be leveraged to evaluate and prove the RD model during development, including AP patterning. In our current zSTEP analyses, we have already identified genes that exhibit specific expression patterns along AP axis, some of which have not been fully characterized. These genes could be potential targets for further investigation into their roles in AP patterning, although they are not the primary focus of this study. Additionally, our analyses only focused on morphogens and TFs, but zSTEP can be used to investigate the expression patterns of other genes as well. Moreover, we employed a random forest model to investigate the most essential morphogens and TFs for AP axis refinement, which is one of the basic applications of zSTEP. To investigate specific biological questions of interest, it would be worth exploring the use of more sophisticated machine learning models.
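
      For readers less familiar with the RD framework mentioned above, a generic two-component reaction-diffusion system is commonly written as shown here. The symbols are notation introduced for this illustration and do not come from the manuscript: u and v are the concentrations of two interacting morphogens, f and g are their reaction (production and degradation) terms, and D_u and D_v are their diffusion coefficients.

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= f(u, v) + D_u \nabla^2 u, \\
\frac{\partial v}{\partial t} &= g(u, v) + D_v \nabla^2 v.
\end{aligned}
```

      In this form, the interactions between morphogens enter through f and g, while the gradients arise from the diffusion terms.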

      We have added the following paragraph in the Discussion section to discuss the potential application of zSTEP in future studies.

      "Finally, while the current analyses demonstrated that zSTEP can serve as a valuable tool for identifying genes having specific patterns at certain developmental stages, the exploration of zSTEP is still limited. During animal development, pattern formation is always one of the most important developmental issues. As demonstrated by the reaction-diffusion (RD) model, morphogen molecules are produced at specific regions of the embryo, forming morphogen gradients to guide cell specification, while interactions between different morphogens instruct more complicated and well-choreographed pattern formation. Our Palette constructed zSTEP, as a comprehensive transcriptomic expression pattern during development, could be leveraged to evaluate and prove the RD model during development, including AP patterning. Moreover, the investigation of gene expression patterns should not be limited to morphogens and TFs, and further investigation of their roles in AP patterning is desirable. Additionally, here a random forest model may be sufficient for investigating the most essential morphogens and TFs for AP axis refinement, while more sophisticated machine learning models may be required for addressing more specific biological questions."

      Reference

      Kondo, S. & Miura, T. Reaction-Diffusion model as a framework for understanding biological pattern formation. Science 329, 1616-1620 (2010).

      5. Can the authors comment on the limits of this inference pipeline? And how it performs as compared to single-cell RNA sequencing datasets where spatial information is inferred from well-established marker genes?

      Response: We thank the reviewer for this insightful question, which has inspired us to further explore the advantages and limitations of the Palette pipeline in comparison with other inference strategies. As mentioned in the Discussion section, a key limitation of the inference pipeline is its heavy reliance on the quality of the ST data. If the ST data are not of sufficient quality, low-expression genes may not be detected or may appear in only a few scattered spots. We think this is a common issue for any inference tool that uses ST data as the reference. However, with ongoing advances in spatial resolution and data quality, the performance of Palette is expected to improve.

      By comparison, approaches in which the spatial information of single-cell RNA sequencing datasets is inferred from well-established marker genes do not face this limitation. The ground-breaking work by Satija et al. (2015) combined scRNA-seq with in situ hybridization of well-established marker genes to infer spatial location, achieving single-cell resolution while maintaining high read depth and gene detection. One advantage of this scRNA-seq-based strategy is that it provides the transcriptomes of individual cells, rather than of the mixture of cells within an ST spot, although the positional relationships between cells are inferred rather than measured.

      However, unlike inference from ST data, this strategy does not directly capture the positional relationships between cells. Moreover, as embryonic development progresses, more cell types are specified and the body patterning becomes more complex. In this scenario, using well-established marker genes to infer spatial information would be much more challenging. Additionally, there are not many scRNA-seq datasets of serial sections, and thus this strategy may not be suitable for constructing a 3D ST atlas.

      In contrast, our Palette inference pipeline is based on ST data, which preserve the real positional relationships between spots. Although our inference pipeline cannot achieve single-cell resolution, it focuses on gene expression patterns rather than the sparse expression within individual spots. By applying Palette to paired serial sections, we were able to generate a 3D spatial expression atlas of zebrafish embryos, which has shown promising performance for investigating gene expression patterns and their involvement in AP patterning.

      Reference

      Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nature Biotechnology 33, 495-502 (2015).

      We have updated the following paragraphs in the Discussion section to describe the limitations of the inference pipeline in more detail.

      "Thirdly, the performance of Palette and zSTEP heavily relied on the quality of ST data. If the quality of ST data is not of sufficient quality, the low-expression genes may not be detected or only appear in very few scattered spots, and the performance of spot clustering could also be affected. Moreover, in this study, for example, the Stereo-seq data of 12 hpf zebrafish embryo had fewer slices on the right side (Fig. S3b), resulting in more blank spots in the right part of zSTEP for the 12 hpf embryo. However, with the ongoing advancements in spatial resolution and data quality, the performance of Palette is expected to be enhanced and demonstrate even greater potential for analysing spatiotemporal gene expression.

      On the other hand, compared to the brilliant strategy that infers spatial information of scRNA-seq data from well-established genes, our Palette pipeline cannot achieve single cell resolution. However, our Palette pipeline is based on the ST reference, and thus preserves the real positional relationships between spots. Furthermore, the focus of our pipeline is to infer the gene expression patterns, which are closely correlated to biological functions and critical for embryonic development, rather than the sparse expression within individual spots. In this regard, our Palette pipeline can be advantageous, as it allows for reconstruction of the major expression profiles, which are often more relevant for understanding developmental processes. Additionally, our Palette can be applied to serial sections, enabling the construction of 3D ST atlas."

      Reviewer #3 (Significance):

      This study tackles an important challenge in biology - the difficulty of resolving gene expression patterns with high spatial precision and in a high-throughput manner. By integrating sequencing datasets from previously published studies, as well as newly-generated datasets, the authors provide evidence that their novel inference pipeline enables them to obtain high-quality spatial information simply from bulk RNA-seq datasets, using ST as a reference. The development of this pipeline - Palette - is a major part of this manuscript and its applicability is validated using datasets from Drosophila and zebrafish embryos. This is an important advance for the field, but it would be nice for the authors to further comment on i) the validity of some of their approaches and how they may influence the quality of their inference, as well as, ii) potential pitfalls/limitations of this approach as compared to others available in the field. This would synthesize both previous and current findings into a conceptual and technological framework that would have a strong impact well beyond cell and developmental biology.

      Audience: This study would be relevant for a broad audience of biologists, interested in morphogen signaling, gene regulatory networks and cell fate specification.

      Expertise in zebrafish development, gastrulation, morphogen signaling and morphogenesis.

      Response: We thank the reviewer for the positive feedback and for raising these valuable questions, which have motivated us to consider more deeply the design concept and further applications of Palette and zSTEP. Based on the reviewer's insightful questions, we have added two paragraphs to the Discussion section that discuss the potential applications of zSTEP in developmental biology and the application scenarios of the Palette pipeline. Specifically, we have explained that the performance of the inference pipeline relies on the spatial resolution and quality of the ST data. We have then compared the advantages and limitations of Palette with those of the existing spatial inference strategy that infers the spatial information of scRNA-seq data from well-established marker genes. Although our inference pipeline cannot achieve single-cell resolution, it can capture the major expression patterns, which are closely correlated with function and critical for embryonic development. We believe this will help readers gain a clearer understanding of the advantages and limitations of our pipeline compared to other tools, as well as the tasks for which Palette and our constructed zSTEP can be used. We thank the reviewer again for the valuable comments.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In this study, Dong and colleagues developed a computational pipeline to use spatial transcriptomics (ST) datasets as a reference to infer the spatial patterns of gene expression from bulk RNA sequencing data. This approach aims to overcome the low read depth and limited gene detection capabilities in current ST datasets, while exploiting its ability to provide highly resolved spatial information. By combining bulk RNAseq datasets from 3 developmental stages during early zebrafish development with previously available ST and imaging datasets, the authors build DreSTEP (Danio rerio spatiotemporal expression profiles). Using this approach, they go on to identify the morphogens and transcription factors involved in anteroposterior patterning.

      The paper is well written, and the pipeline presented in this study is likely to be useful beyond the case studies included in this study. There are a few questions that, in my view, would be important to clarify to increase the impact of this work:

      1. The authors mention that they used a variable factor to adjust expression differences between the ST and bulk RNAseq datasets. It would be important for the authors to comment on how much overlap in gene expression is necessary between the datasets for an accurate calculation of this variable factor? Can this be directly tested, for instance, by testing how their conclusions vary if expression is adjusted by a variable factor calculated from only a smaller set of genes?
      2. Palette gives rise to highly spatially precise patterns, which closely match those found in ISH. However, the smoothening of the expression can also remove weak, yet real, local expression patterns, as shown for idgf6 in Fig. 2a. Can the authors test this more extensively for other genes?
      3. Using adjacent slices for ST and "bulk RNAseq" may provide better results than those obtained when comparing two independent datasets. Could the authors also extend the analysis of Palette's functionalities by using separate, previously available but independent datasets, for ST and bulk RNAseq in Drosophila as well?
      4. The DreSTEP analysis in zebrafish embryos is interesting and validates well-established observations in the field. Can the authors also discuss whether and how their dataset allows them to refine our understanding of the spatial or temporal pattern of the morphogens and TFs involved in AP patterning? This would further validate their approach.
      5. Can the authors comment on the limits of this inference pipeline? And how it performs as compared to single-cell RNA sequencing datasets where spatial information is inferred from well-established marker genes?

      Significance

      This study tackles an important challenge in biology - the difficulty of resolving gene expression patterns with high spatial precision and in a high-throughput manner. By integrating sequencing datasets from previously published studies, as well as newly-generated datasets, the authors provide evidence that their novel inference pipeline enables them to obtain high-quality spatial information simply from bulk RNAseq datasets, using ST as a reference. The development of this pipeline - Palette - is a major part of this manuscript and its applicability is validated using datasets from Drosophila and zebrafish embryos. This is an important advance for the field, but it would be nice for the authors to further comment on i) the validity of some of their approaches and how they may influence the quality of their inference, as well as, ii) potential pitfalls/limitations of this approach as compared to others available in the field. This would synthesize both previous and current findings into a conceptual and technological framework that would have a strong impact well beyond cell and developmental biology.

      Audience: This study would be relevant for a broad audience of biologists, interested in morphogen signaling, gene regulatory networks and cell fate specification.

      Expertise in zebrafish development, gastrulation, morphogen signaling and morphogenesis.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The authors of the study introduce the Palette method, a novel approach designed to infer spatial gene expression patterns from bulk RNA-sequencing (RNA-seq) data. This method is complemented by the development of the DreSTEP 3D spatial gene expression atlas of zebrafish embryos, establishing a comprehensive resource for visualizing gene expression and investigating spatial cell-cell interactions in developmental biology.

      Major concerns:

      1. The efficacy of the Palette method may be compromised by its dependency on the quality of the reference spatial transcriptomics data. As highlighted in the study, variations in data quality can lead to significant challenges in reconstructing accurate spatial expression patterns from bulk data. This underscores the necessity of evaluating quality parameters, such as the number of gene detections and spatial resolution, to ensure reliable outcomes. Additional studies should rigorously assess how these quality factors influence the accuracy and efficiency of the algorithm in various data contexts, particularly under diverse conditions of gene detection.
      2. The methodology raises pertinent questions regarding how the clustering results from different algorithms may affect the reconstructions by the Palette method. The authors should provide a detailed discussion/comparison of clustering processes that optimize the reconstruction of spatial patterns, ensuring precision in the downstream analyses.
      3. The choice to utilize only highly expressed genes in the initial stages of the Palette algorithm also warrants further exploration. Addressing the criteria for determining which genes qualify as "highly expressed" and outlining a robust cutoff will enhance the algorithm's rigor and applicability. Similarly, in the iterative estimation of gene expression across spatial spots, establishing optimal iteration conditions is crucial. Implementing a loss function may offer a systematic method for concluding iterations, thus refining computational efficiency.
      4. Performance metrics relating to processing speed and computational demands remain inadequately addressed in the current framework. Understanding how the Palette method scales across varying gene counts and bulk RNA-seq datasets will be essential for potential applications in larger biological contexts. Notably, the quantitative demands of analyzing 20,000 genes when processing 10, 100, or 1,000 bulk RNA profiles must be articulated to guide researchers in planning accordingly.
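
      To make the suggestion in point 3 concrete, a loss-based stopping rule could look roughly like the sketch below. It is a generic pattern, not Palette's actual iteration: the function `iterate_until_converged`, the toy update rule, and the squared-error loss against a bulk total are all hypothetical choices for illustration.

```python
def iterate_until_converged(state, update, loss, tol=1e-9, max_iter=1000):
    """Apply `update` repeatedly and stop once the loss improves by less than `tol`.

    Returns the final state, the number of iterations run, and the loss history.
    """
    history = [loss(state)]
    for n_iter in range(1, max_iter + 1):
        state = update(state)
        history.append(loss(state))
        # Stop when the absolute improvement in the loss becomes negligible.
        if history[-2] - history[-1] < tol:
            return state, n_iter, history
    return state, max_iter, history


# Toy problem: spot-level estimates whose sum should match a bulk total.
BULK_TOTAL = 12.0


def toy_loss(spots):
    """Squared difference between the summed spot estimates and the bulk total."""
    return (sum(spots) - BULK_TOTAL) ** 2


def toy_update(spots):
    """Move every spot halfway towards the value that would match the bulk total."""
    scale = BULK_TOTAL / sum(spots)
    return [x + 0.5 * (x * scale - x) for x in spots]


final_spots, n_iter, history = iterate_until_converged([1.0, 2.0, 3.0], toy_update, toy_loss)
print(n_iter, sum(final_spots))
```

      The same wrapper also returns the loss history, which could be plotted to choose a tolerance or to report how quickly the iterations converge.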

      Minor opinions:

      1. Despite the promising advances offered by the zebrafish 3D reconstruction, there is a lack of details regarding numbers of the spatial transcriptomics (ST) data utilized, and the number of bulk RNA-seq data employed in the analyses. These parameters need to be clarified.
      2. Issues regarding spatial cell-cell communication, especially concerning interactions over longer distances, necessitate careful consideration. Introducing spatial distance constraints could help formulate more realistic models of cellular interactions, a vital aspect of embryonic development.
      3. Evaluation metrics such as the Adjusted Rand Index (ARI) and Root Mean Square Error (RMSE) represent critical tools for systematically measuring the similarity of inferred spatial patterns, yet their specific application within this context should be elaborated.
      4. The study's limitations surrounding ST data quality cannot be overstated. Discussing scenarios where only limited or poor-quality ST data are available will be crucial for guiding future studies. Furthermore, a clear explanation of how enhanced specificity and accuracy translate into tangible biological insights is essential for demystifying the underlying mechanisms driving developmental processes.
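
      For reference, the two metrics named in point 3 can be computed as in the following sketch, written with the Python standard library only. How they were applied in the manuscript is not specified here; the toy cluster labels and expression values are hypothetical.

```python
import math
from collections import Counter


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two cluster assignments of the same items."""
    n = len(labels_a)
    total_pairs = math.comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Count item pairs that share a cluster in both, in A only, and in B only.
    both = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (both - expected) / (maximum - expected)


# Identical partitions under different label names give an ARI of 1.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that cut across each other score below 0.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(same, crossed, error)
```

      ARI compares cluster assignments regardless of how the clusters are labelled, whereas RMSE compares expression values directly, so the two metrics answer different questions about an inferred pattern.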

      Significance

      The Palette pipeline demonstrates a marked improvement in specificity and accuracy when predicting spatial gene expression patterns. Evaluative studies on Drosophila and zebrafish datasets affirm its enhanced performance compared to existing methodologies. By effectively reconstructing spatial information from bulk transcriptomic data, the Palette method innovatively merges the philosophy of leveraging single-cell transcriptomic data for deconvolution analyses. This integration is pivotal, advancing traditional bulk RNA-seq approaches while laying the groundwork for future research.

      One of the notable achievements in this work is the construction of the DreSTEP atlas, which integrates serial bulk RNA-seq data with advanced 3D imaging techniques. This resource grants researchers unprecedented access to the visualization of gene expression patterns across the zebrafish embryo, facilitating the investigation of spatial relationships and cell-cell interactions critical for developmental processes. Such capabilities are invaluable for understanding the intricate dynamics of embryogenesis and the distinct roles of individual cell types.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The manuscript titled "Unravelling the Progression of the Zebrafish Primary Body Axis with Reconstructed Spatiotemporal Transcriptomics" presents a comprehensive analysis of the development of the primary body axis in zebrafish by integrating bulk RNA-seq, 3D images, and Stereo-Seq. The authors first clearly demonstrate the application of Palette for integrating RNA-seq and Stereo-Seq using published spatial transcriptomics data of Drosophila embryos. Subsequently, they produced serial bulk RNA-seq data for certain developmental stages of Danio rerio embryos and utilized published Stereo-Seq data. Through robust validation, the authors observe the molecular network involved in AP axis formation. While the authors show that integrating bulk RNA-seq data with Stereo-Seq improves spatial resolution, additional proof is required to demonstrate the extent of this improvement.

      Major Comments:

      1. Lines 66-68: Discuss the limitations of existing tools and explicitly state the advantages of using Palette.
      2. Body Pattern Genes Analysis: For both Drosophila and Danio rerio, it would be valuable to examine body pattern genes in Stereo-Seq and apply Palette to determine if the resolution of the segments improves or merges. The resolution of the A-P axis is convincing, but further evidence for other segments would be beneficial.
      3. Figure 2d: Include the A-P line for which the intensity profile was plotted in the main figure, rather than just in the supplementary material. Additionally, consider simplifying the plot by not combining three lines into one, as it complicates the interpretation of observations.
      4. Drosophila Data Analysis: While the alignment and validation of Danio rerio sections are clearly explained, the analysis and validation of Drosophila data are insufficiently detailed. Provide a more thorough explanation of how the intensity profiles between BDGP in situ data and Stereo-Seq data are adjusted.
      5. Figure 3d: Present a plot with the expected expression profiles of the three genes if the embryo is aligned as anticipated.
      6. Analysis Without Palette: Between lines 277-438, the outcome of using Palette with bulk RNA-seq and Stereo-Seq is convincing. However, consider the following:
         - What would be the observations if the analysis were conducted solely with Stereo-Seq data, without incorporating bulk RNA-seq data and employing Palette?
         - This study uses only Stereo-Seq as the spatial transcriptomics reference. It would strengthen the argument to use at least one other spatial transcriptomics method, such as Visium or MERFISH, in conjunction with bulk RNA-seq and Palette, to demonstrate whether Palette consistently improves gene expression resolution.
      7. PDAC Data Analysis: Provide a more detailed explanation of the PDAC data analysis and use appropriate colors in the tissue images to clearly distinguish cell types.
      8. Comparison with Other Methods: State the limitations of not using STitch3D and Spateo for alignment and explain why these methods were not employed.

      Minor Comments:

      1. References: Add references to the statements in lines 51-53.
      2. Scientific Name Consistency: Ensure consistency in using either "Danio rerio" or "zebrafish" throughout the manuscript.
      3. Related References: Include the following relevant references:
         - https://academic.oup.com/bib/article/25/4/bbae316/7705532
         - https://www.life-science-alliance.org/content/6/1/e202201701
      4. Figure 1a: In the Venn diagram, include the number of genes in the bulk and Stereo-Seq datasets, as well as the number of overlapping genes.
      5. Figure 1 Improvement: Enlarge Figure 1 and reduce repetitive elements, such as parts of the deconvolution and Figure 1b.
      6. Figure 3f: Explain the black discontinuous line in the plot.
      7. Line 610: State the percentage of unpaired imaging spots.
      8. Lines 616-618: Specify the unit for the spot diameter.

      Significance

      This algorithm will be useful not only for the field of developmental biology but also for wider applications in spatial omics. Although I have expertise in spatial omics technology development, my understanding of computational biology is limited, which restricts my ability to fully evaluate the Palette algorithm presented in this paper.

    1. Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature, and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling, and thus mislead future readers.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24).

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008).

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459).

      (d) LEN creates capsid defects (Figures 3 and 5; see e.g. 10.1073/pnas.2420497122).

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776).

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537).

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models.

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison, but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. Thus, no data are shown that connect LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1, 2, and Supplementary Figure 5)?

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions.

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?
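
      The difference between the two normalizations raised in (e) can be shown with a short arithmetic sketch. The ring counts (12 pentamers, 209 hexamers) are quoted from the manuscript; the number of under-coordinated monomers (30) is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
N_PENTAMERS, N_HEXAMERS = 12, 209  # ring counts quoted from the manuscript
MONOMERS_PER_PENTAMER, MONOMERS_PER_HEXAMER = 5, 6


def normalized_defects(n_undercoordinated, n_rings, monomers_per_ring):
    """Return (per-ring, per-monomer) normalization of a defect count."""
    per_ring = n_undercoordinated / n_rings
    per_monomer = n_undercoordinated / (n_rings * monomers_per_ring)
    return per_ring, per_monomer


# Hypothetical count of under-coordinated CA monomers at one time point.
example_count = 30

per_ring, per_monomer = normalized_defects(example_count, N_PENTAMERS, MONOMERS_PER_PENTAMER)
print("pentamer-based: per ring =", per_ring, ", per monomer =", per_monomer)

per_ring, per_monomer = normalized_defects(example_count, N_HEXAMERS, MONOMERS_PER_HEXAMER)
print("hexamer-based:  per ring =", per_ring, ", per monomer =", per_monomer)
```

      Dividing by the number of rings can give values above 1, which cannot be read as a fraction, whereas dividing by the number of monomers (12*5 = 60 or 209*6 = 1254) keeps the quantity between 0 and 1.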

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate towards the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid.

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      (h) The SI Figure S3 caption says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres." In contradiction, the main-text Figure 2 caption says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres." One of these must be a typo.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait & Voth, 2024)." In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event." What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing." What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating.

      (o) The authors state: "LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Green et al. attempt to use large-scale protein structure analysis to find signals of selection and clustering related to antibiotic resistance. This was applied to the whole proteome of Mycobacterium tuberculosis, with a specific focus on the smaller set of known antibiotic-resistance-related proteins.

      Strengths:

      The use of geospatial analysis to detect signals of selection and clustering on the structural level is really intriguing. This could have a wider use beyond the AMR-focussed work here and could be applied to a more general evolutionary analysis context. Much of the strength of this work lies in breaking ground into this structural evolution space, something rarely seen in such pathogen data. Additional further research can be done to build on this foundation, and the work presented here will be important for the field.

      The size of the dataset and use of protein structure prediction via AlphaFold, giving such a consistent signal within the dataset, is also of great interest and shows the power of these approaches to allow us to integrate protein structure more confidently into evolution and selection analyses.

      Weaknesses:

      There are several issues with the evolutionary analysis and assumptions made in the paper, which perhaps overstate the findings, or require refining to take into account other factors that may be at play.

      (1) The focus on antimicrobial resistance (AMR) throughout the paper confines the findings to that lens. This results in a few different weaknesses:

      (a) While the large size of the analysis is highlighted in the abstract and elsewhere, in reality, only a few proteins are studied in depth. These are proteins already associated with AMR by many other studies, somewhat retreading old ground and reducing the novelty.

      (b) Beyond the AMR-associated proteins, the proteome work is of great interest, but only casually interrogated and only in the context of AMR. There appears to be an assumption that all signals of positive selection detected are related to AMR, whereas something like cas10 is part of the CRISPR machinery, a set of proteins often under positive selection, and thus unlikely to be AMR-related.

      (2) The strength of the signal from the structural information and the novelty of the structural incorporation into prediction are perhaps overstated.

      (a) A drop of 13% in F1 for a gain of 2% in PPV is quite the trade-off. This is less indicative of a strong, usable predictor than the abstract claims. While the approach is novel and this is a good finding for a first attempt at such complex analysis, it is perhaps not as significant as the authors claim.

      (b) In relation to this, there is a lack of situating these findings within the wider research landscape. For instance, the use of structure for predicting resistance has been done, for example, in PncA (https://academic.oup.com/jacamr/article/6/2/dlae037/7630603, https://www.sciencedirect.com/science/article/pii/S1476927125003664, https://www.nature.com/articles/s41598-020-58635-x) and in RpoB (https://www.nature.com/articles/s41598-020-74648-y). These, and other such works, should be acknowledged as the novelty of this work is perhaps not as stark as the authors present it to be.
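
      The trade-off noted in (2a) follows from how the two metrics are defined. The sketch below, with entirely hypothetical confusion-matrix counts that are not taken from the manuscript, shows how a stricter predictor can gain a little PPV while losing much more F1 because recall falls. The function names are assumptions.

```python
def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of PPV and recall."""
    p, r = ppv(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: a baseline predictor and a stricter one that makes fewer calls.
baseline = {"tp": 80, "fp": 20, "fn": 20}
stricter = {"tp": 50, "fp": 10, "fn": 50}

for name, c in (("baseline", baseline), ("stricter", stricter)):
    print(name, "PPV =", round(ppv(c["tp"], c["fp"]), 3),
          "F1 =", round(f1_score(c["tp"], c["fp"], c["fn"]), 3))
```

      Because F1 weighs missed resistant cases (false negatives) as well as false positives, a small PPV gain bought by discarding many true positives lowers F1 substantially.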

      (3) The authors postulate that neutral AA substitutions would be randomly distributed in the protein structure and thus use random mutations as a negative control to simulate this neutral evolution. However, I am unsure if this is a true negative control for neutral evolution. The vast majority of residues would be under purifying selection, not neutral selection, especially in core proteins like rpoB and gyrA. Therefore, most of these residues would never be mutated in a real-world dataset. Therefore, you are not testing positive selection against neutral selection; you are testing positive against purifying, which will have a much stronger signal. This is likely to, in turn, overestimate the signal of positive selection. This would be better accounted for using a model of neutral evolution, although this is complex and perhaps outside the scope. Still, it needs to be made clear that these negative controls are not representative of neutral evolution.

      (4) In a similar vein, the use of 15 Å as a cut-off for stating co-localisation feels quite arbitrary. The average radius of a globular protein is about 20 Å, so this could be quite a large patch of a protein. I think it may be good to situate the cut-off for a 'single location' within a size estimator of the entire protein, as 15 Å could be a neighbourhood in a large protein, but be the whole protein for smaller ones.
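
      One way to express the concern in (4) is to report the 15 Å cut-off relative to a size estimate of each protein. The following is a minimal sketch under stated assumptions: coordinates are taken to be one point per residue (for example Cα positions), the cubic toy grids stand in for real structures, and the function names are hypothetical. It is not the authors' method.

```python
from math import dist, sqrt


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid (input units)."""
    n = len(coords)
    centroid = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    return sqrt(sum(dist(c, centroid) ** 2 for c in coords) / n)


def fraction_within_cutoff(coords, cutoff=15.0):
    """Fraction of all residue pairs that lie within `cutoff` of each other."""
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two residues")
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if dist(coords[i], coords[j]) <= cutoff
    )
    return close_pairs / (n * (n - 1) / 2)


def toy_grid(side, spacing=5.0):
    """Hypothetical stand-in for residue coordinates: a cubic grid of points."""
    return [(x * spacing, y * spacing, z * spacing)
            for x in range(side) for y in range(side) for z in range(side)]


for name, coords in (("small toy protein", toy_grid(3)), ("large toy protein", toy_grid(6))):
    rg = radius_of_gyration(coords)
    print(name, "| Rg =", round(rg, 1), "A",
          "| cutoff / Rg =", round(15.0 / rg, 2),
          "| pairs within 15 A =", round(fraction_within_cutoff(coords), 2))
```

      In the small toy structure most residue pairs fall inside 15 Å, so "co-localised" carries little information, while in the larger one the same cut-off selects a genuine neighbourhood. Reporting the cut-off as a multiple of the radius of gyration is one possible way to make results comparable across proteins.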

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript by Xie and colleagues presents an intriguing behavioral finding for the field of perceptual learning (PL): combining the reactivation-based training paradigm with anodal tDCS induces complete generalization of the learning effect. Notably, this generalization is achieved without compromising the magnitude of learning effects and with an 80% reduction in total training time. The experimental design is well-structured, and the observed complete generalization is robustly replicated across two stimulus dimensions (orientation and motion direction).

      However, while the empirical results are methodologically valid and scientifically surprising, the theoretical framework proposed to explain them appears underdeveloped and, in some cases, difficult to reconcile with the existing literature. Several arguments are insufficiently justified. In addition, the introduction of a non-standard metric (NGI: normalized learning gain index) raises concerns about the interpretability and comparability with existing PL literature.

      Strengths:

      (1) Rigorous experimental design

      In this study, Xie and colleagues employed a 2×2 factorial design (Training paradigm: Reactivation vs. Full-Practice × tDCS protocols: Anodal vs. Sham), which allowed clear dissociation of the main and interaction effects.

      (2) High statistical credibility

      Sample sizes were predetermined using G*Power, non-significant effects were evaluated using the Bayes factor, and the core behavioral findings were replicated in a second stimulus dimension. These strengthen the credibility of the findings.

      (3) Strong translational potential

      The observed complete generalization could have useful implications for sensory rehabilitation. The large reduction (80%) in total training time is particularly compelling.

      Weaknesses:

      (1) NGI (Normalized learning gain index) is a non-standard behavioral metric and may distort interpretability.

      NGI, defined as (pre - post) / ((pre + post) / 2), is rarely used in PL studies to measure learning effects. Almost all PL studies rely on raw thresholds and percent improvement, (pre - post) / pre, making it difficult to contextualize the current NGI-based results within the broader field. The current manuscript provides no justification for adopting NGI.

      A more critical issue is the NGI's nonlinearity: by normalizing to the mean of pre- and post-test thresholds, it disproportionately inflates learning effects for participants with lower post-test thresholds. Notably, the "complete generalization" claims are illustrated mainly with NGI plots. Although the authors also analyze thresholds directly and the results also support the core claim, the interpretation in the text relies heavily on NGI.

      The authors may consider rerunning key analyses using the standard percent improvement metric. If retaining NGI, the authors should provide explicit justification for why NGI is superior to standard measures.
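
      The nonlinearity described above can be made explicit. If p denotes the standard percent improvement, (pre - post) / pre, then post = (1 - p) * pre and NGI = 2p / (2 - p), so the ratio NGI / p = 2 / (2 - p) grows as learning increases. The sketch below checks this numerically; the threshold values are hypothetical and the function names are assumptions.

```python
def percent_improvement(pre, post):
    """Standard PL metric: (pre - post) / pre."""
    return (pre - post) / pre


def ngi(pre, post):
    """Normalized learning gain index: (pre - post) / mean(pre, post)."""
    return (pre - post) / ((pre + post) / 2)


# Hypothetical thresholds: same pre-test value, progressively lower post-test values.
pre = 10.0
for post in (8.0, 5.0, 2.0):
    p = percent_improvement(pre, post)
    g = ngi(pre, post)
    print("post =", post,
          "| percent improvement =", round(p, 3),
          "| NGI =", round(g, 3),
          "| NGI / percent improvement =", round(g / p, 3))
```

      The two metrics agree on the sign and rank order of learning within a fixed pre-test level, but NGI stretches larger improvements more than smaller ones, which matters when group means of NGI are compared.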

      (2) The proposed theoretical framework is sometimes unclear and insufficiently supported.

      The authors propose the following mechanistic chain:

      (a) reactivation-based learning depends on offline consolidation mediated by GABA (page 4, line 73);

      (b) online a-tDCS reduces GABA (page 4, line 76), thereby disrupting offline consolidation (page 11, line 225);

      (c) disrupted offline consolidation reduces perceptual overfitting (page 4, line 77; page 11, line 225), thereby enabling generalization;

      (d) under full-practice training, a-tDCS increases specificity via a different mechanism (page 11, line 235).

      While this framework is plausible in broad terms, several components are speculative at best in the absence of neurochemical or neural measurements.

      (3) Several reasoning steps require further clarification.

      (a) Mechanisms of Reactivation-based Learning.

      The manuscript focuses on the neurochemical basis of reactivation-based learning. However, reactivation-induced neurochemical changes differ across brain regions. In the motor cortex, Eisenstein et al. (2023) reported that after reactivation, increased GABA and decreased E/I ratio were associated with offline gains. In contrast, Bang et al. (2018) demonstrated that, in the visual cortex, reactivation decreased GABA and increased E/I ratio. While both studies are consistent with GABA involvement, the direction of GABA modulation differs. The authors should clarify this discrepancy.

      More importantly, Bang et al. (2018) demonstrated that reactivation-based (3 blocks) and full-practice (16 blocks) training produced similar time courses of E/I ratio changes in V1: an initial increase followed by a decrease. Given this similarity, the manuscript would benefit from a more thorough discussion of how the two paradigms diverge mechanistically. For example, behaviorally, Song et al. (2021) reported greater generalization with reactivation-based training than with full-practice training, aligning with Kondat et al. (2025). Neurally, Kondat et al. (2024) showed that reactivation-based training increased activity in higher-order brain regions (e.g., IPS), whereas full practice training reduced connectivity between temporal and parietal regions.

      (b) tDCS Mechanisms and Protocols.

      The effect of a-tDCS on GABA is not consistent across brain regions. While a-tDCS reliably reduces GABA in the motor cortex, a recent and more closely related study (Abuleli et al., 2025) reports no significant modulation of GABA or Glx in V1, challenging the authors' assumption of tDCS-induced GABA reduction in the visual cortex.

      The manuscript's proposal that online a-tDCS disrupts offline consolidation is somewhat difficult to interpret conceptually. Online tDCS typically modulates processes occurring during stimulation (e.g., encoding, attentional state), whereas consolidation occurs afterward. Thus, the claim that online tDCS protocols disrupt only offline consolidation, without considering the possibility that they first modulate encoding, is hard to justify. Even if tDCS has prolonged effects, the link between online stimulation and disruption of offline consolidation remains to be established.

      (c) Missing links between GABA modulation and perceptual overfitting.

      The proposed chain ("tDCS disrupts consolidation → reduced overfitting → improved generalization") skips a critical step: how GABA modulation translates to changes in neural representational properties (e.g., tuning width, representational overlap between trained/untrained stimuli) that define "perceptual overfitting." The PL literature has not established a link between GABA levels and these representational changes, leaving a key component of the mechanistic explanation underspecified.

      (d) Insufficient explanation of the opposite effects.

      The manuscript does not fully explain why the same a-tDCS promotes generalization in reactivation-based training but increases specificity in full-practice training. Both paradigms engage offline consolidation, and, as mentioned above, the time courses of E/I ratio changes are similar for 3-block reactivation-based and 16-block full-practice training. Thus, if offline consolidation mechanisms (and their associated E/I changes) are comparable across paradigms, it is unclear why identical a-tDCS would produce opposite outcomes in the two paradigms.

    1. Reviewer #2 (Public review):

      The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSantos Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (in mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).

      Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing. However, there are a few places where the data presentation could be improved for clarity, and a few concerns about some choices in analytical approach for a couple of the experiments, where improved statistical approaches could improve their sensitivity and/or better rule out false positives, and thus the support of some of these claims is currently incomplete. There is also some lack of clarity about the rationale for some decisions regarding the fish genetics. Nonetheless, this is an important and useful first characterization of many phenotypes of these lines. Such experiments form a baseline for future mechanistic studies in the same lines and a platform to test approaches to reverse phenotypes.

      Individual claims and their strengths and weaknesses:

      (1) The authors developed mouse and zebrafish models of WAC deletion

      They used the existing KOMP floxed WAC line to generate a null allele. For the mouse, there is a Western showing that it is indeed null for the protein. The fish data is less robustly validated: they don't confirm that the allele is null at the protein or RNA level, and fish have two paralogs (waca and wacb), of which this paper characterizes only one. So this evidence is less clear. The evaluated mice are heterozygous (Het), similar to patients, while the fish appear to be evaluated as homozygous mutants.

      (2) The authors show that both species show altered craniofacial features

      These data appear well powered, and the findings are robust.

      (3) Each model altered GABAergic neurons

      In mice, the authors stained with PV antibodies and saw a decrease in cells positive for this staining. A second marker, Lhx6, does not show a difference, suggesting this might be a change in PV expression rather than cell number. They could maybe look into the literature to see if this loss of just the protein also occurs in other models. Overall, the sample size here is a bit smaller than other parts of the paper (n=3), and the methods on the cell counts were less clear, so it is not as clear that this finding is as robust. The authors counted several other broad classes of cells, and those appear normal. Interestingly, there might also be some TBR1 mislocalization in layer 6 that might be significant with added power.

      The fish data is based on an in situ hybridization for GAD. The measure shown is the width of the positive area in the forebrain. This measure is not one I have seen much before, and has potential to be driven by something unrelated to GABA (e.g., if the whole forebrain were simply a bit smaller). So this analysis could use a couple of other approaches (density of signal?) and/or a control probe for some other brain gene showing the measure is normal, and thus it is not just a size issue.

      (4) Mice were more susceptible to the seizure-inducing agent PTZ

      These data appear well powered, and the findings are robust. The authors also did a fair amount of useful electrophysiology that was all normal, but appeared to be well executed.

      (5) Mice had changes in brain volume that interact with sex

      The authors conducted an MRI on a good number of mice and reported a slight increase in global volume just in males. Sample size is fair, but the statistical approach here may be better if it puts males and females in the same model (to boost power and explicitly test for sex by genotype interaction that they report), and there is some chance that the brain region level differences that they report could include some false positives. They tested many regions, and it is not clear whether or not they corrected for the number of tests. Often, an FDR correction would be used in such imaging studies. It may be that only the most robust regional findings will survive those corrections. It is interesting data either way, but the analysis could be improved.
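      To illustrate the kind of correction meant here, below is a minimal sketch of the Benjamini-Hochberg FDR procedure. The function name and the eight p-values are hypothetical and chosen only for illustration; they are not taken from the manuscript. In this made-up example, five regions fall below 0.05 uncorrected, but only two survive the correction.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests survive BH FDR control at level alpha."""
    m = len(p_values)
    # Indices of the tests sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical per-region p-values (illustrative only, not from the manuscript).
region_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
survivors = benjamini_hochberg(region_p, alpha=0.05)
print(survivors)
```

      The same logic applies to any family of regional tests; the key design choice is which tests are counted as one family.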

      (6) Several behaviors are altered in the mice as well

      These studies were fairly well-powered (n=15,16), and they found several positive and negative results, including alterations in memory and sociability in both species. There is a minor statistical flaw in the three-chamber analysis (they don't actually compare the Hets directly to the wildtypes in their statistical testing, a common mistake in neuroscience that should be addressed). But the data look like they will probably still be significant when correctly analyzed. In the supplement, the authors could do a bit more with the data they have to look at hyperactivity (i.e., show total motion in open field, not just time in center vs. periphery), and adding sex to their model might improve sensitivity for genotype effects.

      (7) Some biochemical signaling pathways are altered in the brain

      These are n=4 immunoblots, and show altered phospho ERK, but no changes in other signaling events predicted from prior WAC literature like H2B ubiquitination. They appear well done, and the authors share the full blots in the supplement.

      (8) WAC deletion also alters gene expression in the brain

      These studies were well-powered for RNAseq, with 10 and 14 samples, using neonates (P2), just the forebrain. The sequencing quality metrics all looked good, and the approach to analysis was okay. It would be stronger to again include sex in the model, rather than separate by sex. There were some typos in this part of the paper that made part of the conclusions unclear, but the RNAseq nicely confirmed the mutation of the mice, and discovered many differentially expressed genes, consistent with the role of this gene as a regulator of transcription. The presentation could be expanded to make more use of the data. Overall, though, this is a useful first characterization of the transcriptome in the line.

    1. Reviewer #2 (Public review):

      Summary:

      Previous studies by some of the authors of the current manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) during which multiple repetitions of a bisyllabic word, uttered by the same speaker, were presented. Then they exposed neonates to an "interference period" in which newborns listened to music or to the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period in which they heard the familiarized word again. Interestingly, when the interference was music, recognition of the word remained. Word recognition was measured using the NIRS technique, which estimates regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, whereas an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In previous studies, music did not erase the memory traces for a word (familiarity effect), while a different word uttered by the same speaker did.

      The current study aims at exploring whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, while young children can distinguish speakers by processing the speech. The authors' main hypothesis anticipates that new speaker recognition would produce less interference with the familiarized word because somehow neonates "separate" the processing of both words (the familiarized word uttered by one speaker, and the interfering word uttered by a different speaker), memorizing both words as different auditory events.

      From my point of view, this hypothesis is interesting, since the results would contribute to estimating the role of the speaker in word learning and speech processing early in life.

      Strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response with an unbalanced design. Beyond the loss of data, which is frequent in infant studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states over the course of the experiment) or be expressed in different regions (e.g. related to individual variations in optode locations and brain anatomy).

      Weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation on the following points.

      (1) It would be good to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I would suggest attempting to integrate the changes in brain activity across the 3 phases, that is, to examine whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response distinguishes between same and novel words over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response for same words was higher than that for novel words in the 5th block.

      (4) Similarly, it is quite striking that brain activity did not increase relative to familiarization during the interference phase, mainly over the left hemisphere, even though both the word and the speaker changed. Although the discussion considers these findings, the detection of novel words and the detection of a novel speaker over time would benefit from a more integrated treatment.

      Appraisal:

      The authors achieved their aims because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when interfered speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the potential mechanisms underlying these findings were discussed, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well-structured, although I would suggest integrating the changes across the three phases of the study, perhaps by comparing with other regions not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, within 9 minutes, newborns may transition between different behavioral states and experience different physiological needs.

      This study offers the opportunity to inspire looking for commonalities and individual differences when investigating early memory capacities of newborns.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      This is a good manuscript, well performed and well presented. I have several suggestions/questions to enhance the clarity of the concept, as technically the work is rather well performed.

      1. I suggest that the authors explain better the mesenchymal-to-epithelial (MET) transition in reprogramming. Perhaps, explaining that epithelial gene acquisition (e.g., CDH1) and epidermal cell fate are not exactly the same. This approach could also be used to divide the genes they study further in their analyses.
      2. KLF4 is both a repressor and an activator in different cell contexts including reprogramming. Does HIC2 act only as a repressor? Is it possible that HIC2 is repressing KLF4-activated genes that are bad for reprogramming (including epidermal genes) and activating KLF4-suppressed genes necessary for reprogramming? This should not be too difficult to explore with their current dataset, and they could also look at available datasets for histone modifications in reprogramming.
      3. Does HIC2 bind to genes related to somatic cell identity that need to be suppressed in reprogramming before the MET phase takes place?
      4. Does HIC2 influence proliferation during reprogramming?

      Referee cross-commenting

      Comments by the other reviewers are sound and will help improve the manuscript.

      Significance

      In this manuscript, Kaji and colleagues perform a CRISPR/Cas9 screen to identify genes involved in mouse somatic cell reprogramming, identifying HIC2 as a target that they further validate. They conclude that HIC2 acts by repressing the epidermal/epithelial program induced by KLF4 during reprogramming. Studying the complex role of transcription factor interactions in the context of cell fate conversions (of any kind and not just somatic cell reprogramming) is highly relevant. This work helps clarify such complexity in a specific context but the work has wider conceptual implications.

    1. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.

      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.

      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.

      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth. It may be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.

      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.

      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.

      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth.

      One thread of questions that authors may want to further explore is related to the chaotic nature of activity that generates barcodes when recurrence is strong. Chaos inherently implies sensitivity to initial conditions and noise, which raises questions about its reliability as a mechanism for producing robust and repeatable barcode signals. How sensitive are the results to noise in both the dynamics and the input signals? Does this sensitivity affect the stability of the generated barcodes and place fields, potentially disrupting their functional roles? Moreover, does the implemented plasticity mitigate some of this chaos, or might it amplify it under certain conditions? Clarifying these aspects could strengthen the argument for the robustness of the proposed mechanism.

      In our model, chaos is used to produce a random barcode when forming memories, but memory retrieval depends on attractor dynamics. Specifically, the plasticity update at the end of the cache creates an attractor state, and then afterwards for successful memory retrieval the network activity must settle into this attractor rather than remaining chaotic. This attractor state is a conjunction of memory content (place and seed activity) and memory index (barcode activity). Thus a barcode is ‘reactivated’ when network dynamics during retrieval settle into this cache attractor, or in other words chaotic dynamics do not need to generate the same barcode twice.

      The reviewer raises an important point, which is how sensitivity to initial conditions and noise would affect the reliability of our proposed mechanism. The key question here is how noise will affect the network’s dynamics during retrieval. Would adding noise to the dynamics make memory retrieval more difficult? We thank the reviewer for suggesting we investigate this further, and below describe our experiments and changes to the manuscript to better address this topic.

      We first experimented with adding independent Gaussian-distributed noise into each unit, drawn independently at each timestep. We analyzed recall accuracy using the same task and methods as Fig. 4F while varying the magnitude of noise. Memory recall was quite robust to this form of noise, even as the magnitude of noise approached half of the signal amplitude. This first experiment added noise into the temporal dynamics of the network. We subsequently examined adding static noise into the network inputs, which can also be thought of as introducing noise into initial conditions. Specifically, we added independent Gaussian-distributed noise into each unit, with the random value held constant for the extent of temporal dynamics. This perturbation decreased the likelihood of memory recall in a graded manner with noise magnitude, without dramatically changing the spatial profile. Examination of dynamics on individual trials revealed that the network failed to converge onto a cache attractor on some random fraction of trials, with other trials appearing nearly identical to noiseless results. We now include these results in the text and as a new supplementary figure, Figure S4AB.
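      The distinction between the two noise types can be sketched as follows. This is a minimal illustration on a generic tanh rate network, not the model from the manuscript: the function `run_dynamics`, the 3-unit weight matrix, the inputs, and the noise magnitudes are all hypothetical, and static noise is added to the input only.

```python
import math
import random


def run_dynamics(W, inputs, steps, temporal_sigma=0.0, static_sigma=0.0, seed=0):
    """Iterate x <- tanh(W x + inputs + noise) and return the final state."""
    rng = random.Random(seed)
    n = len(inputs)
    # Static noise: drawn once per unit and held constant for the whole run.
    static_noise = [rng.gauss(0.0, static_sigma) for _ in range(n)]
    x = [0.0] * n
    for _ in range(steps):
        # Temporal noise: a fresh independent draw per unit at every timestep.
        temporal_noise = [rng.gauss(0.0, temporal_sigma) for _ in range(n)]
        x = [
            math.tanh(
                sum(W[i][j] * x[j] for j in range(n))
                + inputs[i] + static_noise[i] + temporal_noise[i]
            )
            for i in range(n)
        ]
    return x


# Hypothetical 3-unit network and input, chosen only for illustration.
W = [[0.0, 0.5, -0.3], [0.4, 0.0, 0.2], [-0.1, 0.6, 0.0]]
inputs = [0.5, -0.2, 0.1]

noiseless = run_dynamics(W, inputs, steps=50)
with_temporal = run_dynamics(W, inputs, steps=50, temporal_sigma=0.1, seed=1)
with_static = run_dynamics(W, inputs, steps=50, static_sigma=0.1, seed=1)
```

      In this sketch the only difference between the two conditions is where the random draw happens: inside the time loop for temporal noise, and once before the loop for static noise.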

      To clarify the network dynamics and the purpose of chaos in our model, we make the following modifications in text:

      Section 2.3, paragraph 2 (starting at “To store memories…”):

      “…place inputs arrive into the RNN, recurrent dynamics generate an essentially random barcode, seed inputs are activated, and then Hebbian learning binds a particular pattern of barcode activity to place- and seed-related activity.”

      Section 2.3, paragraph 3 (starting at “Memory recall in our network…”): As an example, consider a scenario in which an animal has already formed a memory at some location $l$, resulting in the storage of an attractor $\vec{a}$ into the RNN. The attractor $\vec{a}$ can be thought of as a linear combination of place input-driven activity $p(l)$, seed input-driven activity $s$, and a recurrent-driven barcode component $b$. Later, the animal returns to the same location and attempts recall (i.e. sets $r = 1$, Figure 3B). Place inputs for location $l$ drive RNN activity towards $p(l)$, which is partially correlated with attractor $\vec{a}$, and the recurrent dynamics cause network activity to converge onto attractor $\vec{a}$. In this way, barcode activity $b$ is reactivated, along with the place and seed components stored in the attractor state, $p(l)$ and $s$. The seed input can also affect recall, as discussed in the following section.

      Section 2.4, final paragraph (starting “We further examined how model hyperparameters affected performance on these tasks”), added the following describing new results on adding noise: We found that adding noise to the network's temporal dynamics had little effect on memory recall performance (Figure S4A). However, large static noise vectors added to the network's input and initial state decreased the overall probability of memory recall, but not its spatial profile (Figure S4B).

      It may also be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

      As described above, chaotic dynamics are necessary to generate a barcode during a cache, but not to reactivate that barcode during retrieval. During a successful memory retrieval, network activity settles into an attractor state and thus does not depend on the duration of simulated dynamics. The choice of duration to run dynamics during caching is important, but only insofar as activity significantly decorrelates from the initial state. We show in Figure S1B that decorrelation saturates ~t=25, and thus any random time point t > 25 would be similarly effective. We used a fixed duration runtime for caches only to avoid introducing unnecessary complication into our model.

      Reviewer #2 (Public review):

      Summary:

      Striking experimental results by Chettih et al 2024 have identified high-dimensional, sparse patterns of activity in the chickadee hippocampus when birds store or retrieve food at a given site. These barcode-like patterns were interpreted as "indexes" allowing the birds to retrieve from memory the locations of stored food.

      The present manuscript proposes a recurrent network model that generates such barcode activity and uses it to form attractor-like memories that bind information about location and food. The manuscript then examines the computational role of barcode activity in the model by simulating two behavioral tasks, and by comparing the model with an alternate model in which barcode activity is ablated.

      Strengths of the study:

      Proposes a potential neural implementation for the indexing theory of episodic memory

      Provides a mechanistic model of striking experimental findings: barcode-like, sparse patterns of activity when birds store a grain at a specific location

      A particularly interesting aspect of the model is that it proposes a mechanism for binding discrete events to a continuous spatial map, and demonstrates the computational advantages of this mechanism.

      Weaknesses:

      The relation between the model and experimentally recorded activity needs some clarification

      The relation with indexing theory could be made more clear

      The importance of different modeling ingredients and dynamical mechanisms could be made more clear

      The paper would be strengthened by focusing on the most essential aspects

      Comments:

      The model distinguishes between "barcode activity" and "attractors". Which of the two corresponds to experimentally-recorded barcodes? I would presume the attractors. A potential issue is that the attractors are, as explained in the text (l.137), conjunctions of place activity, barcode activity and "seed" inputs. The fact that the seed activity is shared across attractors seems to imply that they have a non-zero correlation independent of distance. Is that the case in the model? If I understand correctly, Fig 3D shows correlations between an attractor and barcodes at different locations, but correlations between attractors at different locations are not shown. Fig 1 F instead shows that correlations between recorded retrieval activities decay to zero with distance.

      More generally, the fact that the expression "barcode" is apparently used with different meanings in the model and in the experiments is potentially confusing (in the model they correspond to activity generated during caching, and this activity is distinct from the memories; my understanding is that in the experiments barcodes correspond to both caching and retrieval, but perhaps I am mistaken?).

      Our intent is to use the expression “barcode” as similarly as possible between model and experimental work. The reviewer points out that the connection between barcodes in experimental and modeling work is unclear, as well as the relation of “attractors” in our model to previous experimental results. The meaning of ‘barcode’ is absolutely critical—we clarify below our intended meaning, and then describe changes to the manuscript to highlight this.

      In experiments, we observed that activity during caching looked different than ordinary hippocampal activity (i.e. typical “place activity” observed during visits). Empirically there were two major differences. First, there was a pattern of neural activity which was present during every cache. This pattern was also present when birds visually inspected sites containing a cached seed, but not when visually inspecting an empty site. This is what we refer to as “seed activity”. Second, there was a pattern of neural activity which was unique to each cache. This pattern re-occurred during retrieval, and was orthogonal to place activity (see Fig. 1E-F). This is what we refer to as “barcode activity”. In summary, activity during a cache (or retrieval) contains a combination of three components: place activity, seed activity, and barcode activity.

      These experimental findings are recapitulated in our model, as activity during a cache contains a combination of three components: place activity driven by place inputs, seed activity driven by seed inputs, and barcode activity generated by recurrent dynamics. Cache activity in the model corresponds to cache activity in experiments, and barcodes in the model correspond to barcodes in experiments. Our model additionally has “attractors”, meaning that network connectivity changes so that the activity generated during a simulated cache becomes an attractor state of network dynamics. “Attractors” refers to a feature of network dynamics, not a distinct activity state, and we do not yet know if these attractors exist in experimental data.

      Figure 3D, as described in the figure legend, is a correlation of activity during cache and retrieval (in purple), for cache-retrieval pairs at the same or at different sites. We believe this is what the reviewer asks to see: the correlation between attractor states for different cache locations. The reviewer makes an important point: seed activity is shared across all attractors, so then why are correlations not high for all locations? This is because attractors also have a place component, which is anti-correlated for distant locations. This is evident in Fig. 3D by noticing that visit-visit correlations (black line, corresponding to place activity only) are negative for distant locations, and the correlation between attractors (purple line, cache-retrieval pairs) is subtly shifted up relative to the black line (place code only) for these distant locations. The size of this shift is due to the relative magnitude of place and seed inputs. For example, if we increase the strength of the seed input during caching (blue line), we can further increase the correlation between attractors even for quite distant sites:

      Author response image 1.
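
      The dependence of this shift on seed strength can be made explicit with a short worked equation. This is only a sketch under simplifying assumptions that are ours and not taken from the manuscript: all vectors are mean-centered, the seed component is orthogonal to every place component, barcodes are orthogonal to place, seed, and each other, and squared norms are equal across sites. The scalar $\alpha$ (relative seed input strength) and the symbols $P$, $S$, $B$, $\rho_p$ are hypothetical notation introduced for illustration.

```latex
% Illustrative assumptions (not from the manuscript):
%   \|p(l)\|^2 = P for every site, \|s\|^2 = S, \|b_i\|^2 = B,
%   s \perp p(l), b_i \perp p(l), b_i \perp s, b_i \perp b_j for i \neq j.
\vec{a}_i = p(l_i) + \alpha\, s + b_i ,
\qquad
\rho_p = \frac{p(l_i) \cdot p(l_j)}{P}

\operatorname{corr}(\vec{a}_i, \vec{a}_j)
  = \frac{\vec{a}_i \cdot \vec{a}_j}{\|\vec{a}_i\|\,\|\vec{a}_j\|}
  = \frac{\rho_p P + \alpha^2 S}{P + \alpha^2 S + B},
\qquad i \neq j

% Monotonic in seed strength, since P + B > \rho_p P:
\frac{\partial}{\partial(\alpha^2 S)}
  \operatorname{corr}(\vec{a}_i, \vec{a}_j)
  = \frac{P + B - \rho_p P}{\left(P + \alpha^2 S + B\right)^2} > 0
```

      Under these assumptions, distant sites with $\rho_p < 0$ have an attractor correlation above the place-only value, and the offset grows with $\alpha$, in line with the qualitative behavior described for the purple and blue lines.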

      To clarify the manuscript, we made the following modifications:

      Section 2.2, first paragraph: We model the hippocampus as a recurrent neural network (RNN) (Alvarez and Squire, 1994; Tsodyks, 1999; Hopfield, 1982) and propose that recurrent dynamics can generate barcodes from place inputs. As in experiments, the model’s population activity during a cache should exhibit both place and barcode activity components.

      Section 2.3, paragraph 3 (starting at “Memory recall in our network…”): As an example, consider a scenario in which an animal has already formed a memory at some location $l$, resulting in the storage of an attractor $\vec{a}$ into the RNN. The attractor $\vec{a}$ can be thought of as a linear combination of place input-driven activity $p(l)$, seed input-driven activity $s$, and a recurrent-driven barcode component $b$. Later, the animal returns to the same location and attempts recall (i.e. sets $r = 1$, Figure 3B). Place inputs for $l$ drive RNN activity towards $p(l)$, which is partially correlated with attractor $\vec{a}$, and the recurrent dynamics cause network activity to converge onto attractor $\vec{a}$. In this way, barcode activity $b$ is reactivated as part of attractor $\vec{a}$, along with the place and seed components stored in the attractor state, $p(l)$ and $s$. The seed input can also affect recall, as discussed in the following section.

      The insights obtained from the network model for the computational role of barcode activity could be explained more clearly. The introduction starts by laying out the indexing theory, which proposes that the hippocampus links an index with each memory so that the memory is reactivated when the index is presented. The experimental paper suggests that the barcode activations play the role of indexes. Yet, in the model reactivations of memories are driven not by presenting barcode activity, but by presenting place activity (Cache Presence task) or seed activity (Cache Location task). So it seems that either place activity or seed activity plays the role of indexes. Section 2.5 nicely shows that ultimately the role of barcode activity is to decorrelate attractors, which seems different from playing the role of indexes. I feel it would be useful for the Discussion to reassess more critically the relationship between barcodes, indexing theory, and key-value architectures.

      The reviewer highlights a failure on our part to clearly identify the connection between our findings on barcodes, indexing theory, and key-value architectures. This is another major component of the paper, and below we propose changes to the manuscript to clarify these concepts and their relationships. First, we will summarize the key points that were unclear in our original manuscript.

      The reviewer equates the concept of an ‘index’ with that of a ‘query’: the signal that drives memory reactivation. This may be intuitive, but it is not how a memory index was defined in indexing theory (e.g. Teyler & DiScenna 1986). In indexing theory, the index is a pattern of hippocampal activity that is (a) generated during memory formation, (b) separate from the activity encoding memory content, and (c) linked to memory content via associative plasticity. After memory formation, a memory might be queried by activating a partial set of the memory contents, which would then drive reactivation of the hippocampal index, leading to pattern completion of memory contents. See, for example, figure 1 of Teyler and DiScenna 1986. The ‘index’ is thus not the same as the ‘query’ that drives recall.

      We propose in this work that barcode activity is such an index. Indexing theory originally posited that memory content was encoded by neocortex, and memory index was encoded by hippocampus. However the experiments of Chettih et al. 2024 revealed that the hippocampus contained both memory content and memory index signals, and furthermore there was no division of cells into ‘content’ and ‘index’ subtypes. Thus our model drops the assumption of earlier work that index and content signals correspond to different neurons in different brain areas—a significant advance of our work. Otherwise, the experimentally observed barcodes and the barcodes generated by our computational model play the role of indices as originally defined.

      Our original manuscript was unclear on the relationship of indexing theory and key-value systems. Our work connects diverse areas of memory models, including attractor dynamics, key-value memory systems, and memory indexing. A full account of these literatures and their relationships may be beyond the scope of this manuscript, and we note that a recent review article (Gershman, Fiete, and Irie, 2025) further clarifies the relationship between key-value memory, indexing theory, and the hippocampus. We will cite this work in our discussion as a source for the interested reader.

      Briefly, a key-value memory system distinguishes between the address where a memory is stored, the ‘key’, and the content of that memory, the ‘value’. An advantage of such systems is that keys can be optimized for purposes independent of the value of each memory. The use of barcodes in our model to decorrelate memories is related to this optimization of keys in key-value memory systems. By generating barcodes and adding this to the attractor state corresponding to a cache memory, the ‘address’ of the memory in population activity is differentiated from other memories. Our work is thus consistent with the idea that hippocampus generates keys and implements a key storage system. However it is not so straightforward to equate barcodes with keys, as they are defined in key-value memory. As the reviewer points out, memory recall can be driven by location and seed inputs, i.e. it is content-addressable. We think of the barcode as modifying the memory address to better separate similar memories, without changing memory content, and the resulting memory can be recalled by querying with either content or barcode. Given the complex and speculative nature of these relationships, we prefer to note the salient connection of our work with ongoing efforts applying the key-value framework to biological memory, and leave the precise details of this connection to future work.
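
      The two properties described above (barcodes separate similar memories, yet recall remains content-addressable) can be illustrated with a few lines of arithmetic on random vectors. This is a minimal sketch, not the model from the manuscript: the Gaussian vectors, the unit count, and all variable names are hypothetical choices made for illustration, and "memory" here simply means content plus barcode summed in the same units.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 2000  # hypothetical population size

# Two memories with overlapping content (assumption: shared part plus small noise).
shared = rng.standard_normal(n_units)
content_a = shared + 0.5 * rng.standard_normal(n_units)
content_b = shared + 0.5 * rng.standard_normal(n_units)

# Content-independent barcodes, drawn independently for each memory.
barcode_a = rng.standard_normal(n_units)
barcode_b = rng.standard_normal(n_units)

# Stored pattern = content + barcode in the same units (no separate index cells).
memory_a = content_a + barcode_a
memory_b = content_b + barcode_b

def corr(x, y):
    """Pearson correlation between two activity vectors."""
    return float(np.corrcoef(x, y)[0, 1])

content_corr = corr(content_a, content_b)  # overlap of content alone
memory_corr = corr(memory_a, memory_b)     # overlap after adding barcodes

# Content-addressable query: content alone still matches its own memory best.
match_own = corr(content_a, memory_a)
match_other = corr(content_a, memory_b)

print(f"content vs content: {content_corr:.2f}")
print(f"memory vs memory:   {memory_corr:.2f}")
print(f"query own / other:  {match_own:.2f} / {match_other:.2f}")
```

      In this toy setting the barcode lowers the correlation between the two stored patterns without altering the content component, while a content-only query remains more similar to its own stored pattern than to the other one.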

      We make the following changes in the manuscript to clarify these ideas:

      Introduction, first paragraph: In this scheme, during memory formation the hippocampus generates an index of population activity, and the neurons representing this index are linked with the neurons representing memory content by associative plasticity. Later, re-experience of partial memory contents may reactivate the index, and reactivation of the index drives complete recall of the memory contents.

      Discussion, 4th paragraph on key-value: Interestingly, prior theoretical work has suggested neural implementations for both key-value memory and attention mechanisms, arguing for their usefulness in neural systems such as long term memory (Kanerva, 1988; Tyulmankov et al., 2021; Bricken and Pehlevan, 2021; Whittington et al., 2021; Kozachkov et al., 2023; Krotov and Hopfield, 2020; Gershman et al., 2025). In this framework, the address where a memory is stored (the key) may be optimized independently of the value or content of the memory. In our model, barcodes improve memory performance by providing a content-independent scaffold that binds to memory content, preventing memories with overlapping content from blurring together. Thus barcodes can be considered as a change in memory address, and our model suggests important connections between recurrent neural activity and key generation mechanisms. However, we note that barcodes should not be literally equated with keys in key-value systems, as our model’s memory is ‘content-addressable’: it can be queried by place and seed inputs.

      The model includes a number of non-standard ingredients. It would be useful to explain which of these ingredients and which of the described mechanisms are essential for the studied phenomenon. In particular:

      - the dynamics in Eq.2 include a shunting inhibition term. Is it essential and why?

      The shunting inhibition is important as it acts to normalize the network activity to prevent runaway excitation. We hope to clarify this further by amending the following sentence in section 2.2: “$g(\cdot)$ is a leak rate that depends on the average activity of the full network, representing a form of global shunting inhibition that normalizes network activity to prevent runaway excitation from recurrent dynamics.”

      - same question for the global inhibition included in the random connectivity;

      The distribution from which connectivity strengths are drawn has a negative mean (global inhibition). This causes activity during caching (i.e. $r = 1$) to be sparser than activity during visits (i.e. $r = 0$), and was chosen to match experimental findings. In figures 2B and S2B we show that our model can transition between a mode with place code only, a mode with barcode only, or a mode containing both, by changing the variance of the weight distribution while holding the mean constant. We suggest clarifying this by editing the following in section 2.2, paragraph 2: “We initialize the recurrent weights from a random Gaussian distribution, where $N_X$ is the number of RNN neurons and $\mu < 0$, reflecting global subtractive inhibition that encourages sparse network activity to match experimental findings (Chettih et al. 2024).”

      - the model is fully rate-based, but for certain figures, spikes are randomly generated. This seems superfluous.

      Spikes are simulated for one analysis and one visualization, where it is important to consider noise or variability in neural responses across trials. First, for Fig. 2H,J, we generated spikes to allow a visual comparison to figures that can be easily generated from experimental data. Second, and more significantly, for the analysis underlying Fig. 3D, it is essential to simulate variability in neural responses. Because our rate-based models are noiseless, the RNN’s rate vector at site distance = 0 will always be the same and result in a correlation of 1 for both visit-visit and cache-retrieval. However, we show that, if one interprets the rate as a noisy Poisson spiking process, the correlation at site distance = 0 between a cache-retrieval pair is higher than that of two visits. This is because under a Poisson spiking model, the signal-to-noise ratio is higher for cache-retrieval activity, where rates are higher in magnitude. The greater correlation for a cache-retrieval pair at the same site, relative to visits at the same site, is an experimental finding that was critical for our model to reproduce. We detail clarifications to the manuscript below in response to the reviewer’s following and related question.
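
      The signal-to-noise argument above can be checked numerically with a short simulation. The sketch below is not the analysis code behind Fig. 3D: the gamma-distributed rate profile, the unit count, and the gain values (1 for "visit", 5 for "cache") are hypothetical stand-ins chosen only to show the effect of rate magnitude on same-site correlations under Poisson sampling.

```python
import numpy as np

def same_site_correlation(rates, rng, n_pairs=200):
    """Mean Pearson correlation between pairs of independent Poisson
    spike-count vectors drawn from the same underlying rate vector."""
    corrs = []
    for _ in range(n_pairs):
        trial_1 = rng.poisson(rates)
        trial_2 = rng.poisson(rates)
        corrs.append(np.corrcoef(trial_1, trial_2)[0, 1])
    return float(np.mean(corrs))

rng = np.random.default_rng(0)
n_units = 500  # hypothetical number of model units

# Hypothetical rate profile across units; only its scale differs between
# the two conditions, so any difference in correlation comes from noise.
profile = rng.gamma(shape=2.0, scale=1.0, size=n_units)
visit_rates = 1.0 * profile   # assumption: lower-magnitude activity
cache_rates = 5.0 * profile   # assumption: higher-magnitude activity

visit_corr = same_site_correlation(visit_rates, rng)
cache_corr = same_site_correlation(cache_rates, rng)

print(f"visit-visit (same site):     {visit_corr:.2f}")
print(f"cache-retrieval (same site): {cache_corr:.2f}")
```

      For independent Poisson counts with rates $\lambda$, the population correlation between two trials is approximately $\mathrm{Var}(\lambda) / (\mathrm{Var}(\lambda) + \mathrm{E}[\lambda])$, so scaling the rates up increases the correlation even though the noiseless rate vectors are identical within each condition.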

      How are the correlations determined in the model (e.g., Fig 2 B)? The methods explain that they are computed from Poisson-generated spikes, but over which time period? Presumably during steady-state responses, but are these responses time-averaged?

      The reviewer points out a lack of clarity in our original manuscript. Correlations for events (caches, retrievals and visits) at different sites are calculated in two sections of the paper (2B, 3D), for different purposes and with slight differences in methods:

      - For Figure 2B, no spikes are simulated. Note that the methods mentioning Poisson spike generation specify only Fig. 2H,J and Fig. 3D. We simply take the network’s rate vector at timestep t=100 (when the decorrelating effect of chaotic dynamics has saturated, S1A-B) and correlate this vector when generated at different locations. We now clarify this in the legend for Figure 2B: “We show correlation of place inputs (gray) and correlation of the RNN's rate vector at t = 100 (black).”

      - For Figure 3D, we want to compare the model to empirical results from Chettih et al. 2024 (reproduced in this paper in Fig. 1E-F). These empirical results are derived from correlating vectors of spiking activity on pairs of single trials, and are thus affected by noise or variability in neural responses as described in our response to the reviewer’s previous question. We thus took the RNN’s rate vector at t=100 and simulated spiking data by drawing samples from a Poisson distribution to get spike counts. Our original manuscript was unclear about this, and we suggest the following changes:

      - Legend for Figure 3D: D. Correlation of Poisson-generated spikes simulated from RNN rate vectors at two sites, plotted as a function of the distance between the two sites.

      - Section 2.3, last paragraph: Population activity during retrieval closely matches activity during caching, and is substantially decorrelated from activity during visits (Figure 3C). To compare our model with the empirical results reproduced in Figure 1E,F, we ran in silico experiments with caches and retrievals at varying sites in the circular arena. We simulated Poisson-generated spikes drawn from our network's underlying rates to match the intrinsic variability in empirical data (see Methods).

      - Methods, subsection Spatial correlation of RNN activity for cache-retrieval pairs at different sites: To calculate correlation values as in Figure 3D, we simulated experiments where 5 sites were randomly chosen for caching and retrieval. To compare model results to the empirical data in Fig. 1E,F, which includes intrinsic neural variability, we sampled Poisson-generated spike counts from the rates output by our model. Specifically, for RNN activity $\vec{r}_i$ at location $i$, using the rates at t=100 as elsewhere, we first generate a sample vector of spikes…

      I was confused by early and late responses in Fig 2 C. The text says that the activity is initialized at zero, so the response at t=0 should be flat (and zero). More generally, I am not sure I understand why the dynamics matter for the phenomenon at all, presumably the decorrelation shown in Fig 2B depends only on steady state activity (cf previous question).

      Thanks for catching this mistake. The legend has been updated to indicate that the ‘early’ response is actually at t=1, when network activity reflects place inputs without the effects of dynamics. The reviewer is correct that we are primarily interested in the ‘late’ response of the network. All other results in the paper use this late response at t=100. As shown in Fig. S2A,B, this timepoint is not truly a steady state, as activity in the network continues to change, but the decorrelation of network activity with place-driven activity has saturated.

      We include the early response in Fig. 2C for visual comparison of the purely place-driven early activity with the eventual network response. It is also relevant since, as the reviewer points out above, there is a shunting inhibition term in the dynamics that is present during both low and high recurrent strength simulations.

      Related to the previous point, the discussion of decorrelation (l.79 - 97) is somewhat confusing. That paragraph focuses on chaotic activity, but chaos decorrelates responses across different time points. Here the main phenomenon is the decorrelation of responses across different spatial inputs (Fig 2B). This decorrelation is presumably due to the fact that different inputs lead to different non-trivial steady-state responses, but this requires some clarification. If that is correct, the temporal chaos adds fluctuations around these non-trivial steady-state responses, but that alone would not lead to the decorrelation shown in Fig 2B.

      We agree with the reviewer that chaotic activity produces a decorrelation across time points. Because of chaotic dynamics, network activity does not settle into a trivial steady-state, and instead evolves from the initial state in an unpredictable way. The network does not settle into a steady-state pattern, but both the decorrelation of network state with initial state and the rate of change in the network state saturate after ~t=25 timesteps, as shown in Fig. S2A-B.

      The initial activity for nearby states is similar, due to them receiving similar place inputs.

      Because network activity is chaotically decorrelated from this initial state by temporal dynamics, ‘late stage’ network activity between nearby spatial states is less correlated than ‘early stage’ activity. Thus the temporal decorrelation produces a spatial decorrelation. We believe that the changes we have introduced to the manuscript in revision will make this point clearer in our resubmission.
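
      The link between temporal and spatial decorrelation can be illustrated with a generic random recurrent network. The sketch below is not the model from the manuscript: it uses a plain discrete-time tanh network with zero-mean Gaussian weights, no shunting inhibition, and no plasticity, and the gain, network size, and input correlation (0.95, standing in for two nearby sites) are hypothetical values chosen so that the dynamics are in a strongly recurrent regime.

```python
import numpy as np

def run_rnn(weights, inp, n_steps):
    """Iterate x <- tanh(W x + input) from a zero initial state."""
    x = np.zeros(weights.shape[0])
    for _ in range(n_steps):
        x = np.tanh(weights @ x + inp)
    return x

rng = np.random.default_rng(0)
n_units = 400
gain = 3.0  # assumption: strong random recurrence (chaotic regime)
weights = gain * rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)

# Two similar input vectors, standing in for place inputs at nearby sites.
shared = rng.standard_normal(n_units)
mix = 0.95
input_a = 0.5 * shared
input_b = 0.5 * (mix * shared + np.sqrt(1 - mix**2) * rng.standard_normal(n_units))

def corr(x, y):
    """Pearson correlation between two activity vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# "Early" response: one step, so activity reflects inputs only.
early_corr = corr(run_rnn(weights, input_a, 1), run_rnn(weights, input_b, 1))
# "Late" response: after recurrent dynamics have acted for 100 steps.
late_corr = corr(run_rnn(weights, input_a, 100), run_rnn(weights, input_b, 100))

print(f"early (t=1) correlation between nearby sites:  {early_corr:.2f}")
print(f"late (t=100) correlation between nearby sites: {late_corr:.2f}")
```

      In this toy network the small initial difference between the two input-driven states is amplified by the recurrent dynamics, so the late responses to nearby inputs are less correlated than the early ones, which is the sense in which temporal decorrelation yields spatial decorrelation.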

      A key ingredient of the model is that the recurrent interactions are switched on and off between "caching" and "visits". The discussion argues that a possible mechanism for this is recurrent inhibition (l.320), which would need to be added. However two forms of inhibition are already included in the model. The text also says that it is unclear how units in the model should be mapped onto E and I neurons. However the model makes explicit assumptions about this, in particular by generating spikes from individual neurons. Altogether, I did not find that part of the Discussion convincing.

      We agree with the reviewer that this section is a limitation of our current work, and in fact it is an ongoing area of future research. However we think the advances in this current work warrant publication despite this topic requiring further research. We attempted to discuss this limitation explicitly, and note that the other reviewer pointed this section out as particularly helpful. We do not think it is problematic for a realistic model of the brain to ultimately include 3, or even more forms of inhibition. We do not think that Poisson-generated spikes commit us to interpreting network units as single neurons. Spikes are not a core part of our model’s mechanism, and were used only as a means of introducing variability on top of deterministic rates for specific analyses. Furthermore one could still view network units as pools of both E and I spiking neurons. We would welcome further recommendations the reviewer believes are important to note in this section on our model’s limitations.

      On lines 117-120 the text briefly mentions an alternate feed-forward model and promptly discards it. The discussion instead says that a "separate possibility is that barcodes are generated in a circuit upstream of where memories are stored, and supplied as inputs to the hippocampal population", and that this possibility would lead to identical conclusions. The two statements seem a bit contradictory. It seems that the alternative possibility would replace the need for switching on and off recurrent interactions, with a mechanism where barcode inputs are switched on and off. This alternate scenario is perhaps more plausible, so it would be useful to discuss it more explicitly.

      We apologize for the confusion here, which seems to be due to our phrasing in the discussion section. We do reject the idea that a simple feed-forward model could generate the spatial correlation profile observed in data, as mentioned in the text and included as Fig. S2. Our statement in the discussion may have seemed contradictory because here we intended to discuss the possibility that an upstream area generates barcodes, for example by the chaotic recurrent dynamics proposed in our work, while a downstream network receives these barcodes as inputs and undergoes plasticity to store memories as attractors. We did not intend to suggest any connection to the feedforward model of barcode generation, and apologize for the confusion. Our claim that this ‘2 network’ solution would lead to similar conclusions is because the upstream network would need an efficient means of barcode generation, and the downstream network would need an efficient means of storing memory attractors, and separating these functions into different networks is not likely to affect for example the advantage of partially decorrelating memory attractors. Moreover, the downstream network would still require some form of recurrent gating, so that during visits it exhibits place activity without activating stored memory attractors!

      We thus chose a 1 network instead of a 2 network solution because it was simpler and, we believe, more interesting. It is challenging in the absence of more data to say which is more plausible, thus we wanted to mention the possibility of a 2 network solution. We suggest the following changes to the manuscript:

      - Discussion, 3rd paragraph: “Alternatively, other mechanisms may be involved in generating barcodes. We demonstrated that conventional feed-forward sparsification (Babadi and Sompolinsky, 2014; Xie et al., 2023) was highly inefficient, but more specialized computations may improve this (Földiak, 1990; Olshausen and Field, 1996; Sacouto and Wichert, 2023; Muscinelli et al., 2023). Another possibility is that barcodes are generated in a separate recurrent network upstream of the recurrent network where memories are stored. In this 2-network scenario, the downstream network receives both spatial tuning and barcodes as inputs. This would not obviate the need for modulating recurrent strength in the downstream network to switch between input-driven modes and attractor dynamics. We suspect separating barcode generation and memory storage in separate networks would not fundamentally affect our conclusions.”

      As a minor note, the beginning of the discussion states that the presented model is similar to previous recurrent network models of the hippocampus. It would be worth noting that several of the cited works assign a very different role to recurrent interactions: they generate place cell activity, while the present model assumes it is inherited from upstream inputs.

      We are not sure how best to modify the paper to address this suggestion. As far as we know, all of the cited models which deal with spatial encoding do assume that the hippocampus receives a spatially-modulated or spatially-tuned input. For example, the Tsodyks 1999 paper cited in this paragraph uses exponentially decaying place inputs to each neuron, highly similar to those in our model. Furthermore we explore how our model would perform if we change the format of spatial inputs in Fig. S4, and find key results are unchanged. It is unclear how hippocampal place fields could emerge without inputs that differentiate between spatial locations. We think it is appropriate to highlight the similarity of our model to well-known Hopfield-type recurrent models, where memories are stored as attractor states of the network dynamics.

      On the other hand, we agree that a common line of hippocampal modeling proposes that recurrent interactions reshape spatial inputs to produce place fields. This often arises in the context of hippocampus generating a predictive map, where inputs may be one-hot for a single spatial state, in a grid cell-like format, or a random projection of sensory features. We attempted to address this in section 2.6, using a model which superimposes the random connectivity needed for barcode generation with the structured connectivity needed for predictive map formation. We found that such a model was able to perform both predictive and barcode functions, suggesting a path forward to connecting different lines of hippocampal modeling in future work.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Xiong and colleagues investigate the mechanisms operating downstream to TRIM32 and controlling myogenic progression from proliferation to differentiation. Overall, the bulk of the data presented is robust. Although further investigation of specific aspects would make the conclusions more definitive (see below), it is an interesting contribution to the field of scientists studying the molecular basis of muscle diseases.

      We thank the Reviewer for appreciating our work and for their valuable suggestions to improve our manuscript. We have carefully addressed some of the concerns raised, as detailed here, while others, which require more experimental efforts, will be addressed as detailed in the Revision Plan.

      In my opinion, a few aspects would improve the manuscript. Firstly, the conclusion that Trim32 regulates c-Myc mRNA stability could be expanded and corroborated by further mechanistic studies:

      1. Studies investigating whether Trim32 binds directly to c-Myc RNA. Moreover, although possibly beyond the scope of this study, an unbiased screening of RNA species binding to Trim32 would be informative.

      Authors’ response. This point will be addressed as detailed in the Revision Plan

      If possible, studies in which the overexpression of different mutants presenting specific altered functional domains (NHL domain known to bind RNAs and Ring domain reportedly involved in protein ubiquitination) would be used to test if they are capable or incapable of rescuing the reported alteration of Trim32 KO cell lines in c-Myc expression and muscle maturation.

      Authors’ response. This point will be addressed as detailed in the Revision Plan

      An optional aspect that might be interesting to explore is whether the alterations in c-Myc expression observed in C2C12 might be replicated with primary myoblasts or satellite cells devoid of Trim32.

      Authors’ response. This point will be addressed as detailed in the Revision Plan

      I also have a few minor points to highlight:

      - It is unclear if the differences highlighted in graphs 5G, EV5D, and EV5E are statistically significant.

      Authors’ response. We thank the Reviewer for raising this point. We have now indicated the statistical analyses performed on the data presented in the mentioned figures (also in response to a point raised by Reviewer #3). Consistent with the conclusion that Trim32 is necessary for proper regulation of c-Myc transcript stability, two-way ANOVA of the data now reported in Figure 5G shows a statistically significant effect of genotype at 6h (right-hand graph) but not at D0 (left-hand graph). In the graphs of Fig. EV5D and E, no significant changes are observed at D0, whereas at 6h the data show a significant difference at the 40 min time point. We included this information in the graphs and in the corresponding legends.

      - On page 10, it is stated that c-Myc down-regulation cannot rescue KO myotube morphology fully nor increase the differentiation index significantly, but the corresponding data is not shown. Could the authors include those quantifications in the manuscript?

      Authors’ response. As suggested, we included the graph showing the differentiation index upon c-Myc silencing in the Trim32 KO clones and in the WT clones, as a novel panel in Figure 6 (Fig. 6D). As already reported in the text, a partial recovery of differentiation index is observed but the increase is not statistically significant. In contrast, no changes are observed applying the same silencing in the WT cells. Legend and text were modified accordingly.

      Reviewer #1 (Significance (Required)):

      The manuscript offers several strengths. It provides novel mechanistic insight by identifying a previously unrecognized role for Trim32 in regulating c-Myc mRNA stability during the onset of myogenic differentiation. The study is supported by a robust methodology that integrates CRISPR/Cas9 gene editing, transcriptomic profiling, flow cytometry, biochemical assays, and rescue experiments using siRNA knockdown. Furthermore, the work has disease relevance, as it uncovers a mechanistic link between Trim32 deficiency and impaired myogenesis, with implications for the pathogenesis of LGMDR8.

      At the same time, the study has some limitations. The findings rely exclusively on the C2C12 myoblast cell line, which may not fully represent primary satellite cell or in vivo biology. The functional rescue achieved through c-Myc knockdown is only partial, restoring Myogenin expression but not the full differentiation index or morphology, indicating that additional mechanisms are likely involved. Although evidence supports a role for Trim32 in mRNA destabilization, the precise molecular partners (such as RNA-binding activity, microRNA involvement, or ligase function) remain undefined. Some discrepancies with previous studies, including Trim32-mediated protein degradation of c-Myc, are acknowledged but not experimentally resolved. Moreover, functional validation in animal models or patient-derived cells is currently lacking. Despite these limitations, the study represents an advancement for the field. It shifts the conceptual framework from Trim32's canonical role in protein ubiquitination to a novel function in RNA regulation during myogenesis. It also raises potential clinical implications by suggesting that targeting the Trim32-c-Myc axis, or modulating c-Myc stability, may represent a therapeutic strategy for LGMDR8.

      This work will be of particular interest to muscle biology researchers studying myogenesis and the molecular basis of muscle disease, RNA biology specialists investigating post-transcriptional regulation and mRNA stability, and neuromuscular disease researchers and clinicians seeking to identify new molecular targets for therapeutic intervention in LGMDR8.

      The Reviewer expressing this opinion is an expert in muscle stem cells, muscle regeneration, and muscle development.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study, the authors sought to investigate the molecular role of Trim32, a tripartite motif-containing E3 ubiquitin ligase often associated with its dysregulation in Limb-Girdle Muscular Dystrophy Recessive 8 (LGMDR8), and its role in the dynamics of skeletal muscle differentiation. Using a CRISPR-Cas9 model of Trim32 knockout in C2C12 murine myoblasts, the authors demonstrate that loss of Trim32 alters the myogenic process, particularly by impairing the transition from proliferation to differentiation. The authors provide evidence in the way of transcriptomic profiling that displays an alteration of myogenic signaling in the Trim32 KO cells, leading to a disruption of myotube formation in vitro. Interestingly, while previous studies have focused on Trim32's role in protein ubiquitination and degradation of c-Myc, the authors provide evidence that Trim32-regulation of c-Myc occurs at the level of mRNA stability. The authors show that the sustained c-Myc expression in Trim32 knockout cells disrupts the timely expression of key myogenic factors and interferes with critical withdrawal of myoblasts from the cell cycle required for myotube formation. Overall, the study offers a new insight into how Trim32 regulates early myogenic progression and highlights a potential therapeutic target for addressing the defects in muscular regeneration observed in LGMDR8.

      We thank the Reviewer for valuing our work and for their appreciated suggestions to improve our manuscript. We have carefully addressed some of the concerns raised as detailed here, while others, which require more laborious experimental efforts, will be addressed as reported in the Revision Plan.

      Major Comments:

      The work is a bit incremental based on this:

      https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030445

      And this:

      https://www.nature.com/articles/s41418-018-0129-0

      To their credit, the authors do cite the above papers.

      Authors’ response. We thank the Reviewer for this careful evaluation of our work against the current literature and for recognising the contribution of our findings to the complex picture of myogenesis, in which Trim32, c-Myc, and the Trim32-c-Myc axis can act at several stages and likely within narrow time windows along the process, which may explain some inconsistencies among published reports.

      The authors do provide compelling evidence that Trim32 deficiency disrupts C2C12 myogenic differentiation and sustained c-Myc expression contributes to this defective process. However, while knockdown of c-Myc does restore Myogenin levels, it was not sufficient to normalize myotube morphology or differentiation index, suggesting an incomplete picture of the Trim32-dependent pathways involved. The authors should qualify their claim by emphasizing that c-Myc regulation is a major, but not exclusive, mechanism underlying the observed defects. This will prevent an overgeneralization and better align the conclusions with the author's data.

      Authors’ response. We agree with the Reviewer and have modified the phrasing that implied the Trim32-c-Myc axis was the exclusive mechanism, explicitly indicating in the Abstract and in the Discussion that other pathways contribute to proper myogenesis.

      The Abstract now reads: “… suggesting that the Trim32–c-Myc axis may represent an essential hub, although likely not the exclusive molecular mechanism, in muscle regeneration within LGMDR8 pathogenesis.”

      The Discussion now reads: “Functionally, we demonstrated that c-Myc contributes to the impaired myogenesis observed in Trim32 KO clones, although this is clearly not the only factor involved in the Trim32-mediated myogenic network; realistically other molecular mechanisms can participate in this process as also suggested by our transcriptomic results.”

      The authors provide a thorough and well-executed interrogation of cell cycle dynamics in Trim32 KO clones, combining phospho-histone H3, flow cytometry of DNA content, and CFSE proliferation assays. These complementary approaches convincingly show that, while proliferation states remain similar in WT and KO cells, Trim32-deficient myoblasts fail to withdraw normally from the cell cycle during exposure to differentiation-inducing conditions. This work adds clarity to a previously inconsistent literature and greatly strengthens the study.

      Authors’ response. We thank the Reviewer for appreciating our thorough analyses on cell cycle dynamics in proliferation conditions and at the onset of the differentiation process.

      The transcriptomic analysis (detailed in the "Transcriptomic analysis of Trim32 WT and KO clones along early differentiation" section of Results) is central to the manuscript and provides strong evidence that Trim32 deficiency disrupts normal differentiation processes. However, the description of the pathway enrichment results is highly detailed and somewhat compressed, which may make it challenging for readers to follow the key biological 'take-homes'. The narrative moves quickly across multiple analyses (MDS, clustering, heatmaps, and bubble plots) without pausing to guide the reader through what each analysis contributes to the overall biological interpretation. As a result, the key findings (reduced muscle development pathways in KO cells and enrichment of cell cycle-related pathways) can feel somewhat muted. The authors may consider reorganizing this section so that the primary biological insights are highlighted and supported by each of their analyses. This would make the biological implications more accessible to a broader readership.

      Authors’ response. We thank the Reviewer for raising this point and apologise for being too brief in describing the data, which indeed left some points excessively implicit. As suggested, we have now reorganised this section and added the lists of enriched canonical pathways relative to WT vs KO comparisons at D0 and D3 (Fig. EV3B) as well as those relative to the comparison between D0 and D3 for both WT and Trim32 KO samples (Fig. EV3C), with their relative scores. We changed the Results section “Transcriptomic analysis of Trim32 WT and Trim32 KO clones along early differentiation” as reported below and modified the legends accordingly.

      The paragraph now reads: Based on our initial observations, the absence of Trim32 already exerts a significant impact by day 3 (D3) of C2C12 myogenic differentiation. To investigate how Trim32 influences early global transcriptional changes during the proliferative phase (D0) and early differentiation (D3), we performed an unbiased transcriptomic profiling of WT and Trim32 KO clones (Fig. 2A). Multidimensional Scaling (MDS) analysis revealed clear segregation of gene expression profiles based on both time of differentiation (Dim1, 44% variance) and Trim32 genotype (Dim2, 16% variance) (Fig. 2A). Likewise, hierarchical clustering grouped WT and Trim32 KO clones into distinct clusters at both timepoints, indicating consistent genotype-specific transcriptional differences (Fig. EV3A). Differentially Expressed Genes (DEGs) were detected in the Trim32 KO transcriptome relative to WT, at both D0 and D3. In proliferating conditions, 72 genes were upregulated and 189 were downregulated whereas at D3 of differentiation, 72 genes were upregulated and 212 were downregulated. Ingenuity Pathway Analysis of the DEGs revealed the top 10 Canonical Pathways displayed in Fig. EV3B as enriched at either D0 or D3 (Fig. EV3B). Several of these pathways can underscore relevant Trim32-mediated functions though most of them represent generic functions not immediately attributable to the observed myogenesis defects.

      Notably, the transcriptional divergence between WT and Trim32 KO cells is more pronounced at D3, as evidenced by a greater separation along the MDS Dim2 axis, suggesting that Trim32-dependent transcriptional regulation intensifies during early differentiation (Fig. 2A). Given our interest in the differentiation process, we therefore focused our analyses on comparing the changes occurring from D0 to D3 in WT (WT D3 vs. D0) and in Trim32 KO (KO D3 vs. D0) RNAseq data.

      Pathway enrichment analysis of D3 vs. D0 DEGs allowed the selection of the top-scored pathways for both WT and Trim32 KO data. We obtained 18 top-scored pathways enriched in each genotype (-log(p-value) ≥ 9 cut-off): 14 are shared while 4 are top-ranked only in WT and 4 only in Trim32 KO (Fig. EV3C). For the following analyses, we thus employed a total of 22 distinct pathways. To better mine those that are relevant in the passage from the proliferation stage to early differentiation and that are affected by the lack of Trim32, we built a bubble plot comparing side-by-side the scores and enrichment of the 22 selected top-scored pathways in WT and Trim32 KO (Fig. 2B). A heatmap of DEGs included within these selected pathways confirms the clustering of the samples by both genotype and timepoint, highlighting gene expression differences (Fig. 2C). These pathways are mainly related to muscle development, cell cycle regulation, genome stability maintenance and a few other metabolic cascades.

      As expected given the results related to Figure 1, moving from D0 to D3 WT clones showed robust upregulation of key transcripts associated with the Inactive Sarcomere Protein Complex, a category encompassing most genes in the “Striated Muscle Contraction” pathway, while in Trim32 KO clones this pathway was not among those enriched in the transition from D0 to D3 (Fig. EV3C). Detailed analyses of transcripts enclosed within this pathway revealed that on the transition from proliferation to differentiation, WT clones show upregulation of several Myosin Heavy Chain isoforms (e.g., MYH3, MYH6, MYH8), α-Actin 1 (ACTA1), α-Actinin 2 (ACTN2), Desmin (DES), Tropomodulin 1 (TMOD1), and Titin (TTN), a pattern consistent with previous reports, while these same transcripts were either non-detected or only modestly upregulated in Trim32 KO clones at D3 (Fig. 2D). This genotype-specific disparity was further confirmed by gene set enrichment barcode plots, which demonstrated significant enrichment of these muscle-related transcripts in WT cells (FDR_UP = 0.0062), but not in Trim32 KO cells (FDR_UP = 0.24) (Fig. EV3D). These findings support an early transcriptional basis for the impaired myogenesis previously observed in Trim32 KO cells.

      In addition to differences in muscle-specific gene expression, we also observed that several pathways related to cell proliferation and cell cycle regulation were more enriched in Trim32 KO cells compared to WT. This suggests that altered cell proliferation may contribute to the distinct differentiation behavior observed in Trim32 KO versus WT (Fig. 2B). Given that cell cycle exit is a critical prerequisite for the onset of myogenic differentiation, and considering that previous studies on the role of Trim32 in cell cycle regulation have reported inconsistent findings, we further examined cell cycle dynamics under our experimental conditions to clarify the contribution of Trim32 to this process.

      The work would be greatly strengthened by the inclusion of LGMDR8 primary cells, and rescue experiments of TRIM32 to explore myogenesis.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      Also, EU (5-ethynyl uridine) pulse-chase experiments to label nascent and stable RNA coupled with MYC pulldowns and qPCR (or RNA-sequencing of both pools) would further enhance the claim that MYC stability is being affected.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      "On one side, c-Myc may influence early stages of myogenesis, such as myoblast proliferation and initial myotube formation, but it may not contribute significantly to later events such as myotube hypertrophy or fusion between existing myotubes and myocytes. This hypothesis is supported by recent work showing that c-Myc is dispensable for muscle fiber hypertrophy but essential for normal MuSC function (Ham et al, 2025)." Also address and discuss the following, as what is currently written is not entirely accurate: https://www.embopress.org/doi/full/10.1038/s44319-024-00299-z and https://journals.physiology.org/doi/prev/20250724-aop/abs/10.1152/ajpcell.00528.2025

      Authors’ response. We thank the Reviewer for bringing these two publications to our attention; they indeed add important data that help recapitulate the in vivo complexity of the role of c-Myc in myogenesis. We included this point in our Discussion.

      The Discussion now reads: “On one side, c-Myc may influence early stages of myogenesis, such as myoblast proliferation and initial myotube formation, but it may not contribute significantly to later events such as myotube hypertrophy or fusion between existing myotubes and myocytes. This hypothesis is supported by recent work showing that c-Myc is dispensable for muscle fiber hypertrophy but essential for normal MuSC function (Ham et al, 2025). Other reports, instead, demonstrated the implication of c-Myc periodic pulses, mimicking resistance-exercise, in muscle growth, a role that cannot though be observed in our experimental model (Edman et al., 2024; Jones et al., 2025).”

      Minor Comments:

      Z-score scale used in the pathway bubble plot (Figure 2C) could benefit from alternative color choices. Current gradient is a bit muddy and clarity for the reader could be improved by more distinct color options, particularly in the transition from positive to negative Z-score.

      Authors’ response. As suggested, we modified the z-score-representing colors using a more distinct gradient especially in the positive to negative transition in Figure 2B.

      Clarification on the rationale for selecting the "top 18" pathways would be helpful, as it is not clear if this cutoff was chosen arbitrarily or reflects a specific statistical or biological threshold.

      Authors’ response. As now better explained (see comment regarding Major point: Transcriptomics), we used a cut-off of -log(p-value) above or equal to 9 for pathways enriched in DEGs of the D0 vs D3 comparison for both WT and Trim32 KO. The threshold is now included in the Results section and the pathways (shared between WT and Trim32 KO and unique) are listed as Fig. EV3C.
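      The selection logic described in this response can be sketched in a few lines of Python. This is an illustration only: the pathway labels are placeholders (the real names are listed in Fig. EV3C), the helper `passes_cutoff` is hypothetical, and reading "-log(p-value)" as a base-10 logarithm is an assumption. The set arithmetic simply confirms that 18 pathways per genotype with 14 shared gives 22 distinct pathways.

```python
import math


def passes_cutoff(p_value: float, threshold: float = 9.0) -> bool:
    """True when -log10(p) >= threshold (base 10 is an assumption)."""
    return -math.log10(p_value) >= threshold


# Placeholder pathway labels (hypothetical; real names are in Fig. EV3C).
shared = {f"shared_{i}" for i in range(1, 15)}    # 14 pathways in both lists
wt_only = {f"wt_only_{i}" for i in range(1, 5)}   # 4 top-ranked only in WT
ko_only = {f"ko_only_{i}" for i in range(1, 5)}   # 4 top-ranked only in Trim32 KO

wt_top = shared | wt_only   # pathways passing the cut-off in WT
ko_top = shared | ko_only   # pathways passing the cut-off in Trim32 KO

print(len(wt_top), len(ko_top))   # 18 18
print(len(wt_top & ko_top))       # 14
print(len(wt_top | ko_top))       # 22
```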

      The authors alternate between using "Trim 32 KO clones" and "KO clones" throughout the manuscript. Consistent terminology across figures and text would improve readability.

      Authors’ response. We thank the Reviewer for this remark, and we apologise for having overlooked it. We amended this throughout the manuscript by always using for clarity “Trim32 KO clones/cells”.

      Cell culture methodology does not specify passage number or culture duration (only "At confluence") before differentiation. This is important, as C2C12 differentiation potential can drift with extended passaging.

      Authors’ response. We agree with the Reviewer that C2C12 passaging can reduce the differentiation potential of this myoblast cell line; this is indeed the main reason why we decided to employ WT clones, which underwent the same editing process as those that turned out to be mutated in the Trim32 gene, as reference controls throughout our study. We apologise for not indicating the passages in the first version of the manuscript; this is now amended in the Methods section as follows:

      The C2C12 parental cells used in this study were maintained within passages 3–8. All clonal cell lines (see below) were utilized within 10 passages following gene editing. In all experiments, WT and Trim32 KO clones of comparable passage numbers were used to ensure consistency and minimize passage-related variability.

      Reviewer #2 (Significance (Required)):

      General Assessment:

      This study provides a thorough investigation of Trim32's role in the processes related to skeletal muscle differentiation using a CRISPR-Cas9 knockout C2C12 model. The strengths of this study lie in the multi-layered experimental approach, as the authors incorporated transcriptomics, cell cycle profiling, and stability assays, which collectively build a strong case for their hypothesis that Trim32 is a key factor in the normal regulation of myogenesis. The work is also strengthened by the use of multiple biological and technical replicates, particularly the independent KO clones, which help address potential clonal variation. The largest limitation of this study is that, while the c-Myc mechanism is well explored, the other Trim32-dependent pathways associated with the disruption (implicated by the incomplete rescue by c-Myc knockdown) are not as well addressed. Overall, however, the study convincingly identifies a critical function for Trim32 during skeletal muscle differentiation.

      Advance: To my knowledge, this is the first study to demonstrate that Trim32 regulates c-Myc at the level of mRNA stability, rather than through ubiquitin-mediated protein degradation. This work will provide a more complete understanding of Trim32's role in c-Myc regulation. Beyond c-Myc, this work highlights the idea that TRIM family proteins can influence RNA stability, which could implicate a broader role in RNA biology and has potential for future therapeutic targeting.

      Audience: This research will be of interest to an audience that focuses on broad skeletal muscle biology, but primarily to readers with more focused research interests such as myogenesis and neuromuscular disease (LGMDR8 in particular), where the defined Trim32 governance over early differentiation checkpoints will be of interest. It will also provide mechanistic insights to those outside of skeletal muscle who study TRIM family proteins, ubiquitin biology, and RNA regulation. For translational/clinical researchers, it identifies the Trim32/c-Myc axis as a potential therapeutic target for LGMDR8 and related muscular dystrophies.

      Expertise: My expertise lies in skeletal muscle biology, gene editing, transgenic mouse models, and bioinformatics. I feel confident evaluating the data and conclusions as presented.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      • In this paper, the authors examine the role of TRIM32, implicated in limb girdle muscular dystrophy recessive 8 (LGMDR8), in the differentiation of C2C12 mouse myoblasts. Using CRISPR, they generate mutant and wild-type clones and compare their differentiation capacity in vitro. They report that Trim32-deficient clones exhibit delayed and defective myogenic differentiation. RNA-seq analysis reveals widespread changes in gene expression, although few are validated by independent methods. Notably, Trim32 mutant cells maintain residual proliferation under differentiation conditions, apparently due to a failure to downregulate c-Myc. Translation inhibition experiments suggest that TRIM32 promotes c-Myc mRNA destabilization, but this conclusion is insufficiently substantiated. The authors also perform rescue experiments, showing that c-Myc knockdown in Trim32-deficient cells alleviates some differentiation defects. However, this rescue is not quantified, was conducted in only two of the three knockout lines, and is supported by inappropriate statistical analysis of gene expression. Overall, the manuscript in its current form has substantial weaknesses that preclude publication. Beyond statistical issues, the major concerns are: (1) exclusive reliance on the immortalized C2C12 line, with no validation in primary/satellite cells or in vivo, (2) insufficient mechanistic evidence that TRIM32 acts directly on c-Myc mRNA, and (3) overinterpretation of disease relevance in the absence of supporting patient or in vivo data. Please find more details below:

      We thank the Reviewer for the in-depth assessment of our work and precious suggestions to improve the manuscript. We have carefully addressed some of the concerns raised, as detailed here, while others, which require more experimental efforts, will be addressed as detailed in the Revision Plan.

      - TRIM32 complementation / rescue experiments to exclude clonal or off-target CRISPR effects and show specificity are lacking.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      - The authors link their in vitro findings to LGMDR8 pathogenesis and propose that the Trim32-c-Myc axis may serve as a central regulator of muscle regeneration in the disease. However, LGMDR8 is a complex disorder, and connecting muscle wasting in patients to differentiation assays in C2C12 cells is difficult to justify. No direct evidence is provided that the proposed mRNA mechanism operates in patient-derived samples or in mouse satellite cells. Moreover, the partial rescue achieved by c-Myc knockdown (which does not fully restore myotube morphology or differentiation index) further suggests that the disease connection is not straightforward. Validation of the TRIM32-c-Myc axis in a physiologically relevant system, such as LGMD patient myoblasts or Trim32 mutant mouse cells, would greatly strengthen the claim.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      -Some gene expression changes from the RNA-seq study in Figure 2 should be validated by qPCR

      Authors’ response. We thank the reviewer for this suggestion. This point will be addressed as detailed in the Revision Plan. We have selected several transcripts that will be evaluated in independent samples in order to validate the RNAseq results.

      - The paper shows siRNA knockdown of c-Myc in KO restores Myogenin RNA/protein but does not fully rescue myotube morphology or differentiation index. This suggests that Trim32 controls additional effectors beyond c-Myc; yet the authors do not pursue other candidate mediators identified in the RNA-seq. The manuscript would be strengthened by systematically testing whether other deregulated transcripts contribute to the phenotype.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      - There are concerns with experimental/statistical issues and insufficient replicate reporting. The authors use unpaired two-tailed Student's t-test across many comparisons; multiple testing corrections or ANOVA where appropriate should be used. In Figure EV5B and Figure 6B, the authors perform statistical analyses with control values set to 1. This method masks the inherent variability between experiments and artificially augments p values. Control sample values need to be normalized to one another to have reliable statistical analysis. Myotube morphology and differentiation index quantifications need clear description of fields counted, blind analysis, and number of biological replicates.

      Authors’ response. We thank the Reviewer for raising this point.

      Regarding the replicates, we clarified in the Methods and Legends that the Trim32 KO experiments have been performed on 3 biological replicates (independent clones) and the same for the reference control (3 independent WT clones), except for the Fig. 6 experiments that were performed on 2 Trim32 KO and 2 WT clones. All the Western Blots, immunofluorescence, qPCR data are representative of the results of at least 3 independent experiments unless otherwise stated. We reported the number and type of replicates as well as the microscope fields analyzed.

      We repeated the statistical analyses of the data in Figure 5G, EV5D, and EV5E using the more appropriate 2-way ANOVA test, as suggested, and we now report this information in the graphs and legends.
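      The reviewer's alternative suggestion, a multiple testing correction, can be illustrated with a minimal sketch. The Holm-Bonferroni step-down procedure shown here is one common correction chosen for illustration; it is not the method the authors report (they used a 2-way ANOVA), the function name `holm_adjust` is hypothetical, and the p-values are invented.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Invented p-values from three hypothetical pairwise t-tests.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))  # approximately [0.03, 0.06, 0.06]
```

      In this invented case, two comparisons that look significant at 0.05 before correction no longer pass after it, which is the concern behind the reviewer's comment.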

      We thank the Reviewer for raising this point. We agree and have substituted the graphs in Fig. EV5B and 6B with versions showing the control values normalised as suggested. The statistical analyses now reflect this change.
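      The normalisation issue raised by the reviewer can be made concrete with a small sketch. The numbers below are invented purely for illustration and are not data from the manuscript, and the function names are hypothetical. The sketch contrasts dividing each experiment by its own control, which forces every control value to exactly 1 and removes its variance, with dividing all values by the mean of the controls, which keeps the between-experiment spread available to the statistical test.

```python
from statistics import mean, stdev

# Hypothetical raw signals from three independent experiments (illustration only).
control = [0.80, 1.00, 1.25]
treated = [1.60, 2.10, 2.40]


def normalise_per_experiment(ctrl, trt):
    """Divide each experiment by its own control: every control becomes 1."""
    return [c / c for c in ctrl], [t / c for c, t in zip(ctrl, trt)]


def normalise_to_control_mean(ctrl, trt):
    """Divide all values by the mean control: control variability is kept."""
    m = mean(ctrl)
    return [c / m for c in ctrl], [t / m for t in trt]


ctrl_a, trt_a = normalise_per_experiment(control, treated)
ctrl_b, trt_b = normalise_to_control_mean(control, treated)

print(stdev(ctrl_a))  # 0.0: the control group has no variance left
print(stdev(ctrl_b))  # greater than 0: between-experiment spread is retained
```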

      -Some English mistakes require additional read-throughs. For example: "Indeed, Trim32 has no effect on the stability of c-Myc mRNA in proliferating conditions, but upon induction of differentiation the stability of c-Myc mRNA resulted enhanced in Trim32 KO clones (Fig. 5G, Fig. EV5D and 5E)."

      Authors’ response. We re-edited this revised version of the manuscript as suggested.

      -Results in Figure 5A should be quantified

      Authors’ response. We amended this point by quantifying the results shown in Fig. 5A and adding a graph of the quantification of 3 experimental replicates to the Figure. Quantification confirms that no statistically significant difference is observed. The Figure and its legend are modified accordingly.

      -Based on the nuclear marker p84, the separation of cytoplasmic and nuclear fractions is not ideal in Figure 5D

      Authors’ response. We agree with the Reviewer that the presence of p84 in the cytoplasmic fraction as well is not ideal. Regrettably, we observed this faint p84 band in all the experiments performed. We think, however, that this does not affect the result, which clearly shows that c-Myc and Trim32 are never detected in the same compartment.

      -In Figure 6, it is not appropriate to perform statistical analyses on only two data points per condition.

      Authors’ response. We agree with the Reviewer and we now show the graph of the results of the 3 technical replicates for 2 biological replicates and do not indicate any statistics (Fig. 6B). The graph was also modified according to a previous point raised.

      -The nuclear MYOG phenotype is very interesting; could this be related to requirements of TRIM32 in fusion?

      Authors’ response. We agree with the Reviewer that Trim32 might also be necessary for myoblast fusion. This point is however beyond the scope of the present study and will be addressed in future work.

      - The hypothesis that TRIM32 destabilizes c-Myc mRNA is intriguing but requires stronger mechanistic support. This would be more convincing with RNA immunoprecipitation to test direct association with c-Myc mRNA, and/or co-immunoprecipitation to identify interactions between TRIM32 and proteins involved in mRNA stability. The study would also be strengthened by reporter assays, such as c-Myc 3′UTR luciferase constructs in WT and KO cells, to directly demonstrate 3′UTR-dependent regulation of mRNA stability.

      Authors’ response. This point will be addressed as detailed in the Revision Plan.

      Reviewer #3 (Significance (Required)):

      The manuscript presents a minor conceptual advance in understanding TRIM32 function in myogenic differentiation. Its main limitation is that all experiments were performed in C2C12 cells. While C2C12 are a classical system to study muscle differentiation, they are an immortalized, long-cultured, and genetically unstable line that represents a committed myoblast stage rather than bona fide satellite cells. They therefore do not fully model the biology of early regenerative responses. Several TRIM32 phenotypes reported in the literature differ between primary satellite cells and cell lines, and the authors themselves note such discrepancies. Extrapolating these findings to LGMDR8 pathogenesis without validation in primary human myoblasts, satellite cell assays, or in vivo regeneration models is therefore not justified. Previous work has already established clear roles for TRIM32 in mouse satellite cells in vivo and in patient myoblasts in vitro, whereas this study introduces a novel link to c-Myc regulation during differentiation. In addition, without mechanistic evidence, the central claim that TRIM32 regulates c-Myc mRNA stability remains descriptive and incomplete. Nevertheless, the results will be of interest to researchers studying LGMD and to those exploring TRIM32 biology in broader contexts. I review this manuscript as a muscle biologist with expertise in satellite cell biology and transcriptional regulation.

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Reply to the Reviewers

      I thank the Referees for their...

      Referee #1

      1. The authors should provide more information when...

      Responses + The typical domed appearance of a hydrocephalus-harboring skull is apparent as early as P4, as shown in a new side-by-side comparison of pups at that age (Fig. 1A). + Though this is not stated in the MS 2. Figure 6: Why has only...

      Response: We expanded the comparison

      Minor comments:

      1. The text contains several...

      Response: We added...

      Referee #2

    1. Reviewer #2 (Public review):

      A long-standing debate in the field of Pavlovian learning relates to the phenomenon of timescale invariance in learning, i.e. the idea that the rate at which an animal learns about a Pavlovian CS is driven by the rate of reinforcement of the cue (CS) relative to the background rate of reinforcement. In practice, if a CS is reinforced on every trial, then the rate of acquisition is determined by the relative duration of the CS (T) and the inter-US interval (C = duration of CS + ITI), specifically the ratio C/T. Therefore, the point of acquisition should be the same with a 10s CS and a 90s ITI (T = 10; C = 90 + 10 = 100, C/T = 100/10 = 10) and with a 100s CS and a 900s ITI (T = 100; C = 900 + 100 = 1000, C/T = 1000/100 = 10). That is to say, the rate of acquisition is invariant to the absolute timescale as long as this ratio is the same. This idea has many other consequences, but is also notably different from more popular prediction-error based associative learning models such as the Rescorla-Wagner model. The initial demonstrations that the ratio C/T predicts the point of acquisition across a wide range of parameters (both within and across multiple studies) were conducted in pigeons using a Pavlovian autoshaping procedure. What has remained under contention is whether or not this relationship holds across species, particularly in the standard appetitive Pavlovian conditioning paradigms used in rodents. The results from rodent studies aimed at testing this have been mixed, and often the debate around the source of these inconsistent results focuses on the different statistical methods used to identify the point of acquisition for the highly variable trial-by-trial responses at the level of individual animals.
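      The arithmetic in the two examples above can be checked with a short sketch. The function name `informativeness` is an illustrative choice rather than anything taken from the manuscript, and the calculation assumes every CS trial is reinforced, so that C equals the CS duration plus the ITI.

```python
def informativeness(cs_duration_s: float, iti_s: float) -> float:
    """Return C/T, assuming every CS presentation is reinforced.

    T is the CS duration and C is the US-US interval (CS duration + ITI).
    """
    if cs_duration_s <= 0:
        raise ValueError("CS duration must be positive")
    cycle = cs_duration_s + iti_s   # C
    return cycle / cs_duration_s    # C/T


short_protocol = informativeness(10, 90)     # T = 10, C = 100
long_protocol = informativeness(100, 900)    # T = 100, C = 1000
print(short_protocol, long_protocol)         # both ratios equal 10.0
```

      Because the two protocols share the same ratio, timescale invariance predicts the same point of acquisition for both despite the tenfold difference in absolute durations.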

      The authors successfully replicate the same effect found in pigeon autoshaping paradigms decades ago (with almost identical model parameters) in a standard Pavlovian appetitive paradigm in rats. They achieve this through a clever change to the experimental design, using a convincingly wide range of parameters across 14 groups of rats, and by a thorough and meticulous analysis of these data. It is also interesting to note that the two authors have published on opposing sides of this debate for many years, and as a result have developed and refined many of the ideas in this manuscript through this process.

      Main findings

      (1) The present findings demonstrate that the point of initial acquisition of responding is predicted by the C/T ratio.

      (2) The terminal rates of responding to the CS appear to be related to the reinforcement rate of the CS (T; specifically, 1/T) but not its relation to the reinforcement rate of the context (i.e. C or C/T). In the present experiment, all CS trials were reinforced so it is also the case that the terminal rate of responding was related to the duration of the CS.

      (3) An unexpected finding was that responding during the ITI was similarly related to the rate of contextual reinforcement (1/C). This novel finding suggests that the terminal rate of responding during the ITI and the CS are related to their corresponding rates of reinforcement. This finding is surprising as it suggests that responding during the ITI is not being driven by the probability of reinforcement during the ITI.

      (4) Finally, the authors characterised the nature of increased responding from the point of initial acquisition until responding peaks at a maximum. Their analyses suggest that the nature of this increase was best described as linear in the majority of rats, as opposed to the non-linear increase that might be predicted by prediction error learning models (e.g. Rescorla-Wagner). However, more detailed analyses revealed that these changes can be quite variable across rats, and more variable when the CS had lower informativeness (defined as C/T).

      Strengths and Weaknesses:

      There is an inherent paradox regarding the consistency of the acquisition data from Gibbon & Balsam's (1981) meta-analysis of autoshaping in pigeons, and the present results in magazine response frequency in rats. This consistency is remarkable and impressive, and is suggestive of a relatively conserved or similar underlying learning principle. However, the consistency is also surprising given some significant differences in how these experiments were run. Some of these differences might reasonably be expected to lead to differences in how these different species respond. For example:

      In the autoshaping procedure commonly used for these data, the pigeons were pretrained to retrieve rewards from a grain hopper with an instrumental contingency between head entry into the hopper and grain availability. During Pavlovian training, pecking the key light also elicited an auditory click feedback stimulus, and when the grain hopper was made available, the hopper was also illuminated.

      In the present experimental procedure, the rats were not given contextual exposure to the pellet reinforcers in the magazine (a magazine training session is typically included in similar rodent procedures). The Pavlovian CS was a cue light within the magazine itself.

      These design features in the present rodent experiment are clearly intentional. Pretraining with the reinforcer in the testing chambers would reasonably alter the background rate of reinforcement (parameter), so it makes sense not to include this, although it differs from the paradigm used in pigeons. Having the CS inside the magazine where pellets are delivered provides an effective way to reduce any potential response competition between CS and US directed responding, and combines these into the same physical response. This makes the magazine approach response more like the pecking of the light stimulus in the pigeon autoshaping paradigm. However, the locations of the CS and US are separated in pigeon autoshaping, raising questions about why the findings across species are consistent despite these differences.

      Intriguingly, when the insertion of a lever is used as a Pavlovian cue in rodent studies, CS directed responding (sign-tracking) often develops over training such that eventually all animals bias their responding towards the lever rather than towards the US (goal-tracking at the magazine). However, the nature of this shift highlights the important point that these CS and US directed responses can be quite distinct physically as well as psychologically. Therefore, by conflating the development of these different forms of responding, it is not clear whether the relationship between C/T and the acquisition of responding describes the sum of all Pavlovian responding or predominantly CS or US directed responding.

      Another interesting aspect of these findings is that there is a large amount of variability that scales inversely with C/T. A potential account of the source of this variability is related to the absence of preexposure to the reward pellets. This is normally done within the animals' homecage as a form of preexposure to reduce neophobia. If some rats take longer to notice and then approach and finally consume the reward pellets in the magazine, the impact of this would systematically differ depending on the length of the ITI. For animals presented with relatively short CSs and ITIs, they may essentially miss the first couple of trials and/or attribute uneaten pellets accumulating in the magazine to the background/contextual rate of reinforcement. What is not currently clear is whether this was accounted for in some way by confirming when the rats first started retrieving and consuming the rewards from the magazine.

      While the generality of these findings across species is impressive, the very specific set of parameters employed to generate these data raises questions about the generality of these findings across other standard Pavlovian conditioning parameters. While this is obviously beyond the scope of the present experiment, it is important to consider that the present study explored a situation with 100% reinforcement on every trial, with a single, relatively brief, variable duration CS (drawn from a uniform distribution, maximum of 122 s) and a single US. Again, the choice of these parameters in the present experiment is appropriate and very deliberately based on refinements from many previous studies from the authors. This includes a number of criteria used to define magazine response frequency, which includes discarding specific responses (discussed and reasonably justified clearly in the methods section). Similarly, the finding that terminal rates of responding are reliably related to 1/T is surprising, and it is not clear whether this might be a property specific to this form of variable duration CS, the use of a uniform sampling distribution, or the use of only a single CS. However, it is important to keep these limitations in mind when considering some of the claims made in the discussion section of this manuscript that go beyond what these data can support.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Conceptually, I feel that the authors addressed many concerns. However, I am still not convinced that their data support the strength of their claims. Additionally, I spent considerable time investigating the now freely available code and data and found several inconsistencies that would be critical to rectify. My comments are split into two parts, reflecting concerns related to the responses/methods and concerns resulting from investigation of the provided code/data. The former is described in the public review above. Because I show several figures to illustrate some key points for the latter part, an attached file will provide the second part: https://elife-rp.msubmit.net/elife-rp_files/2025/02/24/00136468/01/136468_1_attach_15_2451_convrt.pdf

      (1) This point is discussed in more detail in the attached file, but there are some important details regarding the identification of the learned trial that require more clarification. For instance, isn’t the original criterion by Gibbon et al. (1977) the first “sequence of three out of four trials in a row with at least one response”? The authors’ provided code for the Wilcoxon signed rank test and nDkl thresholds looks for a permanent exceeding of the threshold. So, I am not yet convinced that the approaches used here and in prior papers are directly comparable.

      We agree that there remain unresolved issues with our two attempts to create criteria that match that used by Gibbon and Balsam for trials to criterion. Therefore, we have decided to remove those analyses and return to our original approach showing trials to acquisition using several different criteria so as to demonstrate that the essential feature of the results—the scaling between learning rate and information—is robust. Figure 2A shows the results for a criterion that identifies the trial after which the cumulative response rate during the CS (=cumulative CS response count from Trial 1 divided by cumulative CS time from Trial 1) is consistently above the cumulative overall response rate across the trial (i.e., including both the CS and ITI). These data compare the CS response rate with the overall response rate, rather than with ITI rate as done in the previous version (in Figure 3A of that submission), to be consistent with the subsequent comparisons that are made using the nDkl. (The nDkl relies on the comparison between the CS rate and the overall rate, rather than between the CS and ITI rates.) Figures 2B and 2C show trials to acquisition when two statistical criteria, based on the nDkl, are applied to the difference between CS and overall response rates (the criteria are for odds >= 4:1 and p<.05). As we now explain in the text, a statistical threshold is useful inasmuch as it provides some confidence to the claim that the animals had learned by a given trial. However, this trial is very likely to be after the point when they had learned because accumulating statistical evidence of a difference necessarily adds trials.
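
      The first criterion described above (Figure 2A) can be sketched as follows. This is only an illustrative reading of the description, not the authors' MATLAB code: the function name, argument names and the example counts are all hypothetical, and "consistently above" is interpreted as "above on every trial through to the last trial in the data".

```python
from itertools import accumulate

def first_trial_permanently_above(cs_counts, cs_times, all_counts, all_times):
    """Return the 1-based trial from which the cumulative CS response rate
    stays above the cumulative overall response rate until the final trial,
    or None if the CS rate is not above the overall rate on the final trial.

    cs_counts / cs_times   : per-trial responses and seconds during the CS
    all_counts / all_times : per-trial responses and seconds over CS + ITI
    (all names are hypothetical; times are assumed to be positive)
    """
    cum_cs_n = accumulate(cs_counts)
    cum_cs_t = accumulate(cs_times)
    cum_all_n = accumulate(all_counts)
    cum_all_t = accumulate(all_times)
    above = [
        (n_cs / t_cs) > (n_all / t_all)
        for n_cs, t_cs, n_all, t_all in zip(cum_cs_n, cum_cs_t, cum_all_n, cum_all_t)
    ]
    # Walk backwards to find where the final unbroken run of "above" begins.
    trial = None
    for i in range(len(above) - 1, -1, -1):
        if not above[i]:
            break
        trial = i + 1
    return trial

# Made-up counts for seven trials (10 s CS, 100 s total per trial).
cs_counts = [1, 0, 0, 0, 2, 3, 3]
all_counts = [3, 4, 4, 3, 3, 4, 4]
trial = first_trial_permanently_above(cs_counts, [10] * 7, all_counts, [100] * 7)
print(trial)
```

      In this made-up series the CS rate is above the overall rate on Trials 1 and 2, falls below it on Trials 3 and 4, and is above it from Trial 5 onwards, so the criterion returns Trial 5 rather than Trial 1.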

      Also, there’s still no regression line fitted to their data (Fig 3’s black line is from Fig 1, according to the legends). Accordingly, I think the claim in the second paragraph of the Discussion that the old data and their data are explained by a model with “essentially the same parameter value” is not yet convincing without actually reporting the parameters of the regression. Related to this, the regression for their data based on my analysis appears to have a slope closer to -0.6, which does not support strict timescale invariance. I think that this point should be discussed as a caveat in the manuscript.

      We now include regression lines fitted to our data in Figures 2A-C, and their slopes are reported in the figure note. We also note on page 14 of the revision that these regressions fitted to our data diverge from the black regression line (slope -1) as the informativeness increases. On pages 14-15, we offer an explanation for this divergence; that, in groups with high informativeness, the effective informativeness is likely to be lower than the assigned value because the rats had not been magazine trained which means they would not have discovered the food pellet as soon as it was released on the first few trials. On pages 15-16, we go on to note that evidence for a change in response rate during the CS in those very first few trials may have been missed because the initial response rates were very low in rats trained with very long inter-reinforcement intervals (and thus high informativeness). We also propose a solution to this problem of comparing between very low response rates, one that uses the nDkl to parse response rates into segments (clusters of trials with equivalent response rates). This analysis with parsed response rates provides evidence that differential responding to the CS may have been acquired earlier than is revealed using trial-by-trial comparisons.
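
      The slope comparison at issue here can be illustrated with a small least-squares fit. The sketch below assumes, as in the analysis attributed to Gibbon and Balsam, that both trials to acquisition and informativeness (C/T) are log-transformed before fitting; the function name is hypothetical and the data are synthetic values generated from exact power laws, not values from the manuscript.

```python
import math

def loglog_fit(c_over_t, trials):
    """Ordinary least-squares fit of log10(trials) on log10(C/T).
    Returns (slope, intercept). Hypothetical helper for illustration only."""
    xs = [math.log10(v) for v in c_over_t]
    ys = [math.log10(v) for v in trials]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic informativeness values and two synthetic power laws.
ratios = [2, 5, 10, 20, 50]
trials_invariant = [300 / r for r in ratios]           # trials proportional to (C/T)^-1
trials_shallower = [300 * r ** -0.6 for r in ratios]   # trials proportional to (C/T)^-0.6

slope_invariant, _ = loglog_fit(ratios, trials_invariant)
slope_shallower, _ = loglog_fit(ratios, trials_shallower)
print(round(slope_invariant, 3), round(slope_shallower, 3))
```

      A slope of -1 on these axes means that trials to acquisition fall in inverse proportion to C/T, which is the pattern described above as timescale invariance; a slope nearer -0.6 means that acquisition speeds up less than proportionally as informativeness grows.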

      (2) The authors report in the response that the basis for the apparent gradual/multiple step-like increases after initial learning remains unclear within their framework. This would be important to point out in the actual manuscript. Further, the responses indicating that there are some phenomena that are not captured by the current model would be important to state in the manuscript itself.

      We have included a paragraph (on page 26) that discusses the interpretation of the steady/multi-step increase in responding across continued training.

      (3) There are several mismatches between results shown in figures and those produced by the authors’ code, or other supplementary files. As one example, rat 3 results in Fig 11 and Supplementary Materials don’t match and neither version is reproduced by the authors’ code. There are more concerns like this, which are detailed in the attached review file.

      Addressed next….

      The following is the response to the points raised in Part 2 of Reviewer 1’s pdf.

      (1a) I plotted the calculated nDkl with the provided code for rat 3 (Fig 11), but it looks different, and the trials to acquisition also didn’t match with the table provided (average of ~20 trial difference). The authors should revise the provided code and plots. Further, even in their provided figures, if one compares rat 3 in Supplementary Materials to data from the same rat in Fig 11, the curves are different. It is critical to have reproducible results in the manuscript, including the ability to reproduce with the provided code.

      We apologise for those inconsistencies. We have checked the code and the data in the figures to ensure they are all now consistent and match the full data in the nHT.mat file in OSF. Figures 11 and 12 from the previous version are now replaced with Figure 6 in the revised manuscript (still showing data from Rats 3 and 176). The data plotted in Fig 6 match what is plotted in the supplementary figures for those 2 rats (but with slightly different cropping of the x-axes) and all plots draw directly from nHT.mat.

      (1b) I tried to replicate also Fig 3C with the results from the provided code, but I failed especially for nDkl > 2.2. Fig 3A and B look to be OK.

      There was an error in the previous Fig 3C, which was plotting the data from the wrong column of the Trials2Acquisition Table. We suspect this arose because some changes to the file were not updated in Dropbox. However, that figure has changed (now Figure 2) as already mentioned, and no longer plots data obtained with that specific nDkl criterion. The figure now shows criteria that do not attempt to match the Gibbon and Balsam criterion.

      (1c) The trials to learn from the code do match with those in the Trials2Acquisition Table, but the authors’ code doesn’t reproduce the reported trials to learn values in the nDkl Acquisition Table. The trials to learn from the code are ~20 trials different on average from those in the table, for 1:20, 1:100, and 1:1000 nDkl.

      We agree that discrepancies between those different files were a source of potential confusion because they were using different criteria or different ways of measuring response rate (i.e., the “conventional” calculation of rate as number of responses/time, vs our adjusted calculation in which the 1<sup>st</sup> response in the CS was excluded as well as the time spent in the magazine, vs parsed response rates based on inter-response intervals). To avoid this, there is now a single table called Acquisition_Table.xlsx in OSF that includes trials to acquisition for each rat based on a range of criteria or estimates of response rate in labelled columns. The data shown in Figure 2 are all based on the conventional calculation of response rate (provided in Columns E to H of Acquisition_Table.xlsx). To make the source of these data explicit, we have provided in OSF the MATLAB code that draws the data from the nHT.mat file to obtain these values for trials-to-acquisition.

      (1d) The nDkl Acquisition Table has columns with the value of the nDkl statistics at various acquisition landmarks, but the value does not look to be true, especially for rat 19. The nDkl curve provided by the authors (Supplementary Materials) doesn’t match the values in the table. The curve is below 10 until at least 300 trials, while the table reports a value higher than 20 (24.86) at the earliest evidence of learning (~120 trials?).

      We are very grateful to the reviewer for finding this discrepancy in our previous files. The individual plots in the Supplementary Materials now contain a plot of the nDkl computed using the conventional calculation of response rate (plot 3 in each 6-panel figure) and a plot of the nDkl computed using the new adjusted calculation of response rate (plot 4). These correspond to the signed nDkl columns for each rat in the full data file nHT.mat. The nDkl values at different acquisition landmarks included in Acquisition_Table.xlsx (Cols AB to AF) correspond to the second of these nDkl formulations. We point out that, of the acquisition landmarks based on the conventional calculation of response rate (Cols E to J of Acquisition_Table.xlsx), only the first two landmarks (CSrate>Contextrate and min_nDkl) match the permanently positive and minimum values of the plotted nDkl values. This is because the subsequent acquisition landmarks are based on a recalculation of the nDkl starting from the trial when CSrate>ContextRate, whereas the plotted nDkl starts from Trial 1.

      (2) The cumulative number of responses during the trial (Total) in the raw data table is not measured directly, but indirectly estimated from the pre-CS period, as (cumNR_Pre*[cumITI/cumT_Pre]) + cumNR_CS (cumNR_Pre: cumulative nose-poke response number during pre-CS period; cumITI: cumulative sum of ITI duration; cumT_Pre: cumulative pre-CS duration; cumNR_CS: cumulative response number during CS), according to ‘Explanation of TbyTdataTable (MATLAB).docx’. Why not use the actual cumulative responses during the whole trial instead of using a noisier measure during a smaller time window and then scaling it for the total period?

      Unfortunately, the bespoke software used to control the experimental events and record the magazine activity did not record data continuously throughout the experiment. The ITI responses were only sampled during a specified time-window (the “pre-CS” period) immediately before each CS onset. Therefore, response counts across the whole ITI had to be extrapolated.
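
      The extrapolation quoted by the reviewer can be written out directly. The sketch below transcribes the formula (cumNR_Pre*[cumITI/cumT_Pre]) + cumNR_CS into Python; the function name and the example numbers are hypothetical and are not taken from the data files.

```python
def extrapolated_total_responses(cum_nr_pre, cum_iti, cum_t_pre, cum_nr_cs):
    """Estimate cumulative responses over the whole trial (ITI + CS).

    cum_nr_pre : cumulative responses counted in the pre-CS sampling window
    cum_iti    : cumulative ITI duration (s)
    cum_t_pre  : cumulative duration of the pre-CS window (s)
    cum_nr_cs  : cumulative responses counted during the CS

    The pre-CS count is scaled up by the ratio of ITI time to sampled
    pre-CS time, then the directly recorded CS responses are added.
    """
    return cum_nr_pre * (cum_iti / cum_t_pre) + cum_nr_cs

# Made-up example: 12 pre-CS responses sampled over 600 s of a 3000 s
# cumulative ITI, plus 30 responses recorded during the CS.
total = extrapolated_total_responses(12, 3000, 600, 30)
print(total)  # 12 * 5 + 30 = 90.0
```

      The scaling assumes that the response rate in the sampled pre-CS window is representative of the rate across the rest of the ITI, which is the source of the extra noise the reviewer refers to.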

      (3) Regarding the “Matlab code for Find Trials to Criterion.docx”:

      (a) What’s the rationale for not using all the trials to calculate nDkl but starting the cumulative summation from the earliest evidence trial (truncated)? Also, this procedure is not described in the manuscript, and this should be mentioned.

      The procedure was perhaps not described clearly enough in the previous manuscript. We have expanded that text to make it clearer (page 12) which includes the text…

      “We started from this trial, rather than from Trial 1, because response rate data from trials prior to the point of acquisition would dilute the evidence for a statistically significant difference in responding once it had emerged, and thereby increase the number of trials required to observe significant responding to the CS. The data from Rat 1 illustrates this point. The CS response rate of Rat 1 permanently exceeded its overall response rate on Trial 52 (when the nD<sub>KL</sub> also became permanently positive). The nD<sub>KL</sub>, calculated from that trial onwards, surpassed 0.82 (odds 4:1) after a further 11 trials (on Trial 63) and reached 1.92 (p < .05) on Trial 81. By contrast, the nD<sub>KL</sub> for this rat, calculated from Trial 1, did not permanently exceed 0.82 until Trial 83 and did not exceed 1.92 until Trial 93, adding 10 or 20 trials to the point of acquisition.”

      (3b) The authors' threshold is the trial when the nDkl value exceeds the threshold permanently.  What about using just the first pass after the minimum?

      Rat 19 provides one example where the nDkl was initially positive, and even exceeded threshold for odds 4:1 and p<.05, but was followed by an extended period when the nDkl was negative because the CS response rate was less than the overall response rate. It illustrates why the first trial on which the nDkl passes a threshold cannot be used as a reliable index of acquisition.
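
      The difference between the two rules discussed in (3b) can be shown on any series of a signed statistic. The sketch below is a generic illustration with hypothetical function names and made-up values; it does not compute the nDkl itself and does not use Rat 19's data. The threshold of 0.82 is the odds 4:1 value quoted above.

```python
def first_crossing(values, threshold):
    """1-based index of the first trial on which the statistic exceeds threshold."""
    for i, v in enumerate(values, start=1):
        if v > threshold:
            return i
    return None

def permanent_crossing(values, threshold):
    """1-based index of the trial from which the statistic stays above
    threshold through the final trial, or None if it ends below threshold."""
    trial = None
    for i in range(len(values) - 1, -1, -1):
        if values[i] <= threshold:
            break
        trial = i + 1
    return trial

# Made-up signed series: early positive excursion, then a negative stretch,
# then a sustained rise.
series = [0.2, 1.0, 2.1, -0.5, -1.2, 0.3, 0.9, 2.0, 2.5]
first = first_crossing(series, 0.82)
permanent = permanent_crossing(series, 0.82)
print(first, permanent)
```

      With this series the first-pass rule reports Trial 2, even though the statistic later turns negative, whereas the permanent rule reports Trial 7, the start of the final run above threshold.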

      (3c) Can the authors explain why a value of 0.5 is added to the cumulative response number before dividing it by the cumulative time?

      This was done to provide an “unbiased” estimate of the response count because responses are integers. For example, if a rat has made 10 responses over 100 s of cumulative CS time, the estimated rate should be at least 10/100 but could be anything up to, but not including, 11/100. A rate of 10.5/100 is the unbiased estimate. However, we have now removed this step when calculating the nDkl to identify trials to acquisition because we recognise that it would represent a larger correction to the rate calculated across short intervals than across long intervals and therefore bias comparison between CS and overall response rates that involve very different time durations. As such, the correction would artefactually inflate evidence that the CS response rate was higher than the contextual response rate. However, as noted earlier in this reply, we have now instituted a similar correction when calculating the pre-CS response rate over the final 5 sessions for rats that did not register a single response (hence we set their response count to 0.5).
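
      The bias described here follows from simple arithmetic: adding 0.5 to a count inflates the rate by a proportion of 0.5/n, so it matters more when the count is small, as it tends to be over the shorter cumulative CS time. The sketch below uses hypothetical function names and made-up counts to show the effect on a CS versus overall comparison.

```python
def raw_rate(n_responses, seconds):
    """Conventional rate: responses per second."""
    return n_responses / seconds

def corrected_rate(n_responses, seconds):
    """Rate with the 0.5 added to the response count before dividing."""
    return (n_responses + 0.5) / seconds

def relative_inflation(n_responses):
    """Proportional increase in the rate caused by the +0.5 correction."""
    return 0.5 / n_responses

# Made-up counts: 10 responses in 100 s of CS time,
# 100 responses in 2000 s of overall time.
cs_n, cs_t = 10, 100
all_n, all_t = 100, 2000

raw_ratio = raw_rate(cs_n, cs_t) / raw_rate(all_n, all_t)
corrected_ratio = corrected_rate(cs_n, cs_t) / corrected_rate(all_n, all_t)
print(relative_inflation(cs_n), relative_inflation(all_n))
print(raw_ratio, corrected_ratio)
```

      Here the correction raises the CS rate by 5% but the overall rate by only 0.5%, so the corrected CS-to-overall ratio is larger than the raw ratio, which is the artefactual inflation the response describes.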

      (3d) Although the authors explain that nDkl was set to negative if pre-CS rate is higher than CS rate, this is not included in the code because the code calculates the nDkl using the truncated version, starting to accumulate the poke numbers and time from the earliest evidence, thus cumulative CS rate is always higher than cumulative contextual rate. I expect then that the cumulative CS rate will be always higher than the cumulative pre-CS rate.

      Yes, that is correct. The negative sign is added to the nDkl when it is computed starting from Trial 1. But when it is computed starting from the trial when the CS rate is permanently > the overall rate, there is no need to add a sign because the divergence is always in the positive direction.

      (3e) Regarding the Wilcoxon signed rank test, please clarify in the manuscript that the input ‘rate’ is not the cumulative rate as used for the earliest evidence. Please also clarify if the rates being compared for the signed nDkl are just the instantaneous rates or the cumulative ones. I believe that these are the ‘cumulative’ ones (not as for Wilcoxon signed rank test), because if not, the signed nDkl curve of rat 3 would fluctuate a lot across the x-axis.

      The reviewer is correct in both cases. However, as already mentioned, we have removed the analysis involving the Wilcoxon test. The description of the nDkl already specifies that this was done using the cumulative rates.

      (4) Supplemental table ‘nDkl Acquisition Table.xlsx’ 3rd column (“Earliest”) descriptions are unclear.

      (a) It is described in the supplemental ‘Explanation of Excel Tables.docx’ as the ‘earliest estimate of the onset of a poke rate during the CSs higher than the contextual poke rate’, while the last paragraph of the manuscript’s method section says ‘Columns 4, 5 and 6 of the table give the trial after which conditioned responding appeared as estimated in the above described three different ways— by the location of the minimum in the nDkl, the last upward 0 crossings, and the CS parse consistently greater than the ITI parse, respectively. Column 3 in that table gives the minimum of the three estimates.’ I plotted the data from column 3 (right) and comparing them with Fig 3A (left) makes it clear that there’s an issue in this column. If the description in the ‘Explanation of Excel Tables.docx’ is incorrect, please update it.

      We agree that the naming of these criteria can cause confusion, hence we have changed them. On page 9 we have replaced “earliest” with “first” in describing the criterion plotted in Figure 2A showing the trial starting from which the cumulative CS response rate permanently exceeded the cumulative overall rate. What is labelled as “Earliest” in “Acquisition_Table.xlsx” is, as the explanation says, the minimum value across the 3 estimates in that table.

      (b) Also, the term ‘contextual poke rate’ in the 3rd column’s description is confusing as in the nDkl calculation it represents the poke rate during all the training time, while in the first paragraph of the ‘Data analysis’ part, the earliest evidence is calculated by comparing the ITI (pre-CS baseline) poke rate.

      Yes, we have kept the term “contextual” response rate to refer to responding across the whole training interval (the ITI and the CS duration). This is used in calculation of the nDkl. For consistency with this comparison, we now take the first estimate of acquisition (in Fig 2A) based on a comparison between the CS rate and the overall (context) rate (not the pre-CS rate).

      Reviewer #2 (Recommendations for the authors):

      In response to the Rebuttal comments:

      Analytical (1) relating to Figure 3C/D

      This is a reasonable set of alternative analyses, but it is not clear that it answers the original comment regarding why the fit was worse when using a theoretically derived measure. Indeed, Figure 3C now looks distinctly different to the original Gibbon and Balsam data in terms of the shape of the relationship (specifically, the Group Median - filled orange circles) diverge from the black regression line.

      As mentioned in response to Reviewer 1, there was a mistake in Figure 3C of the revised manuscript. The figure was actually plotting data using a more stringent criterion of nDkl > 5.4, corresponding to p<0.001. The figure was referencing the data in column J of the public Trials2Acquisition Table. The data previously plotted in Figure 3C are no longer plotted because we no longer attempt to identify a criterion exactly matching that used by Gibbon and Balsam.

      We agree that the data shown in the first 3 panels of Figure 2 do diverge somewhat from the black regression line at the highest levels of informativeness (C/T ratios > 70), and the regression lines fitted to the data have slopes greater than -1. We acknowledge this on page 14 of the revised manuscript. Since Gibbon and Balsam did not report data from groups with such high ratios, we can’t know whether their data too would have diverged from the regression line at this point. We now report in the text a regression fitted to the first 10 groups in our experiment, which have C/T ratios that coincide with those of Gibbon and Balsam, and those regression lines do have slopes much closer to -1 (and include -1 in the 95% confidence intervals). We believe the divergence in our data at the high C/T ratios may be due to the fact that our rats were not given magazine training before commencing training with the CS and food. Because of this, it is quite likely that many rats did not find the food immediately after delivery on the first few trials. Indeed, in subsequent experiments, when we have continued to record magazine entries after CS-offset, we have found that rats can take 90 s or more to enter the magazine after the first pellet delivery. This delay would substantially increase the effective CS-US interval, measured from CS onset to discovery of the food pellet by the rat, making the CS much less informative over those trials. We now make this point on pages 14-15 of the revised manuscript.

      Analytical (2)

      We may have very different views on the statistical and scientific approaches here.

      This scalar relationship may only be uniquely applicable to the specific parameters of an experiment where CS and US responding are measured with the same behavioral response (magazine entry). As such, statements regarding the simplicity of the number of parameters in the model may simply reflect the niche experimental conditions required to generate data to fit the original hypotheses.

      The extent to which our data are consistent with the data reported decades ago by Gibbon and Balsam indicates that the scalar relationship they identified is not unique to certain niche conditions, since those special conditions would have to be true of both the acquisition of sign-tracking responses in pigeons and magazine entry responses in rats. How broadly it applies will require further experimental work using different paradigms and different species to assess how the rate of acquisition is affected across a wide range of informativeness, just as we have done here.

    1. Reviewer #3 (Public review):

      Summary:

      The authors aimed to overcome the challenges associated with complex, conventional prokaryotic cell-free protein synthesis (CFPS) systems, which require up to thirty-five components, by developing a streamlined and efficient E. coli CFPS platform to encourage broader adoption. The main objective was to reduce the number of reaction components from thirty-five to seven, while also developing an accessible 'fast lysate' preparation protocol that eliminates time-consuming runoff and dialysis steps. The authors also sought to demonstrate the robustness and translational quality of this streamlined system by efficiently synthesising challenging functional proteins, including the cytotoxic restriction endonuclease BsaI and the self-assembling intermediate filament protein vimentin.

      Strengths:

      This study presents several key strengths of the optimised E. coli cell-free protein synthesis system in terms of its design, performance and accessibility.

      (1) The reaction mixture has been dramatically simplified, with the number of essential core components successfully reduced from up to thirty-five in conventional systems to just seven.

      (2) The "fast lysate" protocol is a significant advance in terms of procedure.

      (3) The system's ability to synthesise challenging, functional proteins is evidence of its robustness.

      Weaknesses:

      (1) Title: "A simplified and highly efficient cell-free protein synthesis system for prokaryotes".

      (a) This title is misleading since one would expect a simplified and highly efficient cell-free protein synthesis system to yield similar protein levels compared to current cell-free protein synthesis systems. What this study shows is that the composition of cell-free protein synthesis systems can be simplified while maintaining a certain level of protein synthesis. Here, optimisation does not involve maintaining protein synthesis yield while simplifying the cell-free protein synthesis system; rather, it involves developing a simplified cell-free protein synthesis system. As mentioned in my comments below, this study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      (b) What do the authors mean by "highly efficient"? Highly efficient compared to what experimental conditions? If one is interested in the yield of protein synthesis, is this simplified system highly efficient compared to current systems?

      (2) Figures 1, 3-5:

      (a) What do relative luciferase units represent? How are these units calculated?

      (b) In this system, the level of expression depends mainly on the level of NLuc transcripts and the efficiency of NLuc translation. How did the authors ensure that the chemical composition of the different eCFPS buffers only affected protein translation and not transcript levels? In other words, are luciferase units solely an indicator of protein synthesis efficiency, or do they also depend on transcription efficiency, which could vary depending on the experimental conditions?

      (c) How long were the eCFPS reactions allowed to proceed before performing the luciferase activity measurement? Depending on the reaction time, the absence or presence of certain compounds may or may not impact NLuc expression. For example, it can be assumed that tRNA does not significantly affect NLuc levels over a short period of time, and that endogenous tRNA in the lysate is present at sufficient concentrations. However, over a longer period of time, the addition of tRNA could be essential to achieve optimal NLuc levels.

      (d) The authors show that tRNA and amino acids are not strictly essential for the expression of NLuc, likely due to residual amounts within the cell lysate. However, are the protein levels achieved without added amino acids and tRNA sufficient for biochemical assays that require a certain amount of protein? It is important to note that the focus here is on optimising the simplicity of the buffer rather than the level of protein expression. In fact, the simplicity of the buffer is prioritised over the amount of protein produced. This should be made clear.

      (e) How would the NLuc level compare if all the components were optimised individually and present in an optimised buffer, compared to a buffer optimised for simplicity as described by the authors?

      (3) Line 71, Streamlining eCFPS: removal of dispensable components. This title is misleading because it creates the false impression that proteins can be produced in vitro without the addition of certain compounds. While this is true, the level of protein produced may not be sufficient for subsequent biochemical analyses. This should be made clear.

      (4) Figure 2: In the legend, "(A) Protein expression levels of the eCFPS system measured at varying concentrations of KGlu and MgGlu2" would be more accurate if changed to "(A) Protein expression levels of the eCFPS system using a Nanoluciferase (NLuc) reporter DNA measured at varying concentrations of KGlu and MgGlu2".

      (5) Lines 302-303: "The thorough optimization of the seven core components was a critical step in achieving high protein expression levels". What are "high expression levels"? Compared to what?

    2. Author response:

      Thank you for overseeing the review of our manuscript and for providing the eLife Assessment and Public Reviews. We are highly appreciative of the detailed, constructive feedback from the editors and reviewers.

      We acknowledge the core issues raised and we are committed to undertaking the necessary experiments and textual revisions to address every critique.

      Here is a summary of the key revisions we plan to undertake to address the major points raised:

      (1) Absolute yield comparison and efficiency clarification (eLife Assessment, R#3)

      We will perform new quantitative experiments to provide the absolute protein yield of our optimized eCFPS system and benchmark it against a published, widely recognized high-yield CFPS protocol. This will directly address the central requirement for industry comparison and strengthen the claim of "high efficiency." Furthermore, we will revise the manuscript's terminology, especially in the title and abstract, to accurately reflect the system's success in "streamlining" and "robustness" in addition to performance.

      (2) Mechanistic rationale for simplification (eLife Assessment, R#1)

      We will substantially expand the Discussion to provide a mechanistic explanation for why activity is maintained after removing up to 28 components. This analysis will focus on the retention of endogenous metabolic enzymes and residual factors within the "Fast Lysate," citing relevant literature (e.g., Yokoyama et al., 2010, as suggested by R#1) to support the role of metabolic pathways in compensating for the lack of exogenous tRNA, CTP/UTP, and specific amino acids.

      (3) Transcription-translation coupling (R#3)

      To address the concern that expression changes might be due to transcription rather than translation efficiency, we will perform control experiments to monitor mRNA levels under key optimized conditions. This will help confirm that the observed efficiency changes are primarily attributable to translation.

      (4) Data presentation and completeness (R#2)

      We will revise the presentation of data in figures (e.g., Figure 2) to use appropriate graph types for discrete data and ensure all units, incubation times, and conditions are clearly and consistently specified. Furthermore, we will add a paragraph to the Discussion addressing the study's limitations, specifically the potential implications of DTT removal for certain protein types.

      We are confident that these planned revisions will address the reviewers' recommendations and result in a stronger manuscript.

    1. Reviewer #1 (Public review):

      Summary:

      The authors have created a new model of KCNC1-related DEE in which a pathogenic patient variant (A421V) is knocked into mouse in order to better understand the mechanisms through which KCNC1 variants lead to DEE.

      Strengths:

      (1) The creation of a new DEE model of KCNC1 dysfunction.

      (2) In vivo phenotyping demonstrates key features of the model such as early lethality and several types of electrographic seizures.

      (3) The ex vivo cellular electrophysiology is very strong and comprehensive including isolated patches to accurately measure K+ currents, paired recording to measure evoked synaptic transmission, and the measurement of membrane excitability at different time points and in two cell types.

      (4) 2P imaging relates the cellular dysfunction in PV neurons to epilepsy.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):           

      Summary:

      The authors have created a new model of KCNC1-related DEE in which a pathogenic patient variant (A421V) is knocked into a mouse in order to better understand the mechanisms through which KCNC1 variants lead to DEE.  

      Strengths:

      (1)  The creation of a new DEE model of KCNC1 dysfunction. 

      (2) In vivo phenotyping demonstrates key features of the model such as early lethality and several types of electrographic seizures.

      (3)  The ex vivo cellular electrophysiology is very strong and comprehensive including isolated patches to accurately measure K+ currents, paired recording to measure evoked synaptic transmission, and the measurement of membrane excitability at different time points and in two cell types.

      We thank Reviewer 1 for these positive comments related to strengths of the study.   

      Weaknesses:

      (1) The assertion that membrane trafficking is impaired by this variant could be bolstered by additional data.

      We agree with this comment. However, given the technical challenges of standard biochemical experiments for investigating voltage-gated potassium channels (e.g., antibody quality), the lack of a Kv3.1-A421V specific antibody, and the fact that Kv3.1 is expressed in only a small subset of cells, we did not undertake this approach. Instead, we performed additional experiments and analysis to improve the rigor of the experiments supporting our conclusion that membrane trafficking is impaired in the Kcnc1-A421V/+ mouse.

      Such experiments support a highly significant and robust difference in our (albeit imperfect) measurement of the membrane:cytosol ratio of Kv3.1 immunofluorescence between WT and Kcnc1-A421V/+ mice, which is consistent with impaired membrane trafficking (Figure 3). In the revised manuscript, we have added additional data points to this plot and updated the representative example images using improved imaging techniques to better showcase how Kcnc1-A421V/+ PV-INs differ from age-matched WT littermate controls. We think the result is quite clear. Future biochemical experiments, perhaps best performed in a culture system in vitro, could provide additional support for this conclusion.

      (2) In some experiments details such as the age of the mice or cortical layer are emphasized, but in others, these details are omitted.

      We apologize for this omission. We have now clarified the age of the mice and cortical layer for each experiment in the Methods and Results sections as well as figure legends.   

      (3) The impairments in PV neuron AP firing are quite large. This could be expected to lead to changes in PV neuron activity outside of the hypersynchronous discharges that could be detected in the 2-photon imaging experiments, however, a lack of an effect on PV neuron activity is only loosely alluded to in the text. A more formal analysis is lacking. An important question in trying to understand mechanisms underlying channelopathies like KCNC1 is how changes in membrane excitability recorded at the whole cell level manifest during ongoing activity in vivo. Thus, the significance of this work would be greatly improved if it could address this question.

      Yes, the impairments in neocortical PV-IN excitability are notably severe relative to other PV interneuronopathies that we and others have directly investigated (e.g., Kv3.1 or Kv3.2-/- knockout mice; Scn1a+/- mice). In the revised version of the manuscript, we have added a more thorough investigation and analysis of in vivo 2P calcium imaging data on PV-IN (and presumptive excitatory cell) neural activity (Figure 8 and Supplementary Figure 9; Methods, lines 230-271; Results, lines 630-657; and Discussion, lines 795-814).

      Because of the prominent recruitment of neuropil during presumptive myoclonic seizures, further investigation of individual neuronal excitability in vivo required a slightly different labeling strategy now using a soma-tagged GCaMP8m as well as a separate AAV containing tdTomato driven by the PV-IN-specific S5E2 enhancer. Our new results reveal an increase in the baseline calcium transient frequency in non-PV-INs, and reduced mean transient amplitudes in both non-PV cells and PV-INs. These interesting findings, which are consistent with attenuated PV-IN-mediated perisomatic inhibition leading to disinhibited excitatory cells in the Kcnc1-A421V/+ mice, link our in vivo results to the slice electrophysiology experiments. Of course, there are residual issues with the application of this technique to interneurons and the ability to resolve individual or small numbers of spikes, which likely explains the lack of genotype difference in calcium transient frequency in PV-INs.

      (4) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice, but there is no mention of littermate control analyzed by EEG. 

      We performed additional experiments as requested and did not observe myoclonic jerks or any other epileptic activity in WT control mice. We have included this data in the revised manuscript (Figure 9C).   

      Reviewer #2 (Public review):           

      Summary:

      Wengert et al. generated and thoroughly characterized the developmental epileptic encephalopathy phenotype of Kcnc1-A421V/+ knock-in mice. The Kcnc1 gene encodes the Kv3.1 channel subunit. Analogous to the role of BK channels in excitatory neurons, Kv3 channels are important for the recurrent high-frequency discharge in interneurons by accelerating the downward hyperpolarization of the individual action potential. Various Kcnc1 mutations are associated with developmental epileptic encephalopathy, but the effect of a recurrent A421V mutation was somewhat controversial and its influence on neuronal excitability has not been fully established. In order to determine the neurological deficits and underlying disease mechanisms, the authors generated cre-dependent KI mice and characterized them using neonatal neurological examination, high-quality in vitro electrophysiology, and in vivo imaging/electrophysiology analyses. These analyses revealed excitability defects in the PV+ inhibitory neurons associated with the emergence of epilepsy and premature death. Overall, the experimental data convincingly support the conclusion.

      Strengths:

      The study is well-designed and conducted at high quality. The use of the Cre-dependent KI mouse is effective for maintaining the mutant mouse line with premature death phenotype, and may also minimize the drift of phenotypes which can occur due to the use of mutant mice with minor phenotype for breeding. The neonatal behavior analysis is thoroughly conducted, and the in vitro electrophysiology studies are of high quality.

      We appreciate these positive comments from Reviewer 2. 

      Weaknesses:

      While not critically influencing the conclusion of the study, there are several concerns.

      In some experiments, the age of the animals is not clearly stated. For example, the experiments in Figure 2 demonstrate impaired K+ conductance and membrane localization, but it is not clear whether they correlated with the excitability and synaptic defects shown in subsequent figures. Similarly, it is unclear at what age the authors conducted EEG recordings, and whether non-epileptic mice are younger than those with seizures.

      We have now updated the manuscript to include a clear report of age for all experiments including the impaired K<sup>+</sup> conductance (now Figure 3) and EEG (now Figure 9). There was no intention to omit this information. The recordings of K<sup>+</sup> conductance impairments in PV-INs from Kcnc1-A421V/+ mice were completed at P16-21. Thus, we interpret the loss of potassium current density to be causally linked with the impairments in intrinsic physiological function at that same time period in neocortical layer II-IV PV-INs and more subtly in PV-positive cells in the RTN and neocortical layer V PV-INs.

      Mice used in the EEG experiments were P24-48, an age range which roughly corresponded with the midpoint on the survival curve for Kcnc1-A421V/+ mice. Although we saw significant mouse-to-mouse variability in seizure phenotype, no Kcnc1-A421V/+ mice completely lacked epilepsy or marked epileptiform abnormalities, neither of which were seen in WT mice. We did not detect a clear relationship between seizure frequency/type and mouse age. 

      The trafficking defect of mutant Kv3.1 proposed in this study is based only on the fluorescence density analysis which showed a minor change in membrane/cytosol ratio. It is not very clear how the membrane component was determined (any control staining?). In addition to fluorescence imaging, the addition of biochemical analysis would make the conclusion more convincing (while it might be challenging if the Kv3.1 is expressed only in PV+ cells).

      This relates to comment 3 of Reviewer 1. We agree that, in the initial submission of the manuscript, the evidence from IHC for Kv3.1 trafficking deficits was somewhat subtle. In the revised version of the paper, we have gathered additional replicates of this original experiment with improved imaging quality and clarify how the membrane component was specified, to now show a robust and highly significant (***P<0.001) decrease in membrane:cytosol Kv3.1 ratio. We have also now provided new example images better showcasing the deficits observed in the Kcnc1-A421V/+ mice (Figure 3). The membrane compartment was defined as the outermost 1 micron of the parvalbumin-defined cell soma (drawn blind to the Kv3.1b signal), and, importantly, all analysis was conducted blinded to mouse genotype. These measures help to ensure that the result is robust and unbiased. Nonetheless, we have added a paragraph in the Discussion section highlighting the limitations of our IHC evidence for trafficking impairment (Lines 868-883). 

      While the study focused on the superficial layer because Kv3.1 is the major channel subunit, the PV+ cells in the deeper cortical layer also express Kv3.1 (Chow et al., 1999) and they may also contribute to the hyperexcitable phenotype via negative effect on Kv3.2; the mutant Kv3.1 may also block membrane trafficking of Kv3.1/Kv3.2 heteromers in the deeper layer PV cells and reduce their excitability. Such an additional effect on Kv3.2, if present, may explain why the heterozygous A421V KI mouse shows a more severe phenotype than the Kv3.1 KO mouse (and why they are more similar to Kv3.2 KO). Analyzing the membrane excitability differences in the deep-layer PV cells may address this possibility.

      We appreciate this thoughtful suggestion. We have now provided data from neocortical layer V PV interneurons in the revised manuscript (Supplementary Figure 5). Abnormalities in intrinsic excitability from neocortical layer V PV-INs in Kcnc1-A421V/+ mice were present, but less pronounced than in PV-INs from more superficial cortical layers. These results are consistent with the view that greater relative expression of Kv3.2 “dilutes” the impact of the Kv3.1 A421V/+ variant. Whether the A421V/+ variant impairs membrane trafficking and/or gating of Kv3.2 remains unclear.

      We attempted to assess how the mutant Kv3.1 affects Kv3.2 localization, but were unsuccessful due to the lack of reliable antibodies. After immunostaining mouse brain sections with two different anti-Kv3.2 antibodies, only one produced a somewhat promising signal (see below). However, even in this case, Kv3.2 staining was successful only once (out of five independent staining experiments) and the signal varied across cortical regions, showing widespread cellular Kv3.2 signal in some areas (b, top panel), and barely detectable signal in others, regardless of Kv3.1 expression. In the remaining four attempts, we detected only ‘fiber-like’ immunostaining signal, further diminishing our confidence in the anti-Kv3.2 antibody, although results could be improved with still further testing and refinement, which we will attempt. Consequently, this important question remains unresolved in this study.

      Author response image 1.

      Immunostaining of Kv3.1 and Kv3.2 in sagittal mouse brain sections. a) An example of intracellular Kv3.2 immunostaining signal, variable across the cortex of a WT mouse independent of Kv3.1 expression. b) Kv3.2 is detectable intracellularly in most of the cells in the top panel but barely detectable in the lowest panel. c) Representative image of Kv3.2 immunostaining signal in other sagittal mouse brain sections.

      We have discussed these important implications and limitations of our results in the Discussion (Lines 868-883). We agree with the Reviewer’s interpretation that an impact on Kv3.1/Kv3.2 heteromultimers across the neocortex may explain why the Kcnc1-A421V/+ mouse exhibits a more severe phenotype than Kv3.1-/- or Kv3.2-/- mice (see below), a view which we have attempted to further clarify in the Conclusion.

      In Table 1, the A421V PV+ cells show a more depolarized resting membrane potential than WT by ~5 mV, which seems a robust change and would influence the circuit excitability. The authors measured firing frequency after adjusting the membrane voltage to -65 mV, but are the excitability differences less significant if the resting potential is not adjusted? It is also interesting that such a membrane potential difference is not detected in young adult mice (Table 2). This loss of potential compensation may be important for developmental changes in the circuit excitability. These issues can be more explicitly discussed.

      We do not entirely understand this finding and its apparent developmental component. It could be compensatory, as suggested by the Reviewer; however, it is transient and seems to be an isolated finding (i.e., it is not accompanied by compensation in other properties). It is also possible that this change in Kcnc1-A421V/+ PV-INs may reflect impaired/delayed development. We cannot test excitability at a meaningfully later time point as the mice are deceased.

      The revised version of the manuscript contains additional data (Supplementary Figure 4) showing that major deficits in intrinsic excitability are still observed even when the resting membrane potential is left unadjusted. These results are further discussed in the Results section (lines 522-523) and the Discussion section (lines 727-731).   

      Reviewer #3 (Public review):           

      Summary:

      Here Wengert et al., establish a rodent model of KCNC1 (Kv3.1) epilepsy by introducing the A421V mutation. The authors perform video-EEG, slice electrophysiology, and in vivo 2P imaging of calcium activity to establish disease mechanisms involving impairment in the excitability of fast-spiking parvalbumin (PV) interneurons in the cortex and thalamic PV cells.

      Outside-out nucleated patch recordings were used to evaluate the biophysical consequence of the A421V mutation on potassium currents and showed a clear reduction in potassium currents. Similarly, action potential generation in cortical PV interneurons was severely reduced. Given that both potassium currents and action potential generation were found to be unaffected in excitatory pyramidal cells in the cortex, the authors propose that loss of inhibition leads to hyperexcitability and seizure susceptibility in a mechanism similar to that of Dravet Syndrome.

      Strengths: 

      This manuscript establishes a new rodent model of KCNC1-developmental and epileptic encephalopathy. The manuscript provides strong evidence that parvalbumin-type interneurons are impaired by the A421V Kv3.1 mutation and that cortical excitatory neurons are not impaired. Together these findings support the conclusion that seizure phenotypes are caused by reduced cortical inhibition.

      We thank Reviewer 3 for their view of the strengths of the study.

      Weaknesses:

      The manuscript identifies a partial mechanism of disease that leaves several aspects unresolved, including the possible role of the observed impairments in thalamic neurons in the seizure mechanism. Similarly, while the authors identify a reduction in potassium currents and a reduction in PV cell surface expression of Kv3.1, it is not clear why these impairments would lead to a more severe disease phenotype than other loss-of-function mutations which have been characterized previously. Lastly, additional analysis of video-EEG data would be helpful for interpreting the extent of the seizure burden and the nature of the seizure types caused by the mutation.

      We agree with these comments from Reviewer 3. We studied neurons in the reticular thalamus and layer V neocortical PV-INs since they are also linked to epilepsy pathogenesis and are known to express Kv3.1. However, for most of the study, we focused on neocortical layer II-IV PV-INs, because these cells exhibited the most robust impairments in intrinsic excitability. Crossing our novel Kcnc1-Flox(A421V)/+ mice to a cerebral cortex interneuron-specific driver that would avoid recombination in the thalamus, such as Ppp1r2-Cre (RRID:IMSR_JAX:012686), could assist in determining the relative contribution of thalamic reticular nucleus dysfunction to the overall phenotype, as used by Makinson et al. (2017) to address a similar question; however, we have been unable to obtain this mouse despite extensive effort. There are of course other Kv3.1-expressing neurons in the brain, including in the hippocampus, amygdala, and cerebellum, and we have provided additional discussion (Lines 731-736) of this issue.

      We further agree with the Reviewer that a major question in the field of KCNC1-related neurological disorders is the mechanistic underpinning of why the KCNC1-A421V variant leads to a more severe disease phenotype than other loss-of-function KCNC1 variants, and, further, why the mouse phenotype is more severe than the Kcnc1 knockout. Previous results and our own recordings in heterologous systems suggest that the A421V variant is more profoundly loss of function than the R320H variant (Oliver et al., 2017; Cameron et al., 2019; Park et al., 2019), which is consistent with A421V having a more severe disease phenotype. Relative to knockout of Kv3.1, our results are consistent with the view that the A421V variant exhibits dominant negative activity by reducing surface expression of Kv3.1 and/or Kv3.2 (an effect that would not occur in knockout mice), with a possible additional contribution from impaired gating of Kv3.1/Kv3.2 heteromultimers that incorporate A421V subunits. Our finding that the magnitude of total potassium current was reduced in PV-INs by ~50% is consistent with a combination of these various mechanisms but does not distinguish between them.

      In the revised version of the manuscript, we have provided a more complete discussion of these important remaining questions regarding our interpretation of how the severity of KCNC1 disorders relates to the biophysical features of the ion channel variant (lines 868-883).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):          

      Major

      (1) The authors suggest that the reduced K+ current density in Kcnc1-A421V/+ neurons is due in part to impaired trafficking and cell surface expression of Kv3.1 in these neurons. The data supporting this claim aren't completely convincing. First, it's difficult to visualize a difference in Kv3.1 localization in the images shown in panel H, and importantly, it seems problematic that the method to assess Kv3.1 levels in membrane vs. cytosol relied on using PV co-staining to define the membrane compartment as the outermost 1 um of the PV-defined cell soma. This doesn't seem to be the best method to define the membrane compartment, as the PV signal should be largely cytosolic.

      As noted above, we have completed additional data collection to confirm our results, and have performed additional imaging and updated our example images to be more representative of the observed deficits in membrane Kv3.1 expression in the Kcnc1-A421V/+ mice. We attempted to identify a marker to more clearly label the membrane to combine with PV immunocytochemistry but were unable to do so despite some effort. 

      Is it possible that in control neurons, the cytosolic PV signal localizes within the membrane-bound Kv3.1 signal, with less colocalization, whereas in Kcnc1-A421V/+ neurons, there would be more colocalization of the cytosolic PV and improperly trafficked Kv3.1? Could the data be presented in this way, showing altered colocalization of Kv3.1 with PV?

      We do not entirely understand the nature of this concern. In our experiments, we utilized the PV signal to determine the cell membrane and cytosolic compartments in an unbiased manner using a 1-micron shell traced around/outside the edge of the PV signal to define the membrane compartment, with the remainder of the area (minus the nuclear signal defined by DAPI) defined as the cytosol (see Methods 176-186). Because we did not identify any alterations in PV signal or correlation between PV immunohistochemistry and tdTomato expression in Cre reporter strains between WT and Kcnc1-A421V/+ mice, we believe that our strategy for determining membrane:cytosol ratio of Kv3.1 in an unbiased manner is acceptable (albeit of course imperfect). 
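
      For readers who want to see the compartment logic concretely, the following is a minimal sketch of a shell-based membrane:cytosol ratio, not the authors' actual analysis pipeline. It assumes a binary soma mask (from the PV channel), a binary nucleus mask (from DAPI), and a Kv3.1 intensity image of the same size; the function names `erode` and `membrane_cytosol_ratio` and the parameter `shell_px` (the 1-micron shell converted to pixels) are hypothetical. The sketch takes the shell as the outermost band inside the soma mask; a shell drawn outside the PV edge would use dilation instead of erosion.

```python
from statistics import mean


def erode(mask, iterations):
    """Shrink a binary mask by `iterations` pixels (4-connected neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        eroded = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not mask[r][c]:
                    continue
                neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                # A pixel survives only if all four neighbours are inside the mask;
                # pixels on the image border are treated as touching background.
                eroded[r][c] = all(
                    0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]
                    for nr, nc in neighbours
                )
        mask = eroded
    return mask


def membrane_cytosol_ratio(signal, soma_mask, nucleus_mask, shell_px):
    """Mean signal in the outer shell of the soma divided by mean signal
    in the remaining soma area, excluding the nucleus."""
    inner = erode(soma_mask, shell_px)
    membrane, cytosol = [], []
    for r, row in enumerate(signal):
        for c, value in enumerate(row):
            if not soma_mask[r][c]:
                continue  # outside the cell
            if not inner[r][c]:
                membrane.append(value)  # outermost shell of the soma
            elif not nucleus_mask[r][c]:
                cytosol.append(value)  # interior minus nucleus
    return mean(membrane) / mean(cytosol)


# Toy 7x7 image: a 5x5 "soma" with a one-pixel bright rim and a one-pixel nucleus.
size = 7
soma = [[1 <= r <= 5 and 1 <= c <= 5 for c in range(size)] for r in range(size)]
nucleus = [[r == 3 and c == 3 for c in range(size)] for r in range(size)]
signal = [
    [30.0 if (r in (1, 5) or c in (1, 5)) else 10.0 for c in range(size)]
    for r in range(size)
]
ratio = membrane_cytosol_ratio(signal, soma, nucleus, shell_px=1)
print(ratio)  # 3.0
```

      In this toy case the rim pixels are three times brighter than the interior, so the ratio is 3.0; a trafficking deficit of the kind described would be expected to lower this ratio toward 1.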

      Alternatively, membrane fractionation could be performed on WT vs Kcnc1-A421V/+ neurons, followed by Western blotting with a Kv3.1 antibody to show altered proportions in the cytosolic vs. membrane protein fractions. It's important that these results are convincing, as the findings are mentioned in the Abstract, the Results section, and multiple times in the Discussion, although it is still unclear how much the potential altered trafficking contributes to the decrease in K+ currents versus changes in channel gating.

      Multiple technical barriers made it difficult for us to gain direct biochemical evidence for altered trafficking of the A421V/+ Kv3.1 variant (see above). It is not clear how membrane fractionation techniques could be easily applied in this case (at least by us) when PV-INs constitute 3-5% of all neocortical neurons. We further agree (as noted above) that it is difficult to properly disentangle the relative roles of impaired membrane trafficking vs. gating deficits to the observed effect; however, we think that both phenomena are likely occurring. In the revised version of the manuscript, we have more explicitly discussed these limitations in the Discussion section (Lines 868-883).   

      (2) More information is needed regarding the age of mice used for experiments for the following results (added to the Results section as well as figure legends):

      PV density (Supplementary Figure 1) 

      K+ current data (Figure 2A-G)       

      Kv3.1 localization (Figure 2H and I)        

      RTN electrophysiology (Supplementary Figure 3)

      Excitatory neuron electrophysiology (Figure 4)             

      In vivo 2P calcium imaging (Figure 7) 

      Video-EEG (Figure 8)

      We apologize for omitting this critical information. In the revised manuscript, we have provided the age of mice for each of our experiments in the results section, in the figure legend, and in the methods section.   

      (3) It's unclear why developmental milestones/behavioral assessments were only done at P5-P10. In the previous publication of another Kcnc1 LOF variant (Feng et al. 2024), no differences were found at P5-P10, and it was suggested in the discussion that this finding was "consistent with the known developmental expression pattern of Kv3.1 in mouse, where Kv3.1 protein does not appear until P10 or later". In that paper, they did find behavioral deficits at 2-4 months. Even though this model is more severe than the previous model, it would be interesting to determine if there are any behavioral deficits at a later time point (especially as they find more neurophysiological impairments at P32-P42).

      As in our previous study, the lack of clear behavioral deficits in developmental milestones from P5-15 is potentially expected considering the developmental expression of Kv3.1, and we performed these experiments primarily to showcase that the Kcnc1-A421V/+ mice exhibit otherwise normal overall early development (although this could be an artifact of the sensitivity of our testing methods).

      For the revised manuscript, we have conducted additional experiments to investigate behavioral deficits in adult Kcnc1-A421V/+ mice. We found cognitive/learning deficits in Kcnc1-A421V/+ mice relative to WT in both the Barnes maze (Figure 2A-C) and Y-maze (Figure 2D-F). Other aspects of animal behavior, including cerebellar-related motor function, are likely also impaired at post-weaning timepoints, and will be included in a forthcoming research study focusing on motor function in these mice.

      (4) In the Results section, it should be more clearly stated which cortical layer/layers are being studied. In some cases, it mentions layers 2-4, and in some, only layer 4, and in others, it doesn't mention layers at all. Toward the beginning of the Results section, the rationale for focusing on layers 2-4 to assess the effects of this variant should be well described and then, for each experiment, it should be stated which cortical layers were assessed. Related to this point, it seems electrophysiology was only done in layer 4; the rationale for this should also be included.

      We have now clarified which neocortical layers were under investigation in the study. All PV-INs were targeted in somatosensory layers II-IV, while excitatory neurons were either cortical layer IV spiny stellate cells or pyramidal cells. Paired recordings were also completed in layer IV. We have also more explicitly articulated our rationale for looking at PV-INs in layers II-IV to examine the cellular/circuit-level impact of Kv3.1 in a model of developmental and epileptic encephalopathy (Lines 487-491).

      (5) Kcnc1-A421V/+ PV neurons showed more robust impairments in AP shape and firing at P32-42 than at P16-21 (Figure 3), and only showed synaptic neurotransmission alterations at P32-42 (Figure 6). Thus, it's unclear why Kcnc1-A421V/+ excitatory neurons were only assessed at P16-21 (Figure 4 and Supplementary Figure 4 related to Figure 5), particularly if only secondary or indirect effects on this population would be expected.

      We appreciate this excellent point raised by the Reviewer and we have taken the suggestion to examine excitatory neurons at P32-42 in addition to the earlier juvenile timepoint. Our new results from the later timepoint are similar to our results at P16-21: Excitatory neurons show no statistically significant impairments in intrinsic excitability at either of the two timepoints examined (Supplementary Figure 7). This adds support to our original conclusion that PV-INs represent the major driver of disease pathology across development.   

      (6) The 2P calcium imaging experiments are potentially interesting, however, a relationship between these results and the electrophysiology results for PV neurons is lacking. Was there an attempt to assess the frequency and/or amplitude of calcium events specifically in PV neurons, outside of the hypersynchronous discharges, to determine whether there are differences between WT and Kcnc1-A421V/+, as was seen in the electrophysiological analyses? It does seem there are some key differences between the two experiments (age: later timepoint for 2P vs. P16-21 and P32-42, layer: 2/3 vs. 4, and PV marking method: virus vs. mouse line), but the electrophysiological differences reported were quite strong. Thus, it would be surprising if there were no alterations in calcium activity among the Kcnc1-A421V/+ PV neurons.

      In our initial experiments, the prominent neuropil GCaMP signal in Kcnc1-A421V/+ mice rendered it difficult to distinguish and accurately describe baseline neuronal excitability in PV-INs and non-PV cells. In our revised manuscript, we utilized a soma-tagged GCaMP8m and separately labeled PV-INs through S5E2-tdTomato. This strategy made it possible to assess the amplitude and frequency of calcium transients in both PV-positive and PV-negative cells in vivo. We have updated the description of our methods (lines 230-271) and our results (lines 630-657) in the revised manuscript.

      As noted above, our more detailed analysis of somatic calcium transients in PV-IN and non-PV cells during quiet rest (Figure 8 and Supplementary Figure 9) shows that PV-INs from Kcnc1-A421V/+ mice are abnormally excitable, having reduced transient amplitude relative to WT controls. Interestingly, non-PV cells also exhibited an increased calcium transient frequency and reduced amplitude, which is potentially consistent with reduced perisomatic inhibition causing disinhibition in cortical microcircuits. We again highlight that the slow kinetics of GCaMP combined with the calcium buffering and brief spikes of PV-INs render quantification of action potential frequency and comparisons between groups difficult.

      (7) As mentioned above, it would be helpful to state the time points or age ranges of these experiments to better understand the results and relate them to each other. For example, the 2P imaging showed apparent myoclonic seizures in 7/7 Kcnc1-A421V/+ mice (recorded for a total of 30-50 minutes/mouse), but the video-EEG showed myoclonic seizures in only 3/11 Kcnc1-A421V/+ mice (recorded for 48-72 hours/mouse). Were these experiments done at very different age ranges, so this difference could be due to some sort of progression of seizure types and events as the mice age? Is it possible these are not the same seizure types (even though they are similarly described)? This discrepancy should be discussed.

      Mice in the EEG experiments were between the ages of P24 and P48, slightly younger than the age at which we carried out the in vivo calcium imaging experiments (>P50). Therefore, an age-related exacerbation in myoclonic jerks is possible.

      As highlighted by the Reviewer, it is interesting that the myoclonic seizures were only detected in a portion of the Kcnc1-A421V/+ mice during EEG monitoring (4/12). We believe that the difference is most likely driven by more sensitive detection of the myoclonic jerk activity and behavior in the 2P imaging of neuropil cellular activity compared to our video-EEG monitoring and 2P imaging of soma-tagged GCaMP. We have occasionally observed repetitive myoclonic jerking in mice that appears highly localized (i.e. one forepaw only), suggesting that the myoclonic seizures exist on a spectrum of severity from focal to diffuse. It is therefore possible that myoclonic events and electrographic activity are slightly underestimated in our video-EEG experiments.

      We have now added a few lines discussing this discrepancy in the Discussion (lines 809-814).
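
      The size of the discrepancy raised in point (7) can be put in numbers. The following is a minimal sketch only, assuming the counts quoted in the Reviewer's comment (7/7 mice in 2P imaging vs. 3/11 in video-EEG) and assuming Fisher's exact test as an illustrative choice; it is not an analysis taken from the manuscript, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (hypothetical name), not code from the manuscript.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1 with margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum the probability of every table at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Assumed counts from the Reviewer's comment: mice with / without detected myoclonic seizures
# Row 1: 2P imaging (7 with, 0 without); row 2: video-EEG (3 with, 8 without)
p_value = fisher_exact_two_sided(7, 0, 3, 8)
print(f"two-sided p = {p_value:.4f}")
```

      A small p-value here would only indicate that the detection rates differ between the two assays; it cannot distinguish between an age effect, a sensitivity difference, and distinct seizure types.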

      (8) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice. Was video-EEG performed on control mice? These data should be added to Figure 8.

      We have added recordings in control WT mice (N=4). We did not detect myoclonic jerks or other epileptiform activity in the control mice (Figure 9).  

      Minor

      (1) In the first Results section, Line 365, the P value (P<0.001) is different from that in the legend for Figure 1, line 743 (P<0.0001).

      We have fixed this discrepancy. 

      (2) For Supplementary Figure 1, it would be helpful to show images that span the cortical layers (1-6), as PV and Kv3.1 are both expressed across the cortical layers.

      We have updated Supplementary Figure 1 with better example images that span the cortical layers.    

      (3) Error bars should be added to the line graphs in Supplementary Figure 2, particularly panels B and C. Some of the differences appear small considering the highly significant p-values (i.e. body weight at P7 and brain weight at P21).

      The values shown in Supplementary Figure 2D-E are percentages of mice displaying a particular characteristic, so there is no variance for the data.

      Supplementary Figure 2B-C do contain error bars plotted as SEM; however, because of the large N and the small variance in the measurements, the error bars are not apparent in the graphs. This has been noted in the Supplementary Figure 2 legend for clarity.

      (4) In Figure 3, although the Kcnc1-A421V/+ neurons have elevated AP amplitudes relative to WT, the representative traces for P16-21 and P32-42 groups appear strikingly opposite (traces in B and G appear to have much higher amplitudes than those in C and H). As this is one of the three AP phenotypes described, it would be nice to have it reflected in the traces.

      We have updated our example traces to better represent our main findings including AP amplitude for both P16-21 and P32-42 timepoints.  

      (5) Were any effects on the AHP assessed in the electrophysiology experiments? As other studies have reported the effects of altered Kv3 channel activity on AHP, this parameter could be interesting to report as well.

      We have now provided data on the afterhyperpolarization for each condition displayed in the Supplementary data tables. Interestingly, we failed to detect significant differences in AHP between WT and Kcnc1-A421V/+ PV-INs, RTN neurons, or pyramidal cells, although we did identify differences in the dV/dt of the repolarization phase of the AP.   

      (6) The figure legend for Figure 7 has errors in the panel labeling (D instead of C, and two Fs).

      This error has been corrected in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Specific comments and questions for the authors:         

      (1) Do the authors provide a reason for why the juvenile animals are unaffected by the A421V mutation? Is it that PV cells have not fully integrated at this early time point or that Kv3.1 expression is low? Is the developmental expression profile of Kv3.1 in PV cells known and if so could the authors update the discussion with this information?

      We interpret the normal early developmental milestones (P5-P15) to reflect that Kcnc1-A421V/+ mice exhibit the onset of their neurological impairment at the same time that PV-INs upregulate Kv3.1, develop a fast-spiking physiological phenotype, and integrate into functional circuits in the third and fourth postnatal weeks. We have updated the discussion (Line 780-782) with this information and more clearly describe our interpretation of these early-life behavioral experiments.   

      (2) I would like to see a more complete analysis of the Video-EEG data that is included in Figure 8. What was the seizure duration and frequency? Were there spike-wave seizure types observed? Were EEG events that involve thalamocortical circuitry affected such as spindles? Was sleep architecture impaired in the model? Were littermate control animals recorded?

      Although classical convulsive seizures represent only part of the overall epilepsy phenotype that this mouse exhibits, we agree that reporting seizure duration and frequency is important. We have now included this in our revised manuscript (line 624-626). We have also now added WT control mice to our dataset, and, as expected, we failed to observe any epileptic features in our WT recordings.

      In our EEG experiments, we did not record EMG activity in the mouse to allow for unambiguous determination of sleep vs. quiet wakefulness. For that reason, and because we believe it beyond the scope of this particular study, we did not examine sleep-related EEG phenomena such as spindles or sleep architecture. We have, however, added a line in the discussion (line 771-774) suggesting that future studies focus on a more thorough investigation of the EEG activity in these animals. 

      (3) The in vivo calcium imaging data shows synchronous bursts in A421V animals which is in agreement with the synchronous bursts observed in the EEG. Overall the analysis of the in vivo calcium imaging data appears to be rudimentary and perhaps this is a missed opportunity. What additional insights were gained from this technically demanding experiment that were not obtained from the EEG recordings?

      As noted above, in the revised version of the manuscript, we have conducted additional experiments which allowed us to separately examine PV-IN and non-PV neuron excitability via 2P in vivo calcium imaging. This required an alternative strategy to label individual neuronal somata without contamination by the robust neuropil signal that we observed in the approach undertaken in the original submission. We’ve described the details of this new approach in methods (Lines 230-271) and results section (lines 630-657).

      Our new results (Figure 8 and Supplementary Figure 9) reveal that, during quiet wakefulness, neocortical PV-INs from Kcnc1-A421V/+ mice exhibit a reduction in calcium transient amplitude and that non-PV cells exhibit altered transient frequency and amplitude. Overall, we believe that these results are consistent with the view that PV-IN-mediated perisomatic inhibition is compromised in Kcnc1-A421V/+ mice, which leads to a downstream hyperexcitability in excitatory neurons within cortical microcircuits.

      (4) The increased severity of seizure phenotypes observed in the A421V model relative to knockout mice is interesting but also confusing given what is known about this mutation. As the authors point out, a possible explanation is that the mutation is acting in a dominant negative manner, where mutant Kv3.1 channels compete with other Kvs that would otherwise be able to partially compensate for the loss of Kv function. Alternatively, the A421V mutation might act by affecting the trafficking of heterotetrameric Kv3 channels to the membrane. Can the authors clarify why a trafficking deficit would produce a different effect than a loss of function mutation? Are the authors proposing that a hypomorphic mutation involving both a partial trafficking deficit and a dominant negative effect of those channels that are properly localized is more severe than a "clean" loss of function? The roughly 50% loss of potassium current absent a change in gating would be expected to behave like a loss-of-function mutation. This might be addressed by comparing the surface expression of the other Kv channels and/or through the use of Kv3.1-selective pharmacology.

      These are excellent points raised by the Reviewer. As noted above, we have endeavored to clarify our hypothesis as to the basis of this phenomenon, although the mechanistic basis for the more severe phenotype in the Kcnc1-A421V/+ mouse relative to the Kv3.1 knockout is not entirely clear. Our physiology results and the evidence presented supporting a trafficking impairment are consistent with dominant negative action of the Kv3.1 A421V variant at the level of channel gating and/or trafficking. To restate, we think the Kcnc1-A421V/+ heterozygous variant is more severe than a Kv3.1 knockout for (at least) three reasons: variant Kv3.1 is incorporated into Kv3.1/Kv3.2 heterotetramers to (1) impair trafficking to the membrane as well as (2) alter the electrophysiological function of those channels that do successfully traffic to the membrane (while Kv3.1 knockout affects Kv3.1 only), and (3) the heterozygous variant may escape the compensatory upregulation of Kv3.2 that is known to occur in Kv3.1 knockout mice.

      For example, our data are consistent with the view that heterotetramers of WT Kv3.1 and Kv3.2 potentially come together with the A421V Kv3.1 subunit in the endoplasmic reticulum and then fail to traffic to the membrane due to the presence of one or more A421V subunit(s), as evidenced by increased Kv3.1 staining in the cytosol in the Kcnc1-A421V/+ mouse relative to WT. This is in contrast to what would occur in the Kv3.1 knockout mice, as there is no subunit produced from the null allele to prevent WT Kv3.2 subunits from forming fully functional Kv3.2 homotetramers that then reach the cell surface and function properly. This is one specific possible mechanism for dominant negative activity.

      A non-mutually-exclusive mechanism is that inclusion of one or more Kv3.1 A421V subunits into Kv3 heterotetramers impairs gating and prevents potassium flux such that, even if the tetramer does reach the membrane, that entire tetramer fails to contribute to the total potassium current. This is another possible mechanism for dominant negative function of the A421V subunit.

      Experimental elucidation of the precise mechanism of the dominant negative activity of the A421V Kcnc1 variant is beyond the scope of this study, although our lab is continuing to work on this. It will likely require dose-response experiments in which various ratios of WT and Kv3.1 A421V subunits are co-expressed in heterologous cells and then recorded for an overall effect on potassium current, similar to the approach of Clatot et al. (2017).

      In the revised manuscript, we have updated our discussion of these mechanistic considerations for KCNC1-related epilepsy syndromes in lines 868-883 in the Discussion. 

      References

      Cameron JM et al. (2019) Encephalopathies with KCNC1 variants: genotype-phenotype-functional correlations. Annals of Clinical and Translational Neurology 6:1263–1272.

      Clatot J, Hoshi M, Wan X, Liu H, Jain A, Shinlapawittayatorn K, Marionneau C, Ficker E, Ha T, Deschênes I (2017) Voltage-gated sodium channels assemble and gate as dimers. Nature Communications 8.

      Makinson CD, Tanaka BS, Sorokin JM, Wong JC, Christian CA, Goldin AL, Escayg A, Huguenard JR (2017) Regulation of Thalamic and Cortical Network Synchrony by Scn8a. Neuron 93:1165-1179.e6.

      Oliver KL et al. (2017) Myoclonus epilepsy and ataxia due to KCNC1 mutation: Analysis of 20 cases and K+ channel properties. Annals of Neurology 81.

      Park J et al. (2019) KCNC1-related disorders: new de novo variants expand the phenotypic spectrum. Annals of Clinical and Translational Neurology 6:1319–1326.

    1. Reviewer #1 (Public review):

      Sandkuhler et al. re-evaluated the biological functions of TANGO2 homologs in C. elegans, yeast, and zebrafish. In contrast to the previously reported role of TANGO2 homologs in transporting heme, Sandkuhler et al. offer a different view of the biological functions of TANGO2 homologs. With the support of some results from their tests, they conclude that 'there is insufficient evidence to support heme transport as the primary function of TANGO2', in addition to presenting evidence that C. elegans TANGO2 helps counteract oxidative stress. While the differences are reported in this study, more work is needed to elucidate the intuitive biological function of TANGO2.

      Strengths:

      (1) This work revisits a set of key experiments, including the toxic heme analog GaPP survival assay, the fluorescent ZnMP accumulation assay, and the multi-organismal investigations documented by Sun et al. in Nature (2022), which are critical for comparing the two works. Meanwhile, the authors also highlight the differences in reagents and methods between the two studies, demonstrating significant academic merit.

      (2) This work reported additional phenotypes for the C. elegans mutant of the TANGO2 homologs, including lawn avoidance, reduced pharyngeal pumping, smaller brood size, faster exhaustion under swimming test, and a shorter lifespan. These phenotypes are important for understanding the biological function of TANGO2 homologs, while they were missing from the report by Sun et al.

      (3) Investigating 'reduced GaPP consumption' as a cause of the increased resistance of the hrg-9 hrg-10 double null mutant to the toxic GaPP provides a valuable perspective for studying the biological function of TANGO2 homologs.

      (4) The induction of hrg-9 gene expression by paraquat indicates a strong link between TANGO2 and mitochondrial function.

      (5) This work thoroughly evaluated the role of TANGO2 homologs in supporting yeast growth using multiple yeast strains and also pointed out the mitochondrial genome instability feature of the yeast strain used by Sun et al.

      Weakness:

      It is always a challenge to replicate someone else's work, but it is worthwhile to take on the challenge, provide evidence, and raise concerns about it. These authors attempted to replicate the experiment using the same biological material as that used by Sun et al. in Nature (2022), despite some experimental differences between the two studies. This study does not have many technical weaknesses, but it can become a much better project by focusing on the new phenotypes discovered here.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) A detailed comparison between this work and the work of Sun et al. on experimental protocols and reagents in the main text will be beneficial for readers to assess critically.

      We have added a Key Reagents Table outlining the key reagents used in our study. In terms of experimental protocols, we replicated those described by Sun et al. in most instances and described any differences when present. With this resubmission, we included additional ZnMP accumulation experiments in liquid media (see point 3 below).

      (2) The GaPP used by Sun et al. (purchased from Frontier Scientific) is more effective in killing the worm than the one used in this study (purchased from Santa Cruz). Is the different outcome due to the differences in reagents? Moreover, Sun et al. examined the lethality after 3-4 days, while this work examined the lethality after 72 hours. Would the extra 24 hours make any difference in the result?

      We now cite product vendor differences as a possible reason for the observed difference in worm death, as the reviewer suggests, on page 8 (see text below) and include these differences in the Key Reagents Table. We also now stress the fact that our experiments included different doses of GaPP and the use of eat-2 mutants as an additional control, which we believe adds rigor and demonstrates the potency of GaPP in our experiments. We decided on assessment at 72 hours, as we deemed it a less nebulous time point than 3-4 days. Most of the observed worm death occurred earlier in this interval, so we believe it is unlikely that large group differences would emerge after an additional 24 hours.

      “Exposing worms to GaPP, a toxic heme analog, we observed that nematodes deficient in HRG-9 and HRG-10 displayed increased survival compared to WT worms, consistent with prior work,[13] though the between-group difference was markedly smaller in our study. We required higher GaPP concentrations to induce lethality, potentially due to product vendor differences, but did observe a clear dose-dependent effect across strains. Although it was previously proposed that the survival benefit seen in worms lacking HRG-9 and HRG-10 resulted from reduced transfer from intestinal cells after GaPP ingestion, our data suggest the reduced lethality is more likely due to decreased environmental GaPP uptake. Supporting this notion, DKO worms exhibited lawn avoidance, reduced pharyngeal pumping, and modestly lower intestinal ZnMP accumulation when exposed to this fluorescent heme analog on agar plates. In liquid media, DKO worms demonstrated higher fluorescence, but only in ZnMP-free conditions, suggesting the presence of gut granule autofluorescence. Furthermore, survival following exposure to GaPP was highest in eat-2 mutants, despite heme trafficking being unaffected in this strain.”

      (3) This work reported the opposite result of Sun et al. for the fluorescent ZnMP accumulation assay. However, the experimental protocols used by the two studies are massively different. Sun et al. did the ZnMP staining by incubating the L4-stage worms in an axenic mCeHR2 medium containing 40 μM ZnMP (purchased from Frontier Scientific) and 4 μM heme at 20 ℃ for 16 h, while this work placed the L4-stage worms on the OP50 E. coli seeded NGM plates treated with 40 μM ZnMP (purchased from Santa Cruz) for 16 h. The liquid axenic mCeHR2 medium is bacteria-free, heme-free, and consistent for ZnMP uptake by worms. This work has mentioned that the hrg-9 hrg-10 double null mutant has bacterial lawn avoidance and reduced pharyngeal pumping phenotypes. Therefore, the ZnMP staining protocol used in this work faces challenges in the environmental control for the wild type vs. the mutant. The authors should adopt the ZnMP staining protocol used by Sun et al. for a proper evaluation of fluorescent ZnMP accumulation.

      We agree with this comment. As such, we performed the ZnMP assay in liquid media conditions, as now described on page 13:

      “For liquid media experiments, three generations of worms were cultured in regular heme (20 µM) axenic media, with the first two generations receiving antibiotic-supplemented media (10 mg/ml tetracycline) and the 3rd generation cultivated without antibiotic. L4 worms from the 3rd generation were placed in media containing 40 µM ZnMP for 16 hours before being prepared and mounted for imaging as above. Worms were imaged on Zeiss Axio Imager 2 at 40x magnification, with image settings kept uniform across all images. Fluorescent intensity was measured within the proximal region of the intestine using ImageJ.”

      In heme-free media, both WT and DKO worms invariably entered L1 arrest, thus we were not able to replicate the results reported by Sun et al. Using media containing heme, we did see an increase in fluorescence, but this was only in the ZnMP-free condition, indicating that the increased signal was attributable to autofluorescence. This is a known phenomenon associated with gut granules in C. elegans in the setting of oxidative stress. The results of these experiments are now summarized on page 6:

      “DKO nematodes at the L4 larval stage were previously shown to accumulate the fluorescent heme analog zinc mesoporphyrin IX (ZnMP) in intestinal cells in low-heme (4 µM) liquid media. While attempting to replicate this experiment, we observed that both wildtype and DKO nematodes entered L1 arrest under these conditions. Therefore, to allow for developmental progression, we grew worms on standard OP50 E. coli plates and in media containing physiological levels of heme (20 µM). We then examined whether differences in ZnMP uptake persisted under these basal conditions. DKO worms grown on ZnMP-treated E. coli plates displayed significantly reduced intestinal ZnMP fluorescence compared to N2 (Figure 1B and C). Using basal heme media with ZnMP, there was no significant difference in ZnMP fluorescence between DKO and wildtype nematodes, although DKO worms grown in media without ZnMP exhibited significantly higher autofluorescence (Figure 1D and E). To test whether autofluorescence may have contributed to the higher fluorescent intensities previously reported in heme-deficient DKO worms, we repeated this experiment on agar plates under starved conditions but did not observe a difference between groups (Figure 1B).”

      (4) A striking difference between the two studies is that Sun et al. emphasize the biochemical function of TANGO2 homologs in heme transporting with evidence from some biochemical tests. In contrast, this work emphasizes the physiological function of TANGO2 homologs with evidence from multiple phenotypical observations. In the discussion part, the authors should address whether these observed phenotypes in this study can be due to the loss of heme transporting activities upon eliminating TANGO2 homologs. This action can improve the merit of academic debate and collaboration.

      Thank you for this suggestion. The following text has been added to the Discussion section (page 9):

      “In addition to altered pharyngeal pumping, DKO worms displayed multiple previously unreported phenotypic features, suggesting a broader metabolic impairment and reminiscent of some clinical manifestations observed in patients with TDD. Elucidating the mechanisms underlying this phenotype, and whether they reflect a core bioenergetic defect, is an active area of investigation in our lab. Several C. elegans heme-responsive genes have been characterized, revealing relatively specific defects in heme uptake or utilization rather than broad organismal dysfunction. For example, hrg-1 and hrg-4 mutants exhibit impaired growth only under heme-limited conditions,[23] and hrg-3 loss affects brood size and embryonic viability specifically when maternal heme is scarce.[24] By contrast, hrg-9 and hrg-10 mutants exhibit the most severe organismal phenotypes of the hrg family, to date, including reduced pharyngeal pumping, decreased motility, shortened lifespan, and smaller broods, even when fed a heme-replete diet.”

      Reviewer #2 (Public review):

      (1) The manuscript is written mainly as a criticism of a previously published paper. Although reproducibility in science is an issue that needs to be acknowledged, a manuscript should focus on the new data and the experiments that can better prove and strengthen the new claims.

      Thank you for this suggestion. While the primary intent of this study was to replicate key findings from the 2022 publication by Sun et al., the revised manuscript now emphasizes underlying mechanisms more broadly rather than focusing narrowly on that prior publication.

      (2) The current presentation of the logic of the study and its results does not help the authors deliver their message, although they possess great potential.

      We have attempted to rectify this through substantial revision of the Discussion section and other places throughout the manuscript.

      (3) The study is missing experiments to link hrg-9 and hrg-10 more directly to bioenergetic and oxidative stress pathways.

      The reviewer is correct in this assertion, but it was not our intent to definitively prove this link or, indeed, the primary mechanism of TANGO2 in the present manuscript. This said, we are actively engaged in this endeavor in our lab and anticipate these data will be published in a separate, forthcoming publication.

      We have added additional references pertaining to hrg-9 enrichment as part of the mitochondrial unfolded protein response (page 10) and a comparison of the phenotype observed in hrg-9 and hrg-10 deficient worms versus those lacking other proteins in the hrg family (page 9).

      Reviewer #3 (Public review):

      (1) The authors stress - with evidence provided in this paper or indicated in the literature - that the primary role of TANGO2 and its homologues is unlikely to be related to heme trafficking, arguing that observed effects on heme transport are instead downstream consequences of aberrant cellular metabolism. But in light of a mounting body of evidence (referenced by the authors) connecting more or less directly TANGO2 to heme trafficking and mobilization, it is recommended that the authors comment on how they think TANGO2 could relate to and be essential for heme trafficking, albeit in a secondary, moonlighting capacity. This would highlight a seemingly common theme in emerging key players in intracellular heme trafficking, as it appears to be the case for GAPDH - with accumulating evidence of this glycolytic enzyme being critical for heme delivery to several downstream proteins.

      TANGO2 is essential for mitochondrial health, albeit in a yet unknown capacity. In the absence of TANGO2, defects in heme trafficking may be secondary sequelae of mitochondrial dysfunction. We would point out that prior studies that attempted to show that TANGO2 and its homologs are involved in heme trafficking proposed very different mechanisms (direct binding vs. membrane protein interaction) and relied on artificially low or high heme conditions to produce these effects. We have attempted to address these more clearly in the Discussion section and have added a fifth figure to summarize our current unifying theory for how heme levels and mitochondrial stress may be linked.

      (2) The observation - using eat-2 mutants and lawn avoidance behaviour - that survival patterns can be partially explained by reduced consumption, is fascinating. It would be interesting to quantify the two relative contributions.

      We have completed additional ZnMP experiments in liquid media at the reviewers’ request. This experimental condition eliminates lawn avoidance as a factor in consumption. Fluorescent intensity was significantly higher in the DKO worms in media lacking ZnMP, indicating increased autofluorescence in DKO worms, while signal was not significantly different in media with ZnMP.

      (3) In the legend to Figure 1A it's a bit unclear what the differently coloured dots represent for each condition. Repeated measurements, worms, independent experiments? The authors should clarify this.

      The following sentence has been added to the legend for Figure 1:

      “Each dot represents the number of offspring laid by one adult worm on one GaPP-treated plate after 24 hours.”

      (4) It would help if the entire fluorescence images (raw and processed) for the ZnMP treatments were provided. Fluorescence images would also benefit Figure 1B.

      Fluorescent intensity values pertaining to the ZnMP experiments are included in our Extended Data supplement, and we have added representative images to Figure 1, per the reviewer’s request. We thank the reviewer for this helpful suggestion. We would be happy to upload raw images to an open-access repository if deemed necessary by the editorial team.

      (5) Increasingly, the understanding of heme-dependent roles relies on transient or indirect binding to unsuspected partners, not necessarily relying on a tight affinity and outdating the notion of heme as a static cofactor. Despite impressive recent advancements in the detection of these interactions (for example https://doi.org/10.1021/jacs.2c06104; cited by the authors), a full characterisation of the hemome is still elusive. Sandkuhler et al. deemed it possible but seem to question that heme binding to TANGO2 occurs. However, Sun et al. convincingly showed and characterised TANGO2 binding to heme. It is recommended that the authors comment on this.

      We believe it is plausible that TANGO2 binds heme (as do hundreds of other proteins), especially as it has been shown to bind other hydrophobic molecules. However, we also note that a separate paper examining the role of TANGO2 in heme transport posited that GAPDH is the sole heme binding partner for cytoplasmic transport (https://doi.org/10.1038/s41467-025-62819-2), contradicting the originally posited theory of how TANGO2 functions. This is described in the Discussion section and, as noted above, we have added an additional figure to demonstrate our unifying hypothesis for why TANGO2 may be important in the low-heme state, irrespective of any direct effect on heme trafficking.

      Additional comments and revisions:

      (1) It was suggested that a triple mutant (eat-2; hrg-9; hrg-10) be tested to determine the primary driver of GaPP toxicity. We appreciate this suggestion, but we offer the following rationale for why these experiments were not pursued. The eat-2 mutant, which lacks a nicotinic acetylcholine receptor subunit in pharyngeal muscles, was included solely as a dietary restriction control to illustrate that reduced GaPP toxicity in the hrg-9/10 double mutant could arise from poor feeding rather than defective heme transport. Both eat-2 and hrg-9/10 mutants exhibit markedly reduced feeding but via different mechanisms. In our assays, GaPP survival was inversely correlated with ingestion rate: eat-2 animals, which feed the least, showed the highest survival, while hrg-9/10 mutants showed intermediate feeding and intermediate survival. Consistent with this, eat-2 worms also displayed the lowest ZnMP accumulation.

      (2) GaPP solution was added to NGM plates after seeding with OP50. This is now expressly stated in the Methods section (page 15). We would note that Sun et al. mixed GaPP in with NGM in the liquid phase. We would expect that if there were a difference in GaPP exposure due to these different protocols, worms in our experiment would have received higher GaPP concentrations.

      “Standard NGM plates were treated with 1, 2, 5, or 10 µM gallium protoporphyrin IX (GaPP; Santa Cruz) after seeding with OP50. Plates were swirled to ensure an even distribution of GaPP and allowed to dry completely.”

      (3) The manuscript has been reworked to read as more of an independent study rather than a rebuttal of prior work, though the primary objective of validating prior work remains unchanged.

      (4) Several technical details of experiments have been moved from the main text to the materials and methods section.

      (5) One reviewer noted that the figure numbering should be adjusted. Numbering does not progress sequentially (i.e., 1A…1B…2A…2B) early in the text, because we have opted to consolidate data pertaining to heme analog experiments in Figure 1 and behavioral data in Figure 2.

      (6) “Kingdoms” has been changed to “domains” (page 4).

      (7) Example images are now included for Figure 1B, as noted above.

    1. Reviewer #1 (Public review):

      Summary:

      The authors set out on the ambitious task of establishing the reproducibility of claims from the Drosophila immunity literature. Starting out from a corpus of 400 articles published between 1959 and 2011, the authors sought to determine whether their claims were confirmed or contradicted by previous or subsequent publications. Additionally, they actively sought to replicate a subset of the claims for which no previous replications were available (although this set was not representative of the whole sample, as the authors focused on suspicious and/or easily testable claims). The focus of the article is on inferential reproducibility; thus, methods don't necessarily map exactly to the original ones.

      The authors present a large-scale analysis of the individual replication findings, which are presented in a companion article (Westlake et al., 2025. DOI 10.1101/2025.07.07.663442). In their retrospective analysis of reproducibility, the authors find that 61% of the original claims were verified by the literature, 7.5% were partially verified, and only 6.8% were challenged, with 23.8% having no replication available. This is in stark contrast with the result of their prospective replications, in which only 16% of claims were successfully reproduced.

      The authors proceed to investigate correlates of replicability, with the most consistent finding being that findings stemming from higher-ranked universities (and possibly from very high impact journals) were more likely to be challenged.

      Strengths:

      (1) The work presents a large-scale, in-depth analysis of a particular field of science that includes authors with deep domain expertise of the field. This is a rare endeavour to establish the reproducibility of a particular subfield of science, and I'd argue that we need many more of these in different areas.

      (2) The project was built on a collaborative basis (https://ReproSci.epfl.ch/), using an online database (https://ReproSci.epfl.ch/), which was used to organize the annotations and comments of the community about the claims. The website remains online and can be a valuable resource to the Drosophila immunity community.

      (3) Data and code are shared in the authors' GitHub repository, with a Jupyter notebook available to reproduce the results.

      Main concerns:

      (1) Although the authors claim that "Drosophila immunity claims are mostly replicable", this conclusion is strictly based on the retrospective analysis - in which around 84% of the claims for which a published verification attempt was found were verified. This is in very stark contrast with the findings that the authors replicate prospectively, of which only 16% are verified.

      Although this large discrepancy may be explained by the fact that the authors focused on unchallenged and suspicious claims (which seems to be their preferred explanation), an alternative hypothesis is that there is a large amount of confirmation bias in the Drosophila immunity literature, either because attempts to replicate previous findings tend to reach similar results due to researcher bias, or because results that validate previous findings are more likely to be published.

      Both explanations are plausible (and, not being an expert in the field, I'd have a hard time estimating their relative probability), and in the absence of prospective replication of a systematic sample of claims - which could determine whether the replication rate for a random sample of claims is as high as that observed in the literature -, both should be considered in the manuscript.

      (2) The fact that the analysis of factors correlating with reproducibility includes both prospective and retrospective replications also leads to the possibility of confounding in this analysis. If most of the challenged claims come from the authors' prospective replications, while most of the verified ones come from those that were replicated by the literature, it becomes unclear whether the identified factors are correlated with actual reproducibility of the claims or with the likelihood that a given claim will be tested by other authors and that this replication will be published.

      (3) The methods are very brief for a project of this size, and many of the aspects in determining whether claims were conceptually replicated and how replications were set up are missing.

      Some of these - such as the PubMed search string for the publications and a better description of the annotation process - are described in the companion article, but this could be more explicitly stated. Others, however, remain obscure. Statements such as "Claims were cross-checked with evidence from previous, contemporary and subsequent publications and assigned a verification category" summarize a very complex process for which more detail should be given - in particular because what constitutes inferential reproducibility is not a self-evident concept. And although I appreciate that what constitutes a replication is ultimately a case-by-case decision, a general description of the guidelines used by the authors to determine this should be provided. As these processes were done by one author and reviewed by another, it would also be useful to know the agreement rates between them to have a general sense of how reproducible the annotation process might be.

      The same gap in methods descriptions holds for the prospective replications. How were labs selected, how were experimental protocols developed, and how was the validity of the experiments as a conceptual replication assessed? I understand that providing the methods for each individual replication is beyond the scope of the article, but a general description of how they were developed would be important.
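      One common way to report the annotator agreement requested in concern (3) is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is only an illustration under stated assumptions: the label names and the two short annotation lists are hypothetical and are not data from the study, and the authors may well prefer a different agreement statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations for six claims (NOT data from the study).
annotator_1 = ["verified", "verified", "challenged", "unchallenged", "verified", "challenged"]
annotator_2 = ["verified", "verified", "challenged", "verified", "verified", "unchallenged"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

      Reporting kappa alongside the raw percentage agreement would show how much of the agreement goes beyond what the category frequencies alone would produce.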

      (4) As far as I could tell, the large-scale analysis of the replication results was not preregistered, and many decisions seem somewhat ad hoc. In particular, the categorization of journals (e.g. low impact, high impact, "trophy") and universities (e.g. top 50, 51-100, 101+) relies on arbitrary thresholds, and it is unclear how much the results are dependent on these decisions, as no sensitivity analyses are provided.

      Particularly, for analyses that correlate reproducibility with continuous variables (such as year of publication, impact factor or university ranking), I'd strongly favor using these variables as continuous variables in the analysis (e.g. using logistic regression) rather than performing pairwise comparisons between categories determined by arbitrary cutoffs. This would not only reduce the impact of arbitrary thresholds in the analysis, but would also increase statistical power in the univariate analyses (as the whole sample can be used at once) and reduce the number of parameters in the multivariate model (as they will be included as a single variable rather than multiple dummy variables when there are more than two categories).
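      To make the suggestion in concern (4) concrete, the sketch below fits a single-predictor logistic regression in which the predictor enters as a continuous variable rather than as binned categories. It is a minimal illustration only: the function name `fit_logistic`, the learning rate, the step count, and the ten data points are assumptions made for this example (the data are synthetic and not taken from the study), and a real analysis would normally use an established statistics package that also reports standard errors.

```python
import math


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*z) by gradient ascent on the mean log-likelihood.

    z is the standardized predictor, so b1 is the change in log-odds
    per one standard deviation of x.
    """
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    zs = [(x - mean) / sd for x in xs]  # standardize for stable step sizes
    b0 = b1 = 0.0
    for _ in range(steps):
        ps = [1 / (1 + math.exp(-(b0 + b1 * z))) for z in zs]
        # Gradient of the mean log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for y, p in zip(ys, ps)) / len(ys)
        g1 = sum((y - p) * z for y, p, z in zip(ys, ps, zs)) / len(ys)
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Synthetic, purely illustrative data (NOT from the study):
# a continuous predictor and a binary outcome (1 = claim challenged).
predictor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
challenged = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
intercept, slope = fit_logistic(predictor, challenged)
print(f"intercept = {intercept:.2f}, slope per SD = {slope:.2f}")
```

      Because every observation contributes to a single slope estimate, no cutoff has to be chosen, and the same variable costs one parameter in a multivariate model instead of one per category.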

      (5) The multivariate model used to investigate predictors of replicability includes unchallenged claims along with verified ones in the outcome, which seems like an odd decision. If the intention is to analyze which factors are correlated with reproducibility, it would make more sense to remove the unchallenged findings, as these are likely uninformative in this sense. In fact, based on the authors' own replications of unchallenged findings, they may be more likely to belong to the "challenged" category than to the "unchallenged" one if they were to be verified.

    2. Reviewer #3 (Public review):

      Summary:

      The authors of this paper were trying to identify how reproducible, or not, their subfield (Drosophila immunity) was since its inception over 50 years ago. This required identifying not only the papers, but the specific claims made in the paper, assessing if these claims were followed up in the literature, and if so whether the subsequent papers supported or refuted the original claim. In addition to this large manually curated effort, the authors further investigated some claims that were left unchallenged in the literature by conducting replications themselves. This provided a rich corpus of the subfield that could be investigated to identify which characteristics influence reproducibility.

      Strengths:

      A major strength of this study is the focus on a subfield, the detailing of identifying the main, major, and minor claims - which is a very challenging manual task - and then cataloging not only their assessment of if these claims were followed up in the literature, but also what characteristics might be contributing to reproducibility, which also included more manual effort to supplement the data that they were able to extract from the published papers. While this provides a rich dataset for analysis, there is a major weakness with this approach, which is not unique to this study.

      Weaknesses:

      The main weakness is relying heavily on the published literature as the source for determining whether a claim was verified or not. There are many documented issues with this stemming from every field of research - such as publication bias, selective reporting, all the way to fraud. It's understandable why the authors took this approach - it is the only way to get at a breadth of the literature - however the flaw with this approach is that it takes the literature as a solid ground truth, which it is not. At the same time, it is not reasonable to expect the authors to have conducted independent replications for all of the 400 papers they identified. However, there is a big difference between trying to assess the reproducibility of the literature by using the literature as the 'ground truth' and doing this independently like other large-scale replication projects have attempted to do. This means the interpretation of the data is a bit challenging.

      Below are suggestions for the authors and readers to consider:

      (1) I understand why the authors prefer to mention claims as their primary means of reporting what they found, but claims are nested within papers, and that makes it very hard to understand how to interpret these results at times. I also cannot understand at the high-level the relationship between claims and papers. The methods suggest there are 3-4 major claims per paper, but at 400 papers and 1,006 claims, this averages to ~2.5 claims per paper. Can the authors consider describing this relationship better (e.g., distribution of claims and papers) and/or consider presenting the data two ways (primary figures as claims and complementary supplementary figures with papers as the unit)? This will help the reader interpret the data both ways without confusion. I am also curious how the results look when presented both ways (e.g., does shifting to the paper as the unit of analysis shift the figures and interpretation?). This is especially true since the first and last author analysis shows there is varying distribution of papers and claims by authors (and thus the relationship between these is important for the reader).
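      As a rough illustration of point (1), the sketch below reproduces the claims-per-paper average from the two totals quoted above and shows one possible way to collapse claim-level categories to the paper level. The aggregation rule (a paper counts as challenged if any of its claims is challenged), the paper identifiers, and the example records are all hypothetical choices made for this example, not the authors' method or data.

```python
from collections import defaultdict

# Totals quoted in the review.
n_papers, n_claims = 400, 1006
mean_claims_per_paper = n_claims / n_papers
print(f"mean claims per paper = {mean_claims_per_paper:.3f}")


def paper_level_status(claims):
    """Collapse (paper_id, status) claim records to one status per paper.

    Hypothetical rule: 'challenged' if any claim is challenged,
    else 'verified' if any claim is verified, else 'unchallenged'.
    """
    by_paper = defaultdict(set)
    for paper_id, status in claims:
        by_paper[paper_id].add(status)
    result = {}
    for paper_id, statuses in by_paper.items():
        if "challenged" in statuses:
            result[paper_id] = "challenged"
        elif "verified" in statuses:
            result[paper_id] = "verified"
        else:
            result[paper_id] = "unchallenged"
    return result


# Hypothetical claim records (NOT data from the study).
example_claims = [
    ("paper_A", "verified"),
    ("paper_A", "challenged"),
    ("paper_B", "verified"),
    ("paper_B", "unchallenged"),
    ("paper_C", "unchallenged"),
]
paper_status = paper_level_status(example_claims)
print(paper_status)
```

      Other aggregation rules are equally defensible (for example, the share of verified claims per paper), which is one reason presenting both units of analysis side by side would help readers.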

      (2) As mentioned above, I think the biggest weakness is that the authors are taking the literature at face value when assigning if a claim was validated or challenged vs gathering new independent evidence. This means the paper leans more on papers, making it more like a citation analysis vs an independent effort like other large-scale replication projects. I highly recommend the authors state this in their limitations section.

      On top of that, I have questions that I could not figure out (though I acknowledge I did not dig super deep into the data to try). The main question I have is: how was verified (and challenged) determined? It seems from the methods that it was determined by "Claims were cross-checked with evidence from previous, contemporary and subsequent publications and assigned a verification category". If this is true, and all claims were assessed this way - are verified claims double counted then? (e.g., an original claim is found by a future claim to be verified - and thus that future claim is also considered to be verified because of the original claim).

      Related, did the authors look at the strength of validated or challenged claims? That is, if there is a relationship mapping the authors did for original claims and follow-up claims, I would imagine some claims have deeper (i.e., more) claims that followed up on them vs others. This might be interesting to look at as well.

      (3) I recommend the authors add sample sizes when not present (e.g., Fig 4C). I also find that the sample sizes are a bit confusing, and I recommend the authors check them and add more explanation when not complete, like they did for Fig 4A. For example, Fig 7B adds up to 178 labs (how did more than 156 labs get determined here?), and yet the total number of claims is 996 (as opposed to 1,006). As another example, why does Fig 8B not have all 156 labs accounted for? (Related to Fig 8B, I caution against reporting a p value and drawing strong conclusions from this very small sample size - 22 authors.) As a last example, Fig 8C has all 156 labs and 1,006 claims - is that expected? I guess it means authors who published before 1995 (as shown in Figure 8A) continued to publish after 1995? In that case, is it all authors? But the text refers to when they 'set up their lab' after 1995, so how can that be?

      (4) Finally, I think it would help if the authors expanded on the limitations generally and potential alternative explanations and/or driving factors. For example, where the line "though likely underestimated" appears in the discussion about the low rate of challenged claims, it might be useful to call out how publication bias is likely the driver here and thus needs to be carefully considered in the interpretation. Relatedly, I caution the authors against overinterpreting their suggestive evidence. The abstract, for example, states claims about what was found in their analysis, when these are suggestive at best, which the authors acknowledge in the paper. But since most people start with the abstract, I worry this indicates stronger evidence than what the authors actually have.

      The authors should be applauded for the monumental effort they put into this project, which does a wonderful job of having experts within a subfield engage their community to understand the connectedness of the literature and attempt to understand how reliable specific results are and what factors might contribute to them. This project provides a nice blueprint for others to build from as well as leverage the data generated from this subfield, and thus should have an impact in the broader discussion on reproducibility and reliability of research evidence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study introduces an important approach using selection linked integration (SLI) to generate Plasmodium falciparum lines expressing single, specific PfEMP1 surface adhesin variants, enabling precise study of PfEMP1 trafficking, receptor binding, and cytoadhesion. By moving the system to different parasite strains and introducing an advanced SLI2 system for additional genomic edits, this work provides compelling evidence for an innovative and rigorous platform to explore PfEMP1 biology and identify novel proteins essential for malaria pathogenesis including immune evasion.

      Reviewer #1 (Public review):

      One of the roadblocks in PfEMP1 research has been the challenge of manipulating var genes to incorporate markers to allow the transport of this protein to be tracked and to investigate the interactions taking place within the infected erythrocyte. In addition, the ability of Plasmodium falciparum to switch to different PfEMP1 variants during in vitro culture has complicated studies due to parasite populations drifting from the original (manipulated) var gene expression. Cronshagen et al have provided a useful system with which they demonstrate the ability to integrate a selectable drug marker into several different var genes that allows the PfEMP1 variant expression to be 'fixed'. This on its own represents a useful addition to the molecular toolbox and the range of var genes that have been modified suggests that the system will have broad application. As well as incorporating a selectable marker, the authors have also used selection-linked integration (SLI) to introduce markers to track the transport of PfEMP1, investigate the route of transport, and probe interactions with PfEMP1 proteins in the infected host cell.

      What I particularly like about this paper is that the authors have not only put together what appears to be a largely robust system for further functional studies, but they have used it to produce a range of interesting findings including:

      Co-activation of rif and var genes when in a head-to-head orientation.

      The reduced control of expression of var genes in the 3D7-MEED parasite line.

      More support for the PTEX transport route for PfEMP1.

      Identification of new proteins involved in PfEMP1 interactions in the infected erythrocyte, including some required for cytoadherence.

      In most cases the experimental evidence is straightforward, and the data support the conclusions strongly. The authors have been very careful in the depth of their investigation, and where unexpected results have been obtained, they have looked carefully at why these have occurred.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      (1) In terms of incorporating a drug marker to drive mono-variant expression, the authors show that they can manipulate a range of var genes in two parasite lines (3D7 and IT4), producing around 90% expression of the targeted PfEMP1. Removal of drug selection produces the expected 'drift' in variant types being expressed. The exceptions to this are the 3D7-MEED line, which looks to be an interesting starting point to understand why this variant appears to have impaired mutually exclusive var gene expression, and the EPCR-binding IT4var19 line. This latter finding was unexpected and the modified construct required several rounds of panning to produce parasites that expressed the targeted PfEMP1 and bound to EPCR. The authors identified a PTP3 deficiency as the cause of the lack of PfEMP1 expression, which is an interesting finding in itself but potentially worrying for future studies. What was not clear was whether the selected IT4var19 line retained specific PfEMP1 expression once receptor panning was removed.

      We do not have systematic long-term data for the Var19 line but do have medium-term data. After panning the Var19 line, the binding assays were done within 3 months without additional panning. The first binding assay was 2 months after the panning and the last binding assays three weeks later, totaling about 3 months without panning. While there is inherent variation in these assays that precludes detection of smaller changes, the last assay showed the highest level of binding, giving no indication for rapid loss of the binding phenotype. Hence, we can say that the binding phenotype appears to be stable for many weeks without panning the cells again and there was no indication for a rapid loss of binding in these parasites.

      Systematic long-term experiments to assess how long the Var19 parasites retain binding would be interesting, but given that the binding-phenotype appears to remain stable over many weeks or even months, this would only make sense if done over a much longer time frame. Such data might arise if the line is used over extended times for a specific project in which case it might be advisable to monitor continued binding. We included a statement in the discussion that the binding phenotype was stable over many weeks but that if long-term work with this line is planned, monitoring the binding phenotype might be advisable: “In the course of this work the binding phenotype of the IT4var19 expressor line remained stable over many weeks without further panning. However, given that initial panning had been needed for this particular line, it might be advisable for future studies to monitor the binding phenotype if the line is used for experiments requiring extended periods of cultivation.”

      (2) The transport studies using the mDHFR constructs were quite complicated to understand but were explained very clearly in the text with good logical reasoning.

      We are aware of this being a complex issue and are glad this was nevertheless understandable.

      (3) By introducing a second SLI system, the authors have been able to alter other genes thought to be involved in PfEMP1 biology, particularly transport. An example of this is the inactivation of PTP1, which causes a loss of binding to CD36 and ICAM-1. It would have been helpful to have more insight into the interpretation of the IFAs as the anti-SBP1 staining in Figure 5D (PTP-TGD) looks similar to that shown in Figure 1C, which has PTP intact. The anti-EXP2 results are clearly different.

      We realize the description of the PTP1-TGD IFA data and that of the other TGDs (see also response to Recommendation to authors point 4 and reviewer 2, major points 6 and 7) was rather cursory. The previously reported PTP1 phenotype is a fragmentation of the Maurer’s clefts into what in IFA appear to be many smaller pieces (Rug et al 2014, referenced in the manuscript). The control in Fig. 5D has 13 Maurer’s cleft spots (previous work indicates an average of ~15 MC per parasite, see e.g. the originally co-submitted eLife preprint doi.org/10.7554/eLife.103633.1 and references therein). The control mentioned by the reviewer in Fig. 1C has about 22 Maurer’s clefts foci, at the upper end of the typical range, but not unusual. In contrast, the PTP1-TGD in Fig. 5D, has more than 30 foci with an additional cytoplasmic pool and additional smaller, difficult to count foci. This is consistent with the published phenotype in Rug et al 2014. The EXP1 stained cell has more than 40 Maurer’s cleft foci, again beyond what typically is observed in controls. Therefore, these cells show a difference to the control in Fig. 5 but also to Fig. 1C. Please note that we are looking at two different strains, in Fig. 1 it is 3D7 and in Fig. 5 IT4. While we did not systematically assess this, the Maurer’s clefts number per cell seemed to be largely comparable between these strains (Fig. 10C and D in the other eLife preprint doi.org/10.7554/eLife.103633.1). 

      Overall, as the PTP1 loss phenotype has already been reported, we did not go into more experimental detail. However, we now modified the text to more clearly describe how the phenotype in the PTP1-TGD parasites was different to control: “IFAs showed that in the PTP1-TGD parasites, SBP1 and PfEMP1 were found in many small foci in the host cell that exceeded the average number of ~ 15 Maurer’s clefts typically found per infected RBC [66] (Fig. 5D). This phenotype resembled the previously reported Maurer’s clefts phenotype of the PTP1 knock out in CS2 parasites [39].”

      (4) It is good to see the validation of PfEMP1 expression includes binding to several relevant receptors. The data presented use CHO-GFP as a negative control, which is relevant, but it would have been good to also see the use of receptor mAbs to indicate specific adhesion patterns. The CHO system is fine for expression validation studies, but due to the high levels of receptor expression on these cells, moving to the use of microvascular endothelial cells would be advisable. This may explain the unexpected ICAM-1 binding seen with the panned IT4var19 line.

      We agree with the reviewer that it is desirable to have better binding systems for studying individual binding interactions. As the main purpose of this paper was to introduce the system and provide proof of principle that the cells show binding, we did not move to more complicated binding systems. However, we would like to point out that the CSA binding was done on receptor alone in addition to the CSA-expressing HBEC-5i cells and was competed successfully with soluble CSA. In addition, apart from the additional ICAM1-binding of the Var19 line, all binding phenotypes conformed to expectations. We therefore hope the tools used for binding studies are acceptable at this stage of introducing the system while future work interested in specific PfEMP1 receptor interactions may use better systems, tailored to the specific question (e.g. endothelial organoid models and engineered human capillaries and inhibitory antibodies or relevant recombinant domains for competition).

      (5) The proxiome work is very interesting and has identified new leads for proteins interacting with PfEMP1, as well as suggesting that KAHRP is not one of these. The reduced expression seen with BirA* in position 3 is a little concerning but there appears to be sufficient expression to allow interactions to be identified with this construct. The quantitative impact of reduced expression for proxiome experiments will clearly require further work to define it.

      This is a valid point. Clearly there seems to be some impact on binding when BirA* is placed in the extracellular domain (either through reduced presentation or direct reduction of binding efficiency of the modified PfEMP1; please see also minor comment 10 reviewer 2). The exact quantitative impact on the proxiome is difficult to assess but we note that the relative enrichment of hits to each other is rather similar to the other two positions (Fig. 6H-J). We therefore believe the BioIDs with the 3 PfEMP1-BirA* constructs are sufficient to provide a general coverage of proteins proximal to PfEMP1 and hope this will aid in the identification of further proteins involved in PfEMP1 transport and surface display as illustrated with two of the hits targeted here.

      The impact of placing a domain on the extracellular region of PfEMP1 will have to be further evaluated if needed in other studies. But the finding that a large folded domain can be placed into this part at all, even if binding was reduced, in our opinion is a success (it was not foreseeable whether any such change would be tolerated at all).

      (6) The reduced receptor binding results from the TryThrA and EMPIC3 knockouts were very interesting, particularly as both still display PfEMP1 on the surface of the infected erythrocyte. While care needs to be taken in cross-referencing adhesion work in P. berghei and whether the machinery truly is functionally orthologous, it is a fair point to make in the discussion. The suggestion that interacting proteins may influence the "correct presentation of PfEMP1" is intriguing and I look forward to further work on this.

      We hope future work will be able to shed light on this.

      Overall, the authors have produced a useful and reasonably robust system to support functional studies on PfEMP1, which may provide a platform for future studies manipulating the domain content in the exon 1 portion of var genes. They have used this system to produce a range of interesting findings and to support its use by the research community. Finally, a small concern. Being able to select specific var gene switches using drug markers could provide some useful starting points to understand how switching happens in P. falciparum. However, our trypanosome colleagues might remind us that forcing switches may show us some mechanisms but perhaps not all.

      Point noted! From non-systematic data with the Var01 line that has been cultured for extended periods of time (several years), it seems other non-targeted vars remain silent in our SLI “activation” lines but how much SLI-based var-expression “fixing” tampers with the integrity of natural switching mechanisms is indeed very difficult to gauge at this stage. We now added a statement to the discussion that even if mutually exclusive expression is maintained, it is not certain the mechanisms controlling var expression all remain intact: “However, it should be noted that it is not known whether all mechanisms controlling mutually exclusive expression and switching remain intact in parasites with SLI-activated var genes.”

      Reviewer #2 (Public review):

      Summary

      Cronshagen et al develop a range of tools based on selection-linked integration (SLI) to study PfEMP1 function in P. falciparum. PfEMP1 is encoded by a family of ~60 var genes subject to mutually exclusive expression. Switching expression between different family members can modify the binding properties of the infected erythrocyte while avoiding the adaptive immune response. Although critical to parasite survival and malaria disease pathology, PfEMP1 proteins are difficult to study owing to their large size and variable expression between parasites within the same population. The SLI approach previously developed by this group for genetic modification of P. falciparum is employed here to selectively and stably activate the expression of target var genes at the population level. Using this strategy, the binding properties of specific PfEMP1 variants were measured for several distinct var genes with a novel semi-automated pipeline to increase throughput and reduce bias. Activation of similar var genes in both the common lab strain 3D7 and the cytoadhesion competent FCR3/IT4 strain revealed higher binding for several PfEMP1 IT4 variants with distinct receptors, indicating this strain provides a superior background for studying PfEMP1 binding. SLI also enables modifications to target var gene products to study PfEMP1 trafficking and identify interacting partners by proximity-labeling proteomics, revealing two novel exported proteins required for cytoadherence. Overall, the data demonstrate a range of SLI-based approaches for studying PfEMP1 that will be broadly useful for understanding the basis for cytoadhesion and parasite virulence.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      Comments

      (1) While the capability of SLI to actively select var gene expression was initially reported by Omelianczyk et al., the present study greatly expands the utility of this approach. Several distinct var genes are activated in two different P. falciparum strains and shown to modify the binding properties of infected RBCs to distinct endothelial receptors; development of SLI2 enables multiple SLI modifications in the same parasite line; SLI is used to modify target var genes to study PfEMP1 trafficking and determine PfEMP1 interactomes with BioID. Curiously, Omelianczyk et al activated a single var (Pf3D7_0421300) and observed elevated expression of an adjacent var arranged in a head-to-tail manner, possibly resulting from local chromatin modifications enabling expression of the neighboring gene. In contrast, the present study observed activation of neighboring genes with head-to-head but not head-to-tail arrangement, which may be the result of shared promoter regions. The reason for these differing results is unclear although it should be noted that the two studies examined different var loci.

      The point that we are looking at different loci is very valid and we realize this is not mentioned in the discussion. We now added to the discussion that it is unclear if our results and those cited may be generalized and that different var gene loci may respond differently:

      “However, it is unclear if this can be generalized and it is possible that different var loci respond differently.”

      (2) The IT4var19 panned line that became binding-competent showed increased expression of both paralogs of ptp3 (as well as a phista and gbp), suggesting that overexpression of PTP3 may improve PfEMP1 display and binding. Interestingly, IT4 appears to be the only known P. falciparum strain (only available in PlasmoDB) that encodes more than one ptp3 gene (PfIT_140083100 and PfIT_140084700). PfIT_140084700 is almost identical to the 3D7 PTP3 (except for a ~120 residue insertion in 3D7 beginning at residue 400). In contrast, while the C-terminal region of PfIT_140083100 shows near-perfect conservation with 3D7 PTP3 beginning at residue 450, the N-terminal regions between the PEXEL and residue 450 are quite different. This may indicate the generally stronger receptor binding observed in IT4 relative to 3D7 results from increased PTP3 activity due to multiple isoforms or that specialized trafficking machinery exists for some PfEMP1 proteins.

      We thank the reviewer for pointing this out. The exact differences between the two PTP3s of IT4 and that of other strains should definitely be closely examined if the function of these proteins in PfEMP1 binding is analysed in more detail.

      It is an interesting idea that the PTP3 duplication could be a reason for the superior binding of IT4. We always assumed that IT4 had better binding because it was less culture adapted, but this does not preclude that PTP3(s) is (are) a reason for this. However, at least in our 3D7 line, PTP3 cannot be the reason for the poor binding, as our 3D7 still has PfEMP1 on the surface, while in the unpanned IT4-Var19 line and in the ptp3 KO of Maier et al., Cell 2008 (PMID: 18614010), PfEMP1 is no longer on the surface.

      Testing the impact of having two PTP3s would be interesting, but given the “mosaic” similarity of the two PTP3 isoforms, a simple add-on experiment might not be informative. Nevertheless, it will be interesting in future work to explore this in more detail.

      Reviewer #3 (Public review):

      Summary:

      The submission from Cronshagen and colleagues describes the application of a previously described method (selection linked integration) to the systematic study of PfEMP1 trafficking in the human malaria parasite Plasmodium falciparum. PfEMP1 is the primary virulence factor and surface antigen of infected red blood cells and is therefore a major focus of research into malaria pathogenesis. Since the discovery of the var gene family that encodes PfEMP1 in the late 1990s, there have been multiple hypotheses for how the protein is trafficked to the infected cell surface, crossing multiple membranes along the way. One difficulty in studying this process is the large size of the var gene family and the propensity of the parasites to switch which var gene is expressed, thus preventing straightforward gene modification-based strategies for tagging the expressed PfEMP1. Here the authors solve this problem by forcing the expression of a targeted var gene by fusing the PfEMP1 coding region with a drug-selectable marker separated by a skip peptide. This enabled them to generate relatively homogenous populations of parasites all expressing tagged (or otherwise modified) forms of PfEMP1 suitable for study. They then applied this method to study various aspects of PfEMP1 trafficking.

      Strengths:

      The study is very thorough, and the data are well presented. The authors used SLI to target multiple var genes, thus demonstrating the robustness of their strategy. They then perform experiments to investigate possible trafficking through PTEX, they knock out proteins thought to be involved in PfEMP1 trafficking and observe defects in cytoadherence, and they perform proximity labeling to further identify proteins potentially involved in PfEMP1 export. These are independent and complementary approaches that together tell a very compelling story.

      We thank the reviewer for the kind assessment and the comments to improve the paper.

      Weaknesses:

      (1) When the authors targeted IT4var19, they were successful in transcriptionally activating the gene; however, they did not initially obtain cytoadherent parasites. To observe binding to ICAM-1 and EPCR, they had to perform selection using panning. This is an interesting observation and potentially provides insights into PfEMP1 surface display, folding, etc. However, it also raises questions about other instances in which cytoadherence was not observed. Would panning of these other lines have successfully selected for cytoadherent infected cells? Did the authors attempt panning of their 3D7 lines? Given that these parasites do export PfEMP1 to the infected cell surface (Figure 1D), it is possible that panning would similarly rescue binding. Likewise, the authors knocked out PTP1, TryThrA, and EMPIC3 and detected a loss of cytoadhesion, but they did not attempt panning to see if this could rescue binding. To ensure that the lack of cytoadhesion in these cases is not serendipitous (as it was when they activated IT4var19), they should demonstrate that panning cannot rescue binding.

      These are very important considerations. Indeed, we had repeatedly attempted to pan 3D7 when we failed to get the SLI-generated 3D7 PfEMP1 expressor lines to bind, but this had not been successful. The lack of binding had been a major obstacle that had held up the project and was only solved when we moved to IT4 which readily bound (apart from Var19 which was created later in the project). After that we made no further efforts to understand why 3D7 does not bind but the fact that PfEMP1 is on the surface indicates this is not a PTP3 issue because loss of PTP3 also leads to loss of PfEMP1 surface display. Also, as the parent 3D7 could not be panned, we assumed this issue is not easily fixed in the SLI var lines we made in 3D7.

      Panning the TGD lines: we see the reasoning for conducting panning experiments with the TGD lines. However, on second thought, we are unsure this should be attempted. The outcome might not be easily interpretable as at least two forces will contribute to the selection in panning experiments with TGD lines that do not bind anymore:

      Firstly, panning would work against the SLI of the TGD, resulting in a tug of war between the TGD-SLI and binding. This is because a small number of parasites will loop out the TGD plasmid (revert) and would normally be eliminated during standard culturing due to the SLI drug used for the TGD. These revertant cells would bind and the panning would enrich them. Hence, panning and SLI are opposed forces in the case of a TGD abolishing binding. It is unclear how strong this effect would be, but this would for sure lead to mixed populations that complicate interpretations. 

      The second selecting force is possible compensatory changes that restore binding. These can have different causes: (i) reversal of potential independent changes that may have occurred in the TGD parasites and that are in reality causing the binding loss (e.g. ptp3 loss or similar, the concern of the reviewer) or (ii) new changes that compensate for the loss of the TGD target (in this case the TGD is the cause of the binding loss, but a different change ameliorates it, for instance by increasing PfEMP1 expression or surface display). As both TGDs show some residual binding and have VAR01 on the surface to at least some extent, it is possible that new compensatory changes might indeed occur that indirectly increase binding again.

      In summary, even if more binding occurs after panning of the lines, it is not clear whether this is due to a compensatory change ameliorating the TGD, reversal of an unrelated change, or counter-selection against the SLI. To determine the cause, the panned TGD lines would need to be subjected to a complex and time-consuming analysis (WGS, RNASeq, possibly Maurer’s clefts phenotype) to find out whether they were SLI revertants, had an unrelated change that was reverted, or had a new compensatory change that helps binding. This might be further muddled if a mix of cells with different changes from the options indicated above comes out of the selection. In that case, it might even require scRNASeq to make sense of the panning experiment. Due to the envisaged difficulty in interpreting the outcome, we did not attempt this panning.

      To exclude loss of ptp3 expression as the reason for binding loss (something we would not have seen in the WGS if it is only due to a transcriptional change), we have now carried out RNASeq with the TGD lines that have a binding phenotype. While we did not generate replicates to obtain quantitative data, the results show that both ptp3 copies were expressed in these TGDs at levels comparable to other parasite lines that do bind with the same SLI-activated var gene, indicating that the effect is not due to ptp3 (see response to point 4 on PTP3 expression in the Recommendations for the authors). While we cannot fully exclude other changes in the TGDs that might affect binding, the WGS did not show any obvious alterations that could be responsible for this.

      (2) The authors perform a series of trafficking experiments to help discern whether PfEMP1 is trafficked through PTEX. While the results were not entirely definitive, they make a strong case for PTEX in PfEMP1 export. The authors then used BioID to obtain a proxiome for PfEMP1 and identified proteins they suggest are involved in PfEMP1 trafficking. However, it seemed that components of PTEX were missing from the list of interacting proteins. Is this surprising and does this observation shed any additional light on the possibility of PfEMP1 trafficking through PTEX? This warrants a comment or discussion.

      This is an interesting point and we agree that it warrants discussion. A likely reason why PTEX components are not picked up as interactors is that BirA* is expected to be unfolded when it passes through the channel and in that state cannot biotinylate. Labelling would likely only be possible if PfEMP1 lingered at the PTEX translocation step before BirA* became unfolded to go through the channel, which we would not expect under physiological conditions. We added the following sentences to the discussion: “While our data indicates PfEMP1 uses PTEX to reach the host cell, this could be expected to have resulted in the identification of PTEX components in the PfEMP1 proxiomes, which was not the case. However, as BirA* must be unfolded to pass through PTEX, it likely is unable to biotinylate translocon components unless PfEMP1 is stalled during translocation. For this reason, a lack of PTEX components in the PfEMP1 proxiomes does not necessarily exclude passage through PTEX.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Most of my comments are in the public section. I would just highlight a few things:

      (1) In the binding studies section you talk about "human brain endothelial cells (HBEC-5i)". These cells do indeed express CSA but this is a property of their immortalisation rather than being brain endothelium, which does not express CSA. I think this could be confusing to readers so I think you might want to reword this sentence to focus on the CSA-expressing cell line rather than other features.

      We thank the reviewer for pointing this out. We have now modified the sentence to focus on the fact that these are CSA-expressing cells and provided a reference for it.

      (2) As I said in the public section, CHO cells are great for proof of concept studies, but they are not endothelium. Not a problem for this paper.

      Noted! Please also see our response to the public review.

      (3) I wonder whether your comment about how well tolerated the BirA* insertion is may be a bit too strong. I might say "Nonetheless, overall the BirA* modified PfEMP1 were functional."

      Changed as requested.

      (4) I'm not sure how you explain the IFA staining patterns to the uninitiated, but perhaps you could explain some of the key features you are looking for.

      We apologise for not giving an explanation of the IFA staining patterns in the first place. Please see detailed response to public review of this reviewer (point 3 on PTP1-TGD phenotype) and to reviewer 2 (Recommendations to the authors, points 6 and 7 on better explaining and quantifying the Maurer’s clefts phenotypes). For this we now also generated parasites that episomally express mCherry tagged SBP1 in the TGD parasites with the reduced binding phenotype. This resulted in amendments to Fig. S7, addition of a Fig. S8 and updated results to better explain the phenotypes. 

      This is a great paper - I just wish I'd had this system before.

      Thank you!

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Does the RNAseq analysis of 3D7var0425800 and 3D7MEEDvar0425800 (Figure 1G, H) reveal any differential gene expression that might suggest a basis for loss of mutually exclusive var expression in the MEED line?

      We now carried out a thorough analysis of these RNASeq experiments to look for an underlying cause for the phenotype. This was added as new Figure 1J and new Table S3. This analysis again illustrated the increased transcript levels of var genes. In addition, it showed that transcripts of a number of other exported proteins, including members of other gene families, were up in the MEED line. 

      One hit that might be causal of the phenotype was sip2, which was down by close to 8-fold (pAdj 0.025). While recent work in P. berghei found this ApiAP2 to be involved in the expression of merozoite genes (Nishi et al., Sci Advances 2025 (PMID: 40117352)), previous work in P. falciparum showed that it binds heterochromatic telomere regions and certain var upstream regions (Flück et al., PlosPath 2010 (PMID: 20195509), now cited in the manuscript). The other notable change was an upregulation of the non-coding RNA ruf6, which had been linked with impaired mono-allelic var expression (Guizetti et al., NAR 2016 (PMID: 27466391), now also cited in the manuscript). While it would go beyond this manuscript to follow this up, it is conceivable that alterations in chromosome end biology due to sip2 downregulation or upregulation of ruf6 are causes of the observed phenotype.
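
      For readers more used to log2-scaled RNA-seq output, a decrease of close to 8-fold corresponds to a log2 fold change of about -3. The following is a minimal Python sketch of that conversion; the function name and the `direction` argument are illustrative assumptions and are not part of the analysis pipeline used for the manuscript.

```python
import math


def log2_fold_change(fold: float, direction: str) -> float:
    """Convert a linear fold change into a signed log2 fold change.

    fold      -- magnitude of the change on a linear scale (e.g. 8.0 for 8-fold)
    direction -- "up" for increased or "down" for decreased transcript levels
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    if direction not in ("up", "down"):
        raise ValueError('direction must be "up" or "down"')
    value = math.log2(fold)
    # Downregulation is reported as a negative log2 fold change.
    return value if direction == "up" else -value


# An 8-fold decrease, used here as a round stand-in for "close to 8-fold".
print(log2_fold_change(8.0, "down"))
```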

      We now added a paragraph on the more comprehensive analysis of the RNA Seq data of the MEED vs non-MEED lines at the end of the second results section.

      (2) Could the inability of the PfEMP1-mDHFR fusion to block translocation (Fig 2A) reflect unique features of PfEMP1 trafficking, such as the existence of a soluble, chaperoned trafficking state that is not fully folded? Was a PfEMP1-BPTI fusion ever tested as an alternative to mDHFR?

      This is an interesting suggestion. The PfEMP1-BPTI was never tested. However, a chaperoned trafficking state would likely also affect BPTI. Given that both domains (mDHFR and BPTI) in principle do the same when folded and would block when the construct is in the PV, it is not very likely that using a different blocking domain would make a difference. Therefore, the scenario where BPTI would block when mDHFR does not is not that probable. The opposite would be possible (mDHFR blocking while BPTI does not, because only the latter depends on the redox state). However, this would only happen if the block occurred before the construct reaches the PV.

      At present, we believe the lack of a block to be due to the organization of the domains in the construct. In the PfEMP1-mDHFR construct in this manuscript, the blocking domain is further away from the TMD than in all other previously tested mDHFR fusions. Increased distance to the TMD has previously been found to be a factor impairing the blocking function of mDHFR (Mesen-Ramirez et al., PlosPath 2016 (PMID: 27168322)). Hence, we suspect that this, rather than the type of blocking domain, is the reason for the lack of a block with PfEMP1-mDHFR. However, the latter option cannot be fully excluded and we might test BPTI in future work.

      (3) The late promoter SBP1-mDHFR is 2A fused with the KAHRP reporter. Since 2A skipping efficiency varies between fusion contexts and significant amounts of unskipped protein can be present, it would be helpful to include a WB to determine the efficiency of skipping and provide confidence that the co-blocked KAHRP in the +WR condition (Fig 2D) is not actually fused to the C-terminus of SBP1-mDHFR-GFP.

      Fortunately, this T2A fusion (crt_SBP1-mDHFR-GFP-2A-KAHRP-mScarlet<sup>epi</sup>) was used before in work that included a Western blot showing its efficient skipping (S3 A Fig in Mesen-Ramirez et al., PlosPath 2016). In agreement with this Western blot result, fluorescence microscopy showed very limited overlap of SBP1-mDHFR-GFP and KAHRP-mCherry in the absence of WR (Fig. 3B in Mesen-Ramirez et al., PlosPath 2016 and Fig. 2 in this manuscript), which would not be the case if these two constructs were fused together. Please note that KAHRP is known to transiently localize to the Maurer’s clefts before reaching the knobs (Wickham et al., EMBOJ 2001, PMID: 11598007), and therefore occasional overlap with SBP1 at the Maurer’s clefts is expected. However, we would expect much more overlap if a substantial proportion of the construct population were not skipped, and therefore the co-blocked KAHRP-mCherry in the +WR sample is unlikely to be due to inefficient skipping and attachment to SBP1-mDHFR-GFP.

      (4) Does comparison of RNAseq from the various 3D7 and IT4 lines in the study provide any insight into PTP3 expression levels between strains with different binding capacities? Was the expression level of ptp3a/b in the IT4var19 panned line similar to the expression in the parent or other activated IT4 lines? Could the expanded ptp3 gene number in IT4 indicate that specialized trafficking machinery exists for some PfEMP1 proteins (ie, IT4var19 requires the divergent PTP3 paralog for efficient trafficking)?

      PTP3 in the different IT4 lines that bind:

      In those parasite lines that did bind, the intrinsic variation in the binding assays, the different binding properties of different PfEMP1 variants and the variation in RNA Seq experiments comparing different parasite lines preclude a correlation of binding level with ptp3 expression. For instance, if a PfEMP1 variant has lower binding capacity, ptp3 may still be higher but binding would be lower than in a parasite line with a better binding PfEMP1 variant. Studying the effect of PTP3 levels on binding could probably be done by overexpressing PTP3 in the same PfEMP1 SLI expressor line and assessing how this affects binding, but this would go beyond this manuscript.

      PTP3 in panned vs unpanned Var19:

      We did some comparisons between the IT4 parent and the panned and unpanned IT4-Var19 lines (see Author response table 1). This did not reveal any clear associations. While the parent had somewhat lower ptp3 transcript levels, they were still clearly higher than in the unpanned Var19 line, and other lines also had ptp3 levels comparable to the panned IT4-Var19 (see Author response table 2).

      PTP3 in the TGDs and possible reason for binding phenotype:

      A key point is whether PTP3 could have influenced the lack of binding in the TGD lines (see also weakness section and point 1 of public review of reviewer 3: ptp3 may be an indirect cause of the lack of binding in TGD parasites). We have now done RNA Seq to check for ptp3 expression in the relevant TGD lines. We did not do a systematic quantitative comparison (which would require 3 replicates of RNASeq), but we reasoned that loss of expression would also be evident in one replicate. There was no indication that the TGD lines had lost PTP3 expression (see Author response table 2), so loss of PTP3 expression, as seen in the Var19 parasites, is unlikely to explain the binding loss in the TGD lines. Generally, the IT4 lines showed expression of both ptp3 genes and only in the Var19 parasites before panning were the transcript levels considerably lower:

      Author response table 1.

      Parent vs IT4-Var19 panned and unpanned

      Author response table 2.

      TGD lines with binding phenotype vs parent

      The absence of an influence of PTP3 on the binding phenotype in the cell lines in this manuscript (besides Var19) is further supported by its role in PfEMP1 surface display. Previous work has shown that KO of ptp3 leads to a loss of VAR2CSA surface display (Maier et al., Cell 2008). The unpanned Var19 parasite also lacked PfEMP1 surface display, and panning and the resulting appearance of the binding phenotype were accompanied by surface display of PfEMP1. As both the EMPIC3- and TryThrA-TGD lines still had at least some PfEMP1 on the surface, this also (in addition to the RNA Seq above) speaks against PTP3 being the cause of the binding phenotype. The same applies to 3D7, which despite the poor binding displays PfEMP1 on the host cell surface (Figure 1D). This indicates that the binding phenotype in 3D7 is also not due to loss of PTP3 expression, as this would have abolished PfEMP1 surface display.

      The idea about PTP3 paralogs for specific PfEMP1s is intriguing. In the future it might be interesting to test the frequency of parasites with two PTP3 paralogs in endemic settings and correlate it with the PfEMP1 repertoire, variant expression and potentially disease severity. 

      (5) The IT4var01 line shows substantially lower binding in Figure 5F compared with the data shown in Figure 4E and 6F. Does this reflect changes in the binding capacity of the line over time or is this variability inherent to the assay?

      There is some inherent variability in these assays. While we did not systematically assess this, we had no indication that this was due to the parasite line changing. The Var01 line was cultured for months and was frozen down and thawed more than once without a clear gradual trend for more or less binding. While we can’t exclude some variation from the parasite side, we suspect it is more a factor of the expression of the receptor on the CHO cells the iRBCs bind to. 

      Specifically, the assays in Fig. 6F and 4E mentioned by the reviewer both had an average binding to CD36 of around 1000 iE/mm2, only the experiments in Fig. 5F are different (~ 500 iE/mm2) but these were done with a different batch of CHO cells at a different time to the experiments in Fig. 6F and 4E. 

      (6) In Figure S7A, TryThrA and EMPIC3 show distinct localization as circles around the PfEMP1 signal while PeMP2 appears to co-localize with PfEMP1 or as immediately adjacent spots (strong colocalization is less apparent than SBP1, and the various PfEMP1 IFAs throughout the study). Does this indicate that TryThrA and EMPIC3 are peripheral MC proteins? Does this have any implications for their function in PfEMP1 binding? Some discussion would help as these differences are not mentioned in the text. For the EMPIC3 TGD IFAs, localization of SBP1 and PfEMP1 is noted to be normal but REX1 is not mentioned (although this also appears normal).

      We apologise for the missing description of the candidate localisations and the cursory description of the Maurer’s clefts phenotypes (next point). Our original intent was to not distract too much from the main flow of the manuscript, as almost every part of the manuscript could be followed up with more details. However, we fully agree that this is unsatisfactory and have now provided more description (this point) and more data (next point).

      Localisation of TryThrA and EMPIC3 compared to PfEMP1 at the Maurer’s clefts: the circular pattern is reminiscent of the results with Maurer’s clefts proteins reported by McMillan et al. using 3D-SIM in 3D7 parasites (McMillan et al., Cell Microbiology 2014 (PMID: 23421990)). In that work, SBP1 and MAHRP1 (both integral TMD proteins) were found in foci but REX1 (no TMD) in circular structures around these foci, similar to what we observed here for TryThrA and EMPIC3, which both also lack a TMD. The SIM data in McMillan et al. indicated that PfEMP1 is also “more peripheral”, although it only partially overlapped with REX1. The conclusion from that work was that there are sub-compartments at the Maurer’s clefts. In our IFAs (Fig. S7A) PfEMP1 also only partially overlaps with the TryThrA and EMPIC3 circles, potentially indicating sub-compartments similar to those observed by 3D-SIM. We agree with the reviewer that this might be indicative of peripheral MC proteins, fitting with a lack of TMD in these candidates, but we did not further speculate on this in the manuscript.

      We now added enlargements of the ring-like structures to better illustrate this observation in Fig. S7A. In addition, we now specifically mention the localization data and the ring like signal with TryThrA and EMPIC3 in the results and state that this may be similar to the observations by McMillan et al., Cell Microbiology 2014.

      We also thank the reviewer for pointing out that we had forgotten to mention REX1 in the EMPIC3-TGD; this was amended.

      (7) The atypical localization in TryThrA TGD line claimed for PfEMP1 and SBP1 in Fig S7B is not obvious. While most REX1 is clustered into a few spots in the IFA staining for SBP1 and REX1, SBP1 is only partially located in these spots and appears normal in the above IFA staining for SBP1 and HA. The atypical localization of PfEMP1-HA is also not obvious to me. The authors should clarify what is meant by "atypical" localization and provide support with quantification given the difference between the two SBP1 images shown.

      We apologise for the inadequate description of these IFA phenotypes. The abnormal signal for SBP1, REX1 and PfEMP1 in the TryThrA-TGD included two phenotypes found with all 3 proteins: 

      (1) a dispersed signal for these proteins in the host cell in addition to foci (the control and the other TGD parasites have only dots in the host cell with no or very little detectable dispersed signal). 

      (2) foci of disproportionally high intensity and size, that we assumed might be aggregation or enlargement of the Maurer’s clefts or of the detected proteins.

      The reason for the difference between the REX1 (aggregation) phenotype and the PfEMP1 and SBP1 (dispersed signal, more smaller foci) phenotypes in the images in Fig. S7B is that both phenotypes were seen with all 3 proteins but we chose a REX1 stained cell to illustrate the aggregation phenotype (the SBP1 signal in the same cell is similar to the REX1 signal, illustrating that this phenotype is not REX1 specific; please note that this cell also has a dispersed pool of REX1 and SBP1). 

      Based on the IFAs 66% (n = 106 cells) of the cells in the TryThrA-TGD parasites had one or both of the observed phenotypes. We did not include this into the previous version of the manuscript because a description would have required detouring from the main focus of this results section. In addition, IFAs have some limitations for accurate quantifications, particularly for soluble pools (depending on fixing efficiency and agent, more or less of a soluble pool in the host cell can leak out). 

      To answer the request to better explain and quantify the phenotype and given the limitations of IFA, we now transfected the TryThrA-TGD parasites with a plasmid mediating episomal expression of SBP1-mCherry, permitting live cell imaging and a better classification of the Maurer’s clefts phenotype. Due to the two SLI modifications in these parasites (using up 4 resistance markers) we had to use a new selection marker (mutated lactate transporter PfFNT, providing resistance to BH267.meta (Walloch et al., J. Med. Chem. 2020 (PMID: 32816478))) to transfect these parasites with an additional plasmid. 

      These results are now provided as Fig. S8 and detailed in the last results section. The new data shows that the majority of the TryThrA-TGD parasites contain a dispersed pool of SBP1 in the host cell. About a third of the parasites also showed disproportionally strong SBP1 foci that may be aggregates of the Maurer’s clefts. We also transfected the EMPIC3-TGD parasites with the FNT plasmid mediating episomal SBP1-mCherry expression and observed only few cells with a cytoplasmic pool or aggregates (Fig. S8). Overall these findings agree with the previous IFA results. As the IFA suggests similar results also for REX1 and PfEMP1, this defect is likely not SBP1 specific but more general (Maurer’s clefts morphology; association or transport of multiple proteins to the Maurer’s clefts). This gives a likely explanation for the cytoadherence phenotype in the TryThrA-TGD parasites. The reason for the EMPIC3-TGD phenotype remains to be determined as we did not detect obvious changes of the Maurer’s clefts morphology or in the transport of proteins to these structures in these experiments. 

      Minor comments

      (1) Italicized numbers in parentheses are present in several places in the manuscript but it is not clear what these refer to (perhaps differently formatted citations from a previous version of the manuscript). Figure 1 legend: (121); Figure S3 legend: (110), (111); Figure S6 legend: (66); etc.

      We thank the reviewer for pointing out this issue with the references; this was amended.

      (2) Figure 5A and legend: "BSD-R: BSD-resistance gene". Blasticidin-S (BS) is the drug while Blasticidin-S deaminase (BSD) is the resistance gene.

      We thank the reviewer for pointing this out; the legend and figure were changed.

      (3) Figure 5E legend: µ-SBP1-N should be α-SBP1-N.

      This was amended.

      (4) Figure S5 legend: "(Full data in Table S1)" should be Table S3.

      This was amended.

      (5) Figure S1G: The pie chart shows PF3D7_0425700 accounts for 43% of rif expression in 3D7var0425800 but the text indicates 62%.

      We apologize for this mistake; the text was corrected. We also improved the citations to Fig. S1G and H in this section.

      (6) "most PfEMP1-trafficking proteins show a similar early expression..." The authors might consider including a table of proteins known to be required for EMP1 trafficking and a graph showing their expression timing. Are any with later expressions known?

      Most exported proteins are expressed early, which is nicely shown in Marti et al 2004 (cited for the statement) in a graph of the expression timing of all PEXEL proteins (Fig. 4B in that paper). PNEPs also have a similar profile (Grüring et al 2011, also cited for that statement), further illustrated by using early expression as a criterion to find more PNEPs (Heiber et al., 2013 (PMID: 23950716)). Together this includes most if not all of the known PfEMP1 trafficking proteins. The originally co-submitted paper (Blancke-Soares & Stäcker et al., eLife preprint doi.org/10.7554/eLife.103633.1) analysed several later expressed exported proteins (Pf332, MSRP6) but their disruption, while influencing Maurer’s clefts morphology and anchoring, did not influence PfEMP1 transport. However, there are some conflicting results for Pf332 (referenced in Blancke-Soares & Stäcker et al). This illustrates that it may not be so easy to decide which proteins are bona fide PfEMP1 trafficking proteins. We therefore did not add a table and hope it is acceptable for the reader to rely on the 3 references provided to back this statement.

      (7)  Figure S1J: The predominate var in the IT4 WT parent is var66 (which appears to be syntenic with Pf3D7_0809100, the predominate var in the 3D7 WT parent). Is there something about this locus or parasite culture conditions that selects for these vars in culture? Is this observed in other labs as well?

      This is a very interesting point (although we are not certain these vars are indeed syntenic, they are on different chromosomes). As far as we know at least Pf3D7_0809100 is commonly a dominant var transcribed in other labs and was found expressed also in sporozoites (Zanghì et al. Cell Rep. 2018). However, it is unclear how uniform this really is. For IT4 we do not know in full but have also here commonly observed centromeric var genes to be dominating transcripts in unselected parasite cultures. It is possible that transcription drifts to centromeric var genes in cultured parasites. However, given the anecdotal evidence, it is unknown to what extent this is related to an inherent switching and regulation regimen or a consequence of faulty regulation following prolonged culturing.

      (8) Figure 4B, C: Presumably the asterisks on the DNA gels indicate non-specific bands but this is not described in the legend. Why are non-specific bands not consistent between parent and integrated lanes?

      We apologize for not mentioning this in the legend; this has been amended.

      It is not clear why the non-specific bands differ between the lines but in part this might be due to different concentrations and quality of DNA preps. A PCR can also behave differently depending on whether the correct primer target is present or not. If present, the PCR will run efficiently and other spurious products will be outcompeted, but in absence of the correct target, they might become detectable.  

      Overall, we do not think the non-specific bands indicate anything untoward with the lines. For instance, in Fig. 4B the high band in the 5’ integration reaction in the IT4 line (which does not occur anywhere else) cannot be due to a genomic change, as this is the parental line and does not contain the plasmid for integration. In the same gel, the ori locus band of incorrect size (likely due to cross-reaction of the primers with another var gene, which is not always fully avoidable given the high similarity of the ATS region) is present in both the parent IT4 and the integrant line and is therefore also not of concern. In C there are a couple of bands of incorrect size in the integration line. One of these is very faint, and both are too large, so they are again likely other vars that are inefficiently picked up by these primers. They are not seen in the parent line because there the correct primer binding site is present and efficiently yields a product that outcompetes products from imperfectly matching primer sites; such products therefore appear only in the integration line, where the correct match is no longer present. For these reasons we believe these bands are not of any concern.

      (9) Figure 4C: Is there a reason KAHRP was used as a co-marker for the IFA detecting IT4var19 expression instead of SBP1 which was used throughout the rest of the study?

      This is a coincidence, as this line was tested when other lines were tested for KAHRP. As there were foci in the host cell, we were satisfied that the HA-tagged PfEMP1 is produced, and the localization was deemed plausible.

      (10) Figure 6: Streptavidin labeling for the IT4var01-BirA position 3 line is substantially less than the other two lines in both IFA and WB. Does the position 3 fusion reduce PfEMP1 protein levels or is this a result of the context or surface display of the fusion? Interestingly, the position 3 trypsin cleavage product appears consistently more robust compared with the other two configurations. Does this indicate that positioning BirA upstream of the TM increases RBC membrane insertion and/or makes the surface localized protein more accessible to trypsin?

      It is possible that RBC membrane insertion or trypsin accessibility is increased for the position 3 construct. But there could also be other explanations:

      The reason for the more robustly detected protected fragment for the position 3 construct in the WB might also be its smaller size (in contrast to the other two versions, it does not contain BirA*) which might permit more efficient transfer to the WB membrane. In that case the more robust band might not (only) be due to better membrane insertion or better trypsin accessibility.

      The lower biotinylation signal with the position 3 construct might also be explained by the greater distance of BirA* from the ATS (compared to positions 1 and 2), the region where interactors are expected to bind. The position 1 and 2 constructs may therefore generally biotinylate ATS-proximal proteins more efficiently, as BirA* is closer. Further, in the final destination (PfEMP1 inserted into the RBC membrane) BirA* would be on the other side of the membrane in the position 3 construct, while in the position 1 and 2 constructs BirA* would be on the side of the membrane where the ATS anchors PfEMP1 in the knob structure. In that case, labelling with position 3 would come from interactions/proximities during transport or at the Maurer’s clefts (if PfEMP1 is indeed not membrane embedded there) and might therefore be less.

      Hence, while alterations in trypsin accessibility and RBC membrane insertion are possible explanations, other explanations exist. At present, we do not know which of these explanations apply and therefore did not mention any of them in the manuscript. 

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract and on page 8, the authors mention that they generate cell lines binding to "all major endothelial receptors" and "all known major receptors". This is a pretty all-encompassing statement that might not be fully accepted by others who have reported binding to other receptors not considered in this paper (e.g. VCAM, TSP, hyaluronic acid, etc). It would be better to change this statement to something like "the most common endothelial receptors" or "the dominant endothelial receptors", or something similar.

      We agree with the reviewer that these statements are too all-encompassing and changed them to “the most common endothelial receptors” (introduction) and “the most common receptors” (results).

      (2) The authors targeted two rif genes for activation and in each case the gene became the most highly expressed member of the family. However, unlike var genes, there were other rif genes also expressed in these lines and the activated copy did not always make up the majority of rif mRNAs. The authors might wish to highlight that this is inconsistent with mutually exclusive expression of this gene family, something that has been discussed in the past but not definitively shown.

      We thank the reviewer for highlighting this. We have now added the following statement to this section: “While SLI-activation of rif genes also led to the dominant expression of the targeted rif gene, other rif genes still took up a substantial proportion of all detected rif transcripts, speaking against a mutually exclusive expression in the manner seen with var genes.”

      (3) In Figure 6, H-J, the authors display volcano plots showing proteins that are thought to interact with PfEMP1. These are labeled with names from the literature, however, several are named simply "1, 2, 3, 4, 5, or 6". What do these numbers stand for?

      We apologize for not clarifying this and thank the reviewer for pointing this out. There is a legend for the numbered proteins in what is now Table S4 (previously Table S3). We have now amended the legend of Figure 6 to explain the numbers and point the reader to Table S4 for the accessions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The major changes to the text are as follows:

      We re-analyzed the dataset and improved the local resolution of the extracellular region (Author response image 1).

      We re-modeled based on the improved density and canceled the bicarbonate model based on comments from all reviewers.

      We performed calcium assays using cell lines stably expressing the mutants, whose surface expression levels were analyzed by fluorescence-activated cell sorting (FACS) (Figure 3F, G and Figure 3–figure supplement 1-3).

      Thus, we significantly revised our discussion of the extracellular binding pocket and the result of the mutational study. In the revised manuscript, we speculate that H307 is a candidate for the bicarbonate binding site.

      Author response image 1.

      Comparison of local resolution between re-analyzed and previous maps. (A) Side and top view of the re-analyzed receptor-focused map of GPR30 colored by local resolution. (B) Side and top view of the previous receptor-focused map of GPR30 colored by local resolution.

      Reviewer #1 (Public Review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, which was recently identified as a bicarbonate receptor by the authors' lab. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. However, the main claim of the paper, the identification of the bicarbonate binding site, is only partly supported by the structural and functional data, leaving the study incomplete.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling seem solid. The authors perform fairly extensive unbiased mutagenesis to identify a host of positions that are important to G-protein signaling. To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study a particularly important contribution to the field.

      Weaknesses:

      Without higher resolution structures and/or additional experimental assessment of the binding pocket, the assignment of the bicarbonate remains highly speculative. The local resolution is especially poor in the ECL loop region where the ligand is proposed to bind (4.3-4.8 Å range). Of course, sometimes it is difficult to achieve high structural resolution, but in these cases, the assignment of ligands should be backed up by even more rigorous experimental validation. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. Thus, disruption of bicarbonate signaling by mutagenesis of the putative coordinating residues does not necessarily mean that bicarbonate binding has been disrupted. Moreover, the mutagenesis was apparently done prior to structure determination, meaning that residues proposed to directly surround bicarbonate binding, such as E218, were not experimentally validated. Targeted mutagenesis based on the structure would strengthen the story.

      Moreover, the proposed bicarbonate binding site is surprising in a chemical sense, as it is located within an acidic pocket. The authors cite several other structural studies to support the surprising observation of anionic bicarbonate surrounded by glutamate residues in an acidic pocket (references 31-34). However, it should be noted that in general, these other structures also possess a metal ion (sodium or calcium) and/or a basic sidechain (arginine or lysine) in the coordination sphere, forming a tight ion pair. Thus, the assigned bicarbonate binding site in GPR30 remains an anomaly in terms of the chemical properties of the proposed binding site.

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we reconstructed the receptor based on the improved density and removed the bicarbonate model. We performed calcium assays using cell lines stably expressing the variant based on the structure.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work (PMID: 38413581). In the current body of work, they solved the first cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.21 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 4 extracellular pockets created by extracellular loops (ECLs) (Pockets A-D). Based on the polarity, location, and charge of each pocket, the authors hypothesized that pocket D is a good candidate for the bicarbonate binding site. To verify their structural observation, on top of the 10 mutations they generated in the previous work, the authors introduced another 11 mutations to map out the essential residues for the bicarbonate response on hGPR30. In addition, the human GPR30-G-protein complex model also allowed the authors to untangle the G-protein coupling mechanism of this special class A GPCR that plays an important role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication (PMID: 38413581), this study was carefully designed, and the authors used mutagenesis and functional studies to confirm their structural observations. This work provided high-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 4 extracellular pockets created by ECLs (Pockets A-D). The authors were able to filter out 3 of them and identified that pocket D was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they carefully mapped out nine amino acids that are critical for receptor reactivity.

      Weaknesses:

      It is unclear how novel the aspects presented in the new paper are compared to the most recent Nature Communications publication (PMID: 38413581). Some areas of the manuscript appear to be mixed with the previous publication. The work is still impactful to the field. The new and novel aspects of this manuscript could be better highlighted.

      I also have some concerns about the TGFα shedding assay the authors used to verify their structural observation. I understand that this assay was also used in the authors' previous work published in Nature Communications. However, there are still several things in the current data that raised concerns:

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we highlighted the new and novel aspects of this manuscript. We also performed calcium assays using cell lines stably expressing variants designed based on the structure.

      (1) The authors confirmed the "similar expression levels of HA-tagged hGPR30" mutants by WB in Supplemental Figure 1A and B. However, compared to the hGPR30-HA (~6.5 when normalized to the housekeeping gene, Na-K-ATPase), several mutants of the key amino acids had much lower surface expression: S134A, D210A, C207A had ~50% reduction, D125A had ~30% reduction, and Q215A and P71A had ~20% reduction. This weakens the receptor reactivity measured by the TGFα shedding assay.

      Since the calcium assay data are now included in the main figure, the TGFα shedding assay and WB expression quantification data are now shown in Figure 3–figure supplement 1-4, and we included an explanation of the expression levels in the figure caption.

      (2) In the previous work, the authors demonstrated that hGPR30 signals through the Gq signaling pathway and can trigger calcium mobilization. Given that calcium mobilization is a more direct measurement for the downstream signaling of hGPR30 than the TGFα shedding assay, pairing the mutagenesis study with the calcium assay will be a better functional validation to confirm the disruption of bicarbonate signaling.

      Following this suggestion, we performed calcium assays using cell lines stably expressing the mutants (Figure 3F, G and Figure 3–figure supplement 1-3).

      (3) It was quite confusing for Figure 4B that all statistical analyses were done by comparing to the mock group. It would be clearer to compare the activity of the mutants to the wild-type cell line.

      Thank you for your comment. As you suggested, the comparisons are made between wild-type GPR30 and mutants in the revised manuscript (Figure 3G, Figure 3–figure supplement 4B).

      Additional concerns about the structural data include

      (1) E218 was in close contact with bicarbonate in Figure 4D. However, there is no functional validation for this observation. Including the mutagenesis study of this site in the cell-based functional assay will strengthen this structural observation.

      We cancelled the bicarbonate model, and we performed mutation analysis targeting all residues facing the binding pocket using cell lines that stably express variants including E218A.

      (2) For the flow chart of the cryo-EM data processing in Supplemental data 2, the authors started with 10,148,422 particles after template picking, then had 441,348 Particles left after 2D classification/heterogenous refinement, and finally ended with 148,600 particles for the local refinement for the final map. There seems to be a lot of heterogeneity in this purified sample. GPCRs usually have flexible and dynamic loop regions, which explains the poor resolution of the ECLs in this case. Thus, a solid cell-based functional validation is a must to assign the bicarbonate binding pocket to support their hypothesis.

      We re-analyzed the dataset, improved the local resolution of the extracellular region (Author response image 1), and cancelled the bicarbonate model. As suggested by the reviewer, solid cell-based functional validation is an effective way to analyze the receptor's functional response to bicarbonate. Thus, we performed mutation analysis targeting all residues facing the binding pocket using cell lines stably expressing the mutants, whose surface expression levels were analyzed by FACS (Figure 3F, G and Figure 3–figure supplement 1-3).
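To put the particle counts quoted in the reviewer's comment into proportion, the retention at each processing stage can be computed directly. The following is a minimal sketch: the three counts are taken from the comment above, while the stage labels and the `retention` helper are illustrative names, not part of the authors' processing pipeline.

```python
# Particle counts as quoted in the reviewer's comment; stage labels are shorthand.
stages = [
    ("template picking", 10_148_422),
    ("2D classification / heterogeneous refinement", 441_348),
    ("final local refinement", 148_600),
]


def retention(stage_counts):
    """Return (name, count, fraction of initial, fraction of previous stage)."""
    initial = stage_counts[0][1]
    previous = initial
    rows = []
    for name, count in stage_counts:
        rows.append((name, count, count / initial, count / previous))
        previous = count
    return rows


if __name__ == "__main__":
    for name, count, of_initial, of_previous in retention(stages):
        print(f"{name}: {count:,} particles, "
              f"{of_initial:.2%} of initial, {of_previous:.2%} of previous stage")
```

Only a small fraction of the picked particles contributes to the final map, which is the basis for the reviewer's remark about sample heterogeneity.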

      Reviewer #3 (Public Review):

      Summary:

      GPR30 responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. However, it remains unclear how GPR30 recognizes bicarbonate ions. This paper presents the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate. The structure together with functional studies aims to provide mechanistic insights into bicarbonate recognition and G protein coupling.

      Strengths:

      The authors performed comprehensive mutagenesis studies to map the possible binding site of bicarbonate.

      Weaknesses:

      Owing to the poor resolution of the structure, some structural findings may be overclaimed.

      Based on EM maps shown in Figure 1a and Figure Supplement 2, densities for side chains in the receptor particularly in ECLs (around 4 Å) are poorly defined. At this resolution, it is unlikely to observe a disulfide bond (C130<sup>ECL1</sup>-C207<sup>ECL2</sup>) and bicarbonate ions. Moreover, the disulfide between ECL1 and ECL2 has not been observed in other GPCRs and the published structure of GPR30 (PMID: 38744981). The density of this disulfide bond could be noise.

      The authors observed a weak density in pocket D, which is accounted for by the bicarbonate ions. This ion is mainly coordinated by Q215 and Q138. However, the Q215A mutation only reduced but did not completely abolish the bicarbonate response, and the authors did not present the data of the Q138A mutation. Therefore, Q215 and Q138 could not be bicarbonate binding sites. While H307A completely abolished bicarbonate response, the authors proposed that this residue plays a structural role. Nevertheless, based on the structure, H307 is exposed and may be involved in binding bicarbonate. The assignment of bicarbonate in the structure is not supported by the data.

      Thank you for your insightful comments. Based on the weaknesses you pointed out, we reconstructed the receptor based on the improved density and removed the bicarbonate model. We performed calcium assays using cell lines stably expressing the variant based on the structure.

      Reviewer #1 (Recommendations For The Authors):

      (1) The experimental validation of the bicarbonate binding could be strengthened by developing an assay that directly monitors bicarbonate binding (rather than GPCR signaling)

      We agree that a direct binding assay for bicarbonate would be highly attractive (e.g. a filter binding assay using <sup>14</sup>C-HCO₃⁻). However, the weak affinity of bicarbonate ions (in the mM range) would make reliable radioisotope-based detection impossible, owing to minimal specific receptor occupancy and high non-specific background. Such an assay is therefore highly challenging, and there are limitations to what can be done in this structural paper.

      and determining a structure at comparable resolution in the absence of bicarbonate. In addition, all residues that are proposed to be located adjacent to the bicarbonate should be mutated and functionally validated.

      We re-modeled the receptor based on the improved density and canceled the bicarbonate model. We performed calcium assays using cell lines stably expressing the mutants (Figure 3F, G and Figure 3–figure supplement 1-3).

      (2) What are the maps contoured in Figure 4D? The legend should describe this. Is 218 within the map region shown, or is there no density for its sidechain?

      We removed the corresponding figure and cancelled the bicarbonate model.

      (3) The contour level of the maps in Figure 1 - Figure Supplement 2 should also be indicated. Are these all contoured at the same level?

      Thank you for your comment. We re-analyzed the same data set and obtained new density maps and models. We reworked Figure 1 and Figure 1–figure supplement 2; the contour level of the map for Figure 1 and of the composite map for Figure 1–figure supplement 2 is the same, 7.65.

      (4) Regarding the cited structures of bicarbonate-binding proteins, for three of the four cited structures, the bicarbonate is actually coordinated by positive ligands, with the Asp/Glu playing a more peripheral role:

      Capper et al: Overall basic cavity with tight bidentate coordination by Arg. The Glu is 5-6 Å away.

      Koropatkin et al: Two structures. The first, solved at pH 5, is proposed to have carbonic acid bound. The second, solved at pH 8, shows carbonate in a complex with calcium, with the calcium coordinated by carboxylates.

      Wang et al: The bicarbonate is coordinated by a lysine and a sodium ion. The sodium is coordinated by carboxylates.

      The authors should more thoughtfully discuss the unusual properties of this binding site with regard to the previous literature. Is it possible that bicarbonate binds in complex with a metal ion? Could this possibility be experimentally tested?

      We cancelled the bicarbonate model.

      (5) As a structure of GPR30 has been recently published by another group (PMID: 38744981), it would be valuable to discuss structural similarities and differences and discuss how bicarbonate activation and activation by the chloroquine ligand identified by the other group might both be accommodated by this structure.

      Thank you for your valuable comment. We compared the structure presented by another group and added our discussion, as “During the revision of this manuscript, the structures of apo-GPR30-G<sub>q</sub> (PDB 8XOG) and the exogenous ligand Lys05-bound GPR30-G<sub>q</sub> (PDB 8XOF) were reported [42]. We compared our structure of GPR30 in the presence of bicarbonate with these structures. In the extracellular region, the position of TM5 in GPR30 in the presence of bicarbonate is similar to that in apo-GPR30. In contrast, the position of TM6 is shifted outward relative to that of apo-GPR30, resembling the conformation observed in Lys05-bound GPR30 (Figure 6A, B). Additionally, the position of ECL1 is also shifted outward compared to that of apo-GPR30 (Figure 6B). In the GPR30 structure in the presence of bicarbonate, ECL2 was modeled, suggesting differences in structural flexibility. These findings indicate that the structure of GPR30 in the presence of bicarbonate is different from both the apo structure and the Lys05-bound structure, demonstrating that the structure and the flexibility of the extracellular domain of GPR30 change depending on the type of ligand. Furthermore, focusing on the interaction with G<sub>q</sub>, the αN helix of G<sub>q</sub> is not rotated in the structure bound to Lys05, in contrast to the characteristic bending of the αN helix in our structure (Figure 6C, D). Although it is necessary to consider variations in experimental conditions, such as salt concentration, the differences in the G<sub>q</sub> binding modes suggest that the downstream signals may change in a ligand-dependent manner.” (lines 249-266).

      Reviewer #2 (Recommendations For The Authors):

      (1) It is highly recommended that the authors carefully go through the "insights into bicarbonate binding" section. The results of the new findings in this paper were blended in with the results from the previous work: the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in the Nature Communications paper but the authors didn't make it clear, which added a little confusion.

      We emphasized this fact in the main text (lines 130-132).

      (2) It would be nice for the authors to add some content about the physiological concentration of HCO3 or refer more to their previous work about the rationale for selecting the bicarbonate dose in their functional assay.

      Thank you for your comment. The physiological concentration of bicarbonate is 22-26 mM in the extracellular fluid, including interstitial fluid and blood, and 10-12 mM in the intracellular fluid. The bicarbonate concentration changes in various physiological and pathological conditions: metabolic acidosis in chronic kidney disease causes a drop to 2-3 mM, and metabolic alkalosis induced by severe vomiting increases HCO<sub>3</sub><sup>-</sup> concentrations to more than 30 mM. Thus, our present and previous works clearly show that GPR30 is activated by physiological concentrations of bicarbonate, whether it is localized intracellularly or on the membrane, and that GPR30 can be deactivated or reactivated in various pathophysiological conditions. We added this in the discussion section (lines 267-278).

      (3) In Figure 3A, in the legend, the authors mentioned: "black dashed lines indicate hydrogen bonds". No hydrogen bond was noted in the figure.

      We completely revised Figure 3.

      (4) Figure 3B, it would be helpful for the authors to denote the meaning of the blue-white-red color coding in the legend.

      We removed the figure.

      (5) Supplemental Figure 3: since AF3 was released on May 3rd, it would be awesome in the revision version if the authors would update this to the AF3 model.

      The AF2 model has been replaced with the AF3 model (Figure 2–figure supplement 2A-C). The AF2 and AF3 models are almost identical, and they form incorrect disulfide bonds. This confirms the usefulness of the experimental structural determination in this study.

      (6) Supplemental Figure 4: it wasn't clear to me if the expression experiments were repeated multiple times or if there was any statistical analysis for the expression level was done in this study.

      We performed the expression experiment by western blotting once and did not perform statistical analyses. During revision, we performed repeated FACS analyses of HEK cells stably expressing N-terminally HA-tagged wild-type or mutant GPR30s to analyze their membrane and whole-cell expression (Figure 3–figure supplement 1-3). Using these stable cell lines, we performed calcium assays (Figure 3F, G and Figure 3–figure supplement 1-3).

      (7) Supplemental Figure 4: Also, is there a reason for the authors to compare the expression level of hGPR30 to the housekeeping gene Na-K-ATPase rather than the total loaded protein? Traditionally housekeeping genes have been used as loading controls to semiquantitatively compare the expression of target proteins in western blots. However, numerous recent studies show that housekeeping proteins can be altered due to experimental conditions, biological variability across tissues, or pathologies. A consensus has developed for using total protein as the internal control for loading. An editorial from the Journal of Biological Chemistry reporting on "Principles and Guidelines for Reporting Preclinical Research" from the workshop held in June 2014 by the NIH Director's Office, Nature Publishing Group, and Science stated, "It is typically better to normalize Western blots using total protein loading as the denominator".

      Thank you for your instructive comment. We evaluated western blotting with the same amount of total protein loaded: 20 µg for whole-cell lysate and 1.5 µg for cell surface protein (Figure 3–figure supplement 3C-F).

      Reviewer #3 (Recommendations For The Authors):

      The claim about this disulfide should be removed unless the authors can provide mass spec evidence.

      Thank you for your crucial comments. Firstly, C130 is a residue of TM3, not ECL1, so our misprint has been corrected to C130<sup>3.25</sup>. C207<sup>ECL2</sup>, located at position 45.50, is the most conserved residue in ECL2, and it forms a disulfide bond with cysteine at position 3.25 (PMID: 35113559). The paper was additionally cited regarding the conservation of the C130<sup>3.25</sup>-C207<sup>ECL2</sup> bond (line 103). Indeed, disruption of this disulfide bond by the C207<sup>ECL2</sup>A mutation resulted in a marked reduction in receptor activity. In addition, the data set was re-analyzed to improve the local resolution of the extracellular region, and it was shown that the density of ECL2 is not noise (Figure 2–figure supplement 2). We are confident about the presence of the disulfide bond, based on the structural analysis data and the conservation.

      The highly flexible extracellular region is greatly affected by experimental conditions and ligands, so we speculate that this is why ECL2 and the disulfide bond were not observed in other reported structures of GPR30. We have therefore added the following content to the discussion: “In the GPR30 in the presence of bicarbonate, ECL2 was modelled, suggesting differences in structural flexibility.” (lines 256-257).

      The authors should remove the assignment of bicarbonate in the structure, and tone down the binding site of bicarbonate.

      We cancelled the bicarbonate model.

      Minor:

      (1) The potency of bicarbonate for GPR30 is in the mM range. Although the concentration of bicarbonate in the serum can reach mM range, how about its concentration in the tissues? Given its low potency, it may be not appropriate to claim GPR30 is a bicarbonate receptor at this point, but the authors can claim that GPR30 can be activated by or responds to bicarbonate.

      The physiological concentration of bicarbonate is 22-26 mM in the extracellular fluid, including interstitial fluid and blood, and 10-12 mM in the intracellular fluid. Therefore, GPR30 is activated by physiological concentrations of bicarbonate in the tissues. Also, the bicarbonate concentration alters in various physiological and pathological conditions – metabolic acidosis in chronic kidney disease causes a drop to 2-3 mM, and metabolic alkalosis induced by severe vomiting increases HCO3- concentrations more than 30 mM. Thus, our work clearly shows that GPR30 is activated by physiological concentrations of bicarbonate, whether it is localized intracellularly or on the membrane, and that GPR30 can be deactivated or reactivated in various pathophysiological conditions. According to the reasons above, we claim GPR30 is a bicarbonate receptor (lines 267-278).

      (2) The description that there is no consensus on a drug that targets GPR30 is not accurate, since Lys05 has been reported as an agonist of GPR30 and its structure has been published (PMID: 38744981). The published structures of GPR30 should be introduced in the paper.

      We added a discussion of the structural comparison with the Lys05-bound structure (Figure 6, lines 249-266).

      (3) BW numbers in Figure 4A should be shown.

      We added BW numbers in the figures of the mutational studies.

    1. R0:

      Reviewer #1:

      This sub study was nested in a factorial randomized controlled trial (RCT) in women aged 18–30 years. Participants included in this study were randomized to receive either a preconception intervention package or routine care until early childhood. The design strategy involved a reasonable sample size justification to show superiority. The sample needed for the study objectives was well justified with power considerations. However, the investigators do note that the sample size, while adequate for detecting moderate effect sizes, may have been insufficient to identify smaller but clinically meaningful differences. The descriptives are informative as seen in Tables 1 and 2.

      1. Please define IQR in the footnote of Table 2 or put a descriptive section in the ‘Analysis Plan’ paragraph.

      Generalized linear models (GLMs) with a Gaussian family and identity link function were used to estimate mean differences in CRP, AGP, IGF-1, and IGFBP3 concentrations. To estimate risk ratios for inflammatory status between infants in the intervention and routine care groups, GLMs with a binomial family and log link function were employed. Final models were adjusted for place of birth. There are several considerations needing clarification.
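The two model families named above can be illustrated in their simplest, unadjusted form without a model-fitting library. With only an intercept and a binary treatment indicator, the treatment coefficient of a Gaussian/identity GLM equals the difference in group means, and the exponentiated coefficient of a binomial/log GLM equals the ratio of observed risks. The sketch below assumes that unadjusted setting. The function names and all numbers are hypothetical and are not taken from the trial, and the adjustment for place of birth is not reproduced because it requires fitting the full model.

```python
import math
from statistics import mean


def mean_difference(treated, control):
    """Difference in group means (treated minus control).

    Equals the treatment coefficient of a Gaussian/identity GLM that
    contains only an intercept and a binary treatment indicator.
    """
    return mean(treated) - mean(control)


def risk_ratio(events_treated, n_treated, events_control, n_control):
    """Ratio of observed risks (treated over control).

    Equals exp(treatment coefficient) of a binomial/log GLM that
    contains only an intercept and a binary treatment indicator.
    """
    return (events_treated / n_treated) / (events_control / n_control)


def risk_ratio_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Wald confidence interval for the risk ratio, built on the log scale."""
    if events_treated == 0 or events_control == 0:
        raise ValueError("the log-scale interval needs at least one event per group")
    rr = risk_ratio(events_treated, n_treated, events_control, n_control)
    se_log = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    return rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical illustration values, not trial data.
crp_intervention = [1.0, 2.0, 3.0]  # mg/L
crp_control = [2.0, 3.0, 4.0]       # mg/L
print(mean_difference(crp_intervention, crp_control))  # -1.0
print(risk_ratio(10, 100, 20, 100))                    # 0.5
print(risk_ratio_ci(10, 100, 20, 100))
```

Adding a covariate such as place of birth changes both estimates from simple group contrasts to conditional ones, which is why the adjusted results need a fitted model rather than these closed forms.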

      There are four endpoints. Therefore,

      2. Some consideration of multiple comparison p-value adjustment should have been discussed.
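One common way to handle four endpoints is the Holm step-down adjustment, which controls the family-wise error rate and is never less powerful than a plain Bonferroni correction. The reviewer does not name a method, so the choice of Holm here is an assumption, and the p-values in the sketch are hypothetical placeholders rather than results from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the four endpoints (CRP, AGP, IGF-1, IGFBP3).
raw = [0.01, 0.04, 0.03, 0.20]
for name, p_adj in zip(["CRP", "AGP", "IGF-1", "IGFBP3"], holm_adjust(raw)):
    print(f"{name}: adjusted p = {p_adj:.2f}")
```

With these placeholder inputs, only the first endpoint stays below 0.05 after adjustment, which shows how an isolated nominally significant result can lose significance once all four comparisons are taken into account.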

      Also, with respect to model content,

      3. Exactly how was adjustment by birthplace incorporated into the models?

      The overall conclusions follow from the analyses performed and results seen in Table 3. The strengths and limitations are reasonably described in the ‘Discussion’ section. As an added point, however,

      4. There is a gap between the manuscript text and the supplement supporting information proposal Version 2.0. Was there any attempt to explore the mediation analysis discussed in that proposal?

      Reviewer #2:

      1. Overall Assessment: This study reports a well-designed randomized controlled trial. It investigates the impact of an integrated intervention on infant biomarkers related to inflammation and growth, such as CRP, AGP, IGF-1 and IGFBP3. The research addresses a significant question in maternal and child health. However, the discussion section could be improved with a detailed explanation of biological plausibility. The implications of the study could also be elaborated more broadly.
      2. Originality and Relevance: The research topic appears to be original and highly relevant. The novelty of this study lies in integrating interventions across stages, from preconception to 2 years of early child development. The intervention is policy-relevant and aligns well with Goal 2 and Goal 4 of the SDGs for 2030. The concept is innovative, but similar integrated frameworks are reported in the literature, so the distinct approach of this study needs to be articulated.
      3. Scientific Rigor and Methodology: This randomized controlled design follows standard protocols, and the manuscript is well aligned with the CONSORT guidelines. Please elaborate on the randomization process, blinding, and control of confounders. The sample size calculations appear to be powered for anthropometric assessments. For biomarker outcomes, the sample size calculations need to be refined or justified.
      4. Results and Interpretation: The results show no significant differences in biomarkers between the intervention and control groups. The null findings could be discussed with possible biological explanations such as timing of assessment, nutritional variability and breastfeeding. Subgroup analysis by maternal or infant characteristics could be helpful.
      5. Discussion and Implications: There is scope to expand the discussion section by linking the pathways of maternal interventions with infant biomarker responses. The implications of this study for public health, including integration into maternal and child health programs, could be discussed, highlighting the need for long-term follow-up.
      6. Presentation and Clarity: The manuscript is well written and well organized in line with the required guidelines. However, most of the references are rather old, and references from 2022 onwards are missing. More recent citations from 2023-2025 could be included.
      7. Ethical and Data Considerations: All ethical procedures are clearly described, including IEC and CTRI. Data availability through open-access links is provided.
      8. Conclusion and Recommendation: This well-executed trial can provide good evidence for understanding the biological outcomes of integrated maternal-child interventions.

      Recommendation: Minor Revision.

      Reviewer #3:

      This study is a secondary analysis of the WINGS factorial randomized controlled trial evaluating the effects of a multidomain, integrated intervention delivered from preconception through early childhood on infant biomarkers of inflammation and growth (CRP, AGP, IGF-1, IGFBP3) at 3 and 6 months of age. This study links the integrated intervention to specific changes in inflammatory and growth-related biomarkers like CRP, AGP, IGF-1 and IGFBP3. The study addressed the biologically relevant and policy-important question related to early-life interventions in low-resource settings. The findings indicate no significant differences in these biomarkers between the intervention and control groups, except for a transient decrease in IGFBP3 at 3 months, which was not sustained at 6 months. The authors conclude that while the intervention improved growth outcomes in the parent trial, it did not significantly influence early-life inflammation or IGF axis biomarkers. The manuscript is well-written, clearly articulated and follows the required CONSORT Guidelines.

      Major Comments

      1. Rationale and Framing

      • The biological rationale connecting integrated maternal–child interventions (nutrition, WASH, psychosocial care) with the specific biomarkers studied (CRP, AGP, IGF-1, IGFBP3) needs clarity.

      • Clarify why these markers and the 3- and 6-month time points were selected, especially since primary growth outcomes were reported at 24 months in the main WINGS paper.

      • A concise conceptual model or figure showing hypothesized pathways could help readers follow the mechanistic logic.

      2. Study Power and Sample

      • The power calculation is based on CRP only. Please justify the adequacy of the sample size for detecting meaningful differences in IGF-1 and IGFBP3, given their biological variability in infancy.

      • Power calculations are based on LAZ outcomes from the primary WINGS study rather than biomarker data. This needs justification.

      3. Statistical Analysis and Results

      • Tables 2 and 3 could be simplified to highlight group comparisons more effectively.

      • Adjustment only for the place of delivery seems limited.

      • The authors may consider other covariates, such as mothers’ BMI, socioeconomic indicators, or exposure to infections, in the analysis. If these were intentionally excluded from the analysis, explain their exclusion.

      • It would be useful to include effect size interpretation (e.g., percentage change or standardized mean difference) to better convey the biological relevance of null findings.

      4. Interpretation of Findings

      • However, cautious interpretation of the null findings is needed. Aspects such as biological plausibility, contextual limitations, and future implications for longitudinal research require further elaboration.

      • The discussion acknowledges the absence of significant effects, but can be deepened if the authors discuss the following issues:

      o Address low baseline inflammation as a potential ceiling effect.

      o Note that intervention effects might appear later in life (after 6 months).

      o Acknowledge that non-inflammatory mechanisms (caregiving, infection prevention, psychosocial stimulation) might explain the positive growth outcomes in the primary trial.

      • Expand the comparison with similar trials, such as SHINE (Zimbabwe), ELICIT (Tanzania), and MAL-ED studies, that examined inflammation and growth factor pathways.

      • The trial was conducted in a single urban Indian setting, which limits extrapolation to rural or diverse socioeconomic contexts. The discussion should acknowledge this limitation more explicitly and suggest strategies for replication in varied environments.

      5. Policy and Program Implications

      • The conclusion is based on the non-significant biomarker findings, whereas the short duration of biomarker assessment may oversimplify complex biological processes. More elaborate discussion is needed on possible confounders like infections, duration, and type of breastfeeding.

      Minor Comments

      1. Abstract: Conclude with a stronger statement about contribution: e.g., “These findings add to the understanding of biological mechanisms underlying integrated early-life interventions in LMICs.”

      2. Tables: Present only adjusted results in the main text; unadjusted data may be submitted as supplementary files. Ensure all tables include units (mg/L, ng/mL) and consistent decimal formatting.

      3. CONSORT Diagram: Please include the number of exclusions, losses to follow-up, and reasons for non-participation in Figure 1 for transparency.

      4. Discussion: Add a short note acknowledging that biomarker variability in early infancy is high and may obscure subtle intervention effects.

      5. References: Consider citing more recent literature (published within the last 3 years) that links microbiome–inflammation–growth relationships in infants.

      6. Language and Formatting: Ensure consistency in abbreviations (e.g., IGFBP3 vs IGF-BP3). Use consistent phrasing for “preconception, pregnancy, and early childhood interventions, growth-related biomarkers, and growth factor profiles” throughout.

      Overall Recommendation: Minor-to-Moderate Revision

      This is a robust, well-implemented study addressing an important mechanistic question within global child health. Although the results are null, they offer valuable insights into early-life biology and integrated program evaluation. Strengthening the biological framing, contextual discussion, and presentation of adjusted analyses will substantially enhance the manuscript’s impact and readability.

    1. Summary of the Plenary Session of the Conseil Économique, Social et Environnemental

      Summary

      The plenary session of the Conseil économique, social et environnemental (CESE, France's Economic, Social and Environmental Council) was organized around two main strands:

      the examination and unanimous adoption of a key opinion on the rights and fundamental needs of children,

      and a series of statements on current issues reflecting the concerns of civil society.

      The opinion, entitled "Satisfaire les besoins fondamentaux des enfants et garantir leurs droits dans tous les temps et espaces de leur vie quotidienne" ("Meeting children's fundamental needs and guaranteeing their rights in every time and place of their daily lives"), was adopted unanimously (130 votes in favor).

      Designed to complement the work of the Convention Citoyenne (Citizens' Convention) on the same subject, the opinion delivers a harsh assessment of the situation of children in France, marked by growing inequalities (social, territorial, economic) and a persistent gap between proclaimed rights and their actual application. The document highlights a society designed "by and for adults" that struggles to place children at the center of its concerns.

      The flagship recommendations include introducing a "child impact clause" in every piece of legislation, an ambitious reform of school schedules, guaranteed equitable access to leisure and vacations, and the creation of a "public service for educational continuity" to coordinate all stakeholders.

      The statement by Claire Hédon, Défenseure des droits (Defender of Rights), reinforced this diagnosis with alarming figures on violations of children's rights, particularly for the most vulnerable.

      Ahead of this debate, the open-floor session addressed a range of issues:

      • challenges to the legitimacy of citizen participation,

      • drastic cuts in official development assistance,

      • threats to the health system,

      • environmental deregulation at the European level, the dangers of new GMOs,

      • the rise in workplace accidents,

      • the pressure placed on job seekers,

      • and calls for concrete food sovereignty.

      Finally, the presentation of the CESE budget revealed a strained financial situation, marked by a decline in state funding and threatened by potential further cuts voted by the Senate, jeopardizing the institution's ability to carry out its missions, notably the organization of future citizens' conventions.

      I. Open-Floor Session: A Panorama of Societal Concerns

      Before the opinion on childhood was examined, several speakers voiced the concerns of their respective groups on current issues.

      Defense of Citizen Participation (Agatha Mel):

      On behalf of the student organizations, the speaker defended the Citizens' Convention on children's time, denouncing the "accusations of illegitimacy, incompetence and manipulation" and calling for a serious debate on the substance of the report, without caricaturing the citizens' work.

      Official Development Assistance (Jean-Marc Boivin):

      The associations group warned of "drastic and disproportionate" cuts (-60% in 2 years) to the official development assistance budget, leading to the closure of 1,300 projects and the loss of 10,000 jobs, and affecting more than 15 million people.

      Impact on Health (Dominique Joseph):

      The Mutualité Française called the increase in the tax on supplementary health insurance irresponsible, describing it as a "VAT on health", and stressed the need for fundamental reform of the social protection system.

      Environmental Deregulation (Florent Compnibus):

      The environment group denounced the European "Omnibus" legislative proposal as "massive deregulation" and an "outright abandonment of the precautionary principle", introducing unlimited authorizations for pesticides and biocides and weakening companies' duty of vigilance.

      Opposition to New GMOs (Éric Meer):

      The alternative sociale et écologique (social and ecological alternative) group criticized the European agreement on new genomic techniques (NGT), seeing it as a "technological headlong rush" that favors patenting, increases farmers' dependence and deprives consumers of traceability.

      Workplace Accidents (Ingrid Clément):

      The CFDT described 2024 as a "dark year", with 774 deaths at work (two per day), a 26% increase in accidents among women, and a rise in musculoskeletal disorders and mental health conditions, and called for stronger primary prevention.

      Pressure on Job Seekers (Isabelle Dor):

      The associations group relayed testimony from people supported by France Travail describing "infantilization", "insane pressure" and threats of being struck off the rolls, illustrating situations described as absurd for RSA recipients and the working poor.

      Support for Union Solidarity (Alain le corps):

      The CGT denounced the fact that its general secretary, Sophie Binet, has been placed under formal investigation for using the expression "les rats quittent le navire" ("the rats are leaving the ship"), asserting that it is "not an insult, but the bitter acknowledgment of irresponsible behavior".

      Food Sovereignty (Henriespéré):

      The agriculture group relayed the minister's remarks on the looming "agricultural war", calling for a move "from words to action" to revive French agricultural sectors through innovation and reciprocity of standards.

      II. The CESE Opinion on the Fundamental Needs and Rights of Children

      The core of the session was devoted to the opinion "Satisfaire les besoins fondamentaux des enfants et garantir leurs droits" ("Meeting children's fundamental needs and guaranteeing their rights"), prepared by the education, culture and communication committee.

      This opinion is the contribution of organized civil society, produced in parallel with the Citizens' Convention on children's time, following a referral by the Prime Minister.

      A. The Address by the Defender of Rights (Claire Hédon)

      By way of introduction, Claire Hédon, Defender of Rights and of children, gave a dense address, stressing the gap between "the right proclaimed and its effectiveness".

      Volume of Complaints: The institution received 3,073 complaints concerning violations of children's rights in 2024. 30% of these complaints concern the schooling of pupils with disabilities.

      Consultation of Children: To prepare its 2025 report, more than 1,600 children and young people were heard, underscoring the importance of their voice, "too often absent from public debate".

      Access to Leisure: One striking figure illustrates the massive inequalities: 71% of children from low-income families take part in no sporting or cultural activity, compared with only 38% of those from well-off families.

      The situation is even more critical in the overseas territories: in Mayotte, facilities are four times scarcer than in mainland France.

      Screen Time: Time spent in front of screens is rising sharply, averaging 4 h 48 min per day among 11-14 year-olds (outside school) and up to 5 h 10 min among 16 year-olds, with serious consequences for sleep and mental health.

      Right to Education: The Defender warned about lost teaching hours, citing the case of CP (first-grade) pupils in Marseille who went without classes for a month, and the figure of 27,000 young people nationwide without a lycée (upper secondary school) place at the start of 2024.

      Climate Impact: Global warming threatens the continuity of public education.

      By 2030, nearly 7,000 nursery schools will be exposed to heat waves above 35°C.

      B. Presentation of the Draft Opinion by the Committee

      The rapporteurs presented a draft opinion built around one fundamental principle: the child is a person in their own right.

      The common thread of the analysis is a triptych: children's rights, the satisfaction of their needs, and the fight against inequalities.

      Findings and Major Issues

      Rights That Are Poorly Enforced: Despite ratification of the Convention on the Rights of the Child, daily reality is marked by rights that are not respected, as reports from the UN and the Defender of Rights underline.

      Growing Inequalities: Social, economic, territorial and environmental inequalities hit children's lives head-on.

      34.3% of single-parent families live in poverty.

      On the eve of the 2025 school year, at least 2,159 children were left without any accommodation.

      An "Adult-Centered" Society: Social organization, in particular work patterns and school hours, is designed for adults, leaving little room for children's biological and psychological needs.

      The "Indoor" Child: In 20 years, the range within which children move about on their own has fallen from several kilometers to less than 300 meters.

      Four children in 10 (aged 3-10) never play outside during the week.

      Key Recommendations

      The opinion makes 19 recommendations to address these issues. The most structuring are:

      | Theme | Flagship Recommendation | Description |
      | --- | --- | --- |
      | Governance and Legislation | Create a "child impact" clause | Include in the assessment of every bill or regulation an analysis of its consequences for children's rights and well-being. |
      | School Time | Affirm that the status quo is no longer tenable | Call for a rethink of how school days and weeks are organized, recommending alternating 7 weeks of classes with 2 weeks of vacation, while keeping 8 weeks in summer. |
      | Right to Vacations and Leisure | Guarantee equitable access for all | Develop targeted information, introduce income-based pricing and provide financial support to group care and leisure facilities to combat inequalities in access. |
      | Connection to Nature | Promote and support "outdoor" education | Roll out measures such as greening schoolyards, educational areas and local nature-education plans to reconnect children with their environment. |
      | Coordination of Stakeholders | Create a public service for educational continuity | Bring together existing tools (PEDT, CTG) to guarantee every child access to varied, coherent, high-quality educational time, mobilizing all stakeholders (schools, families, associations, local authorities). |
      | Parenthood and Work | Create a right attached to parental obligations | Transpose the European directive on work-life balance so that parents can use flexible working arrangements. |
      | Funding | Ensure a substantial and lasting budgetary effort | Recognize education as an investment in the future rather than a mere expense, by guaranteeing the State, the social security system and local authorities the resources needed to pursue ambitious public policies. |

      C. Reception and Adoption of the Opinion

      All the political and civil-society groups present at the CESE praised the quality and ambition of the opinion.

      Their statements converged on the diagnosis of growing inequalities and the need for strong political action.

      The draft opinion was adopted unanimously by the 130 voters.

      In addition, the member of parliament Florence Erroin-Léoté announced her intention to introduce a bill on children's right to leisure, drawing on the work of the Citizens' Convention and the CESE to make free time a "place of education, social mixing, emancipation and living democracy".

      III. The CESE Budget: Issues and Vulnerabilities

      The session closed with the presentation of the CESE budget, which highlighted a worrying financial situation.

      Context of Budgetary Pressure: The president recalled that, at that very moment, the Senate was voting a 5 million euro cut to the CESE budget, against the advice of its own finance committee and of the government.

      Declining Revenue: The budget presented shows a continuing erosion of revenue, notably the end of the dedicated 4 million euro allocation for organizing citizens' conventions.

      In addition, the renovation of the Palais d'Iéna will deprive the CESE of about 1.6 million euros in revenue from renting out its spaces in 2026.

      A Fragile Balanced Budget for 2026: The 2026 budget is presented as balanced, but this balance is achieved by excluding the funding of a new citizens' convention and by cutting certain items such as communication.

      Inability to Fund New Missions: The questeur was clear: "as things stand, [...] tomorrow we are unable to run another citizens' convention at 4 million euros".

      Organizing such missions will now depend on the CESE's ability to obtain ad hoc funding from the government for each commission.

      Massive Real-Estate Investment: The presentation stressed that the accumulated cash reserves are now committed to a multi-year investment plan that is essential for renovating the building, making up for decades of underinvestment.

    1. Briefing: The Impact of the Smartphone and AI on Adolescence

      Summary

      This synthesis examines anthropologist David Le Breton's analysis of the profound transformations brought about by the omnipresence of the smartphone and artificial intelligence (AI) in adolescents' lives.

      The central finding is a major anthropological rupture, marked by the replacement of "conversation" (an embodied, empathetic and reciprocal exchange) with digital "communication" (a disembodied, utilitarian interaction that breeds isolation).

      The key points are:

      The End of Conversation: Face-to-face interaction is constantly interrupted by notifications, devaluing physical presence in favor of a virtual world.

      This fragmentation of direct social ties leads to a documented erosion of empathy among younger generations.

      The Rise of the AI Companion: To fill the emotional and social void, adolescents turn to chatbots, virtual "secret companions" that offer constant, non-judgmental attention.

      This relationship, though narcissistically reassuring, deepens isolation and turns the user into a product, as their data are captured and monetized.

      Severe Cognitive and Physical Consequences: Massive screen exposure is correlated with weakened capacities for concentration, in-depth reading and critical thinking.

      It encourages a more sedentary lifestyle, leading to health problems (neck pain, myopia) and a drastic drop in physical activity compared with previous generations.

      A Global Mental Health Crisis: Drawing on numerous studies, David Le Breton establishes a direct link between the surge in anxiety, depression, suicide attempts and self-cutting among adolescents since 2010 and the widespread adoption of the Internet-connected smartphone.

      Societal and Ethical Issues: Beyond the individual, the analysis points to global cultural homogenization ("MacWorld"), increased vulnerability to fake news, and the serious ethical and environmental implications of the technology (child labor, mining of rare metals, pollution from data centers).

      In conclusion, far from being a mere tool, the AI-boosted smartphone is shaping a new anthropology in which the simulation of connection supplants real experience, with harmful consequences for individual development and social cohesion.

      --------------------------------------------------------------------------------

      1. Context of the Analysis

      This analysis is based on remarks by David Le Breton, professor emeritus of anthropology at the Université de Strasbourg, known for his work on risk-taking behavior, the body, and more recently on slowing down and walking.

      His talk is part of a broader reflection on young people's mental health and the impact of artificial intelligence (AI) on society.

      2. The Anthropological Rupture: Before and After the Smartphone

      David Le Breton argues that a fundamental anthropological rupture occurred around 2008-2009 with the arrival of high-speed Internet on smartphones.

      This change radically transformed public space and human interactions.

      A "Spectral Society": Cities are now "haunted by ghost-like figures who are hypnotized by their mobile phones and no longer see anything at all around them".

      Loss of Attention to the Environment: This hypnotic state creates physical dangers (inattentive pedestrians and cyclists) and social ones, since attention is no longer paid to the immediate surroundings or to the other people present.

      The World Before: Some twenty years ago, the world was radically different.

      Even with the first mobile phones, attention to the surrounding world was not abolished as it is today by the hypnosis of the smartphone screen.

      3. A Fundamental Distinction: Conversation versus Communication

      At the heart of Le Breton's analysis lies an essential anthropological distinction between two modes of interaction.

      | Characteristic | Conversation | (Digital) Communication |
      | --- | --- | --- |
      | Setting | Face to face, physical presence. | At a distance, often anonymous. |
      | Body | Central (facial expressions, gestures). | Absent, disembodied. |
      | Temporality | Unpredictable; includes time for silence and reflection. | Urgency, efficiency, utilitarianism. Silence is perceived as a "breakdown". |
      | Quality of the bond | Listening, attention, empathy, reciprocity. | Self-centered, instrumental. |

      David Le Breton quotes his own book to underline this point:

      Conversation implies empathy, that is, an ability to put oneself in the other's place and not to be a stranger to what they feel.

      This quality disappears in communication at a distance [...] the other then becomes a fiction without substance.

      4. Key Data on Screen Time

      Axel's opening remarks provide figures that put the scale of the phenomenon in context, based in particular on an ARCOM report from April 2025.

      | Age Group | Screen Time in 2011 | Screen Time in 2022/recent |
      | --- | --- | --- |
      | 1-6 years | 1 h 47 min | 2 h 03 min |
      | 7-12 years | 2 h 51 min | 4 h 12 min |
      | 13-19 years | 4 h 20 min | 5 h 10 min |
      | 15-24 years | (not specified) | 5 h 48 min (exceeds the 50-64 age group) |
      | 50-64 years | (not specified) | 5 h 27 min (mainly live TV) |

      These data show a steep increase in screen time over roughly a decade, with 15-24 year-olds now the heaviest consumers, mainly via the smartphone. For some adolescents, this time can exceed ten hours a day.
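The size of the increase can be checked directly from the three age groups that have a figure for both periods. The short sketch below converts each duration to minutes and computes the relative change; the helper names are illustrative only, and the inputs are the table values as reported, not independently verified figures.

```python
def to_minutes(hours, minutes):
    """Convert a duration given as hours and minutes to total minutes."""
    return 60 * hours + minutes


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (age group, 2011 value, 2022/recent value), taken from the table above.
screen_time = [
    ("1-6 years", to_minutes(1, 47), to_minutes(2, 3)),
    ("7-12 years", to_minutes(2, 51), to_minutes(4, 12)),
    ("13-19 years", to_minutes(4, 20), to_minutes(5, 10)),
]

for group, before, after in screen_time:
    change = percent_change(before, after)
    print(f"{group}: {before} min -> {after} min ({change:+.1f}%)")
# 1-6 years: 107 min -> 123 min (+15.0%)
# 7-12 years: 171 min -> 252 min (+47.4%)
# 13-19 years: 260 min -> 310 min (+19.2%)
```

On these figures the largest relative jump is among 7-12 year-olds, while the two other groups rise by roughly 15% to 20%.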

      5. L'Adolescent et le Compagnon Virtuel (IA)

      Face à un lien social qui s'effrite et à une désertion affective des proches, l'IA, via les chatbots, offre une solution de substitution qui devient un phénomène central de l'adolescence contemporaine.

      Le "Doudou de Substitution" : L'IA permet de fabriquer un "compagnon secret fictionnel" pour combler un manque affectif.

      Le jeune programme ce personnage virtuel (nom, voix, personnalité) pour en faire un interlocuteur idéal.

      Un Bouclier de Sens : Le chatbot est toujours disponible, bienveillant, sans jugement, et procure un sentiment de maîtrise et de reconnaissance.

      Il devient un "bouclier de sens pour conjurer les désarrois, les souffrances".

      L'Illusion de la Réciprocité : L'adolescent interagit avec le chatbot comme avec une personne réelle, oubliant qu'il s'agit d'un programme conçu pour capter ses données et le maintenir connecté le plus longtemps possible.

      La Violence de l'Indifférence : Cette quête d'attention virtuelle naît souvent d'un manque d'attention réelle, illustré par l'anecdote poignante d'une petite fille disant à son père hypnotisé par son portable :

      Papa je veux que tu m'écoutes avec les yeux.

      6. Consequences for Social Ties and the Erosion of Empathy

      Paradoxically, hyper-connection generates deep isolation and a deterioration of social skills.

      The Liquidation of the Interlocutor: The physical presence of a friend or parent is immediately "liquidated" as soon as a notification appears.

      The real interlocutor has "less ontological substance than the virtual others".

      The Simulation of Connection: The "hundreds of friends" on social networks are not worth one or two real friends capable of a physical gesture of comfort.

      Digital communication simulates social connection but creates neither intimacy nor reasons to live.

      The Decline of Empathy: A study of 14,000 students over 30 years, conducted by the sociologist Sherry Turkle, shows that since the 2000s, "young people show less interest in others".

      The study's authors draw a direct link between this retreat of empathy and the growth in access to online games and social networks.

      7. Cognitive, Physical, and Behavioral Impacts

      Overexposure to screens and the delegation of thinking to AI have direct, measurable effects on young people's development.

      7.1. Cognitive Impacts

      Reading Difficulty: "Syncopated, simple, permanent, ultra-fast" communication makes it difficult to read long, elaborate texts, including text messages longer than a few sentences.

      Weak General Knowledge: The belief that any information is one click away discourages in-depth learning.

      Students "struggle to read even a few pages of an article or a book".

      Learned Passivity: Systematic reliance on AI for immediate answers (e.g., ChatGPT for homework) prevents the development of independent research, nuance, and critical thinking.

      Outsourcing of Memory: Keyboard use and the ability to find everything online weaken memorization, which is an affective, contextual process rather than simple information storage.

      7.2. Physical and Behavioral Impacts

      Extreme Sedentariness: Research by the physician William Bird shows that within a few decades, the distance an 8-year-old child travels around their home has fallen from 9 km to 300 meters.

      Decline in Physical Performance: Adolescents in the 1970s were "twice as active". An 800-meter run that used to take 3 minutes now takes 4.

      Health Problems: The worldwide rise in neck and back pain, as well as in myopia, is directly linked to the posture of bending over a screen.

      8. The Adolescent Mental Health Crisis

      David Le Breton concludes his analysis with an alarming human toll, establishing a strong temporal correlation between the widespread adoption of the smartphone and the explosion of mental disorders among young people from 2010 onward.

      Drawing on the work of the psychologist Jonathan Haidt ("The Anxious Generation"), he states that never in history has adolescent suffering been seen on such a scale:

      Anxiety and Depression

      Feelings of Isolation

      Suicide Attempts and Suicides

      Self-Harm by Cutting (particularly among girls)

      This crisis is also visible among very young children, with language delays in children who are overexposed to screens and deprived of the parental interactions crucial to their development.

      9. Ethical, Cultural, and Environmental Issues

      The impact of the smartphone and AI extends beyond the individual sphere to affect society as a whole.

      Manipulation and Harassment: AI makes it easy to create "deepfakes" or "deepnudes" to humiliate, discredit, or blackmail individuals, with teenage girls frequently the victims.

      Cultural Homogenization ("MacWorld"): Technologies create a global culture unified by the same films, music, series, and consumption patterns, liquidating local cultures and traditional know-how.

      Silicon Valley's Hypocrisy: Aware of its dangers, the leaders of the digital giants protect their own children from the technologies they promote by enrolling them in schools (e.g., Waldorf) where digital devices are banned.

      Environmental and Geopolitical Impacts: Digital technology has a massive ecological footprint (data centers, energy consumption) and relies on the extraction of rare metals, fueling geopolitical conflicts and child labor in some countries.

      These aspects are often overlooked in debates about the climate.

      10. Conclusion and the Analyst's Stance

      David Le Breton insists that his analysis is not that of a "moralist" but that of a sociologist and anthropologist who observes and documents a reality.

      His work aims to point to observable facts documented by numerous studies, stressing that never in history have social ties been so "damaged".

      The hyper-connected world has coincided with the onset of the "hyper-individualization of our societies", leading to the current social and psychological landscape.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing at distances larger than 100 km and no spatial correlation for distances between 10 and 100 km, but strong spatial correlation again at distances smaller than 10 km. The authors report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows lower importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. No association with importation was found for occupation, sex, or other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      We appreciate the review and the comments on the manuscript.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data. The work is well-written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      The method presented here classifies individual cases according to whether the infection was acquired locally or imported during a trip. By definition, it does not consider secondary infections after an importation event. Our next step is to conduct outbreak investigations to quantify the impact of importation events on overall transmission, but this activity goes beyond the scope of this manuscript. We clarify this in the Discussion section.

      The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material.

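      Thank you for pointing out the inconsistency in the final formula. In fact, the final formula corresponds to P(I_A | G), instead of P(I_A), so:

      P(I_A | G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      instead of

      P(I_A) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)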

      We have now corrected this error in the new version of the manuscript.

      Reviewer #3 (Public review):

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model; although some recognized limitations can be improved. This study will be of interest to researchers in public health and infectious diseases.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to help efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for risk factor analysis, which can also be affected by travel recall bias. Additionally, the authors used a proxy for transmission intensity and assumed some conditions for the genetic variable when calculating the importation probability for specific scenarios. The weaknesses were assessed by the authors.

      We acknowledge the limitations noted by the reviewer and plan to address them as follows. We will repeat the study with our data collected in 2023, which this time contain a good representation of all the provinces of Mozambique; completeness of the metadata collection was ensured by implementing a new protocol in January 2023. Regarding the proxy for transmission intensity, we will refine the model by integrating monthly estimates of malaria incidence (previously calibrated to address testing and reporting rates) from the DHIS2 data, also taking into account the date of the reported cases in the analysis.

      Reviewing Editor Comments:

      The reviewers have made specific suggestions that could improve the clarity and accuracy of this report.

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract, lines 36, 37 and 38: "Spatial genetic structure and connectivity were assessed using microhaplotype-based genetic relatedness (identity-by-descent) from 1605 P. falciparum samples collected (...)", but only 540 samples were successfully sequenced, therefore used in spatial genetic structure and connectivity analysis.

      The 540 samples are those from Maputo province and are described in Fig. 1. The spatial and connectivity analyses also included the samples from the rest of the provinces from the multi-cluster sampling scheme. Sample sizes from these provinces are described in Suppl. Table 2; together with the 540 samples from Maputo, they make up the 1605 samples mentioned in the abstract. We specify this number in the caption of Sup. Fig. 4 and have now added it to Fig. 3.

      (2) In the Introduction, some epidemiological context about Magude and Matutuine could be added. It is only mentioned in the Discussion section (lines 265-269).

      We have added some context about both districts in the introduction now.

      (3) In the Discussion, lines 241-244, could the lack of structure mean no barriers for gene flow due to high mobility in short distances? Maybe it could only be resolved with a large number of samples.

      This could be an explanation (we mention it in the new version), although it is not something we can prove, at least not in this study.

      Reviewer #2 (Recommendations for the authors):

      The work is well written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Based on detailed datasets from Mozambique, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies. My review focuses on the Bayesian approach as well as on a few aspects of the presentation of results.

      The authors combine travel history, parasite genetic relatedness, and transmission intensity from different areas to compute the probability of infection occurring in the study area, given the P. falciparum genome. The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material. According to Bayes' Rule:

      P(I_A |G) = (P(I_A) ∙ P(G|I_A)) / (P(G)),

      with

      P(I_A) = K ∙ T_A ∙ PR_A,

      P(G│I_A) = R'_A,

      and assuming

      P(I_A│G) + P(I_B│G) = 1,

      the expression,

      (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      appears to refer to P(I_A│G), not to P(I_A) (as indicated in the main text and Supplementary Material).

      P(I_A│G) + P(I_B│G) = (P(I_A) ∙ P(G|I_A) + P(I_B) ∙ P(G|I_B)) / P(G) = 1

      ⇒P(G) = P(I_A) ∙ P(G|I_A) + P(I_B) ∙ P(G|I_B)

      ⇒P(G) = K ∙ T_A ∙ PR_A ∙ R'_A + K ∙ T_B ∙ PR_B ∙ R'_B

      ⇒P(I_A│G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      Please clarify this.

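      As mentioned in a previous comment, we acknowledge this point from the reviewer. In fact, the final formula corresponds to P(I_A | G), instead of P(I_A), so:

      P(I_A | G) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)

      instead of

      P(I_A) = (T_A ∙ PR_A ∙ R'_A) / (T_A ∙ PR_A ∙ R'_A + T_B ∙ PR_B ∙ R'_B)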

      We have now corrected this error in the new version of the manuscript and in the supplementary information.
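
      As an illustration of the corrected expression, the normalisation in the reviewer's derivation can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name, the area labels, and all numeric values are hypothetical and are not taken from the manuscript. It assumes, as in the derivation, that the candidate areas are mutually exclusive and exhaustive, so the constant K cancels.

```python
def posterior_infection_area(t, pr, r_prime):
    """Return P(I_x | G) for every candidate area x.

    t, pr, r_prime: dicts keyed by area label holding T_x, PR_x and R'_x,
    the three factors used in the reviewer's derivation.
    """
    # Unnormalised weight per area: T_x * PR_x * R'_x (K cancels out).
    weights = {area: t[area] * pr[area] * r_prime[area] for area in t}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero; posterior is undefined")
    return {area: w / total for area, w in weights.items()}


# Hypothetical inputs: A is the study area, B is a travel destination.
t = {"A": 0.9, "B": 0.1}          # T_x (illustrative values)
pr = {"A": 0.02, "B": 0.30}       # PR_x (illustrative values)
r_prime = {"A": 0.01, "B": 0.05}  # R'_x (illustrative values)

posterior = posterior_infection_area(t, pr, r_prime)
print(posterior)
```

      With these made-up inputs the posterior mass falls mostly on area B, because the higher PR and R' there outweigh the short time spent in B. The same function extends to more than two areas, provided the assumption that the posteriors sum to 1 still holds.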

      Additional comments:

      (1) Figure 3A has a scale that includes negative values, which is not reasonable for R.

      We agree that R estimates are not compatible with negative values. The intention of this scale was to show the overall mean R in the centre, in white, so that blue colours represented values below the average and red values above the average. However, we proceeded to update the figures according to your recommendations.

      (2) I suggest using a common scale from 0 to 0.12 (maximum values among panels) across panels A, C, and D, as well as in Sup Fig 3, to facilitate comparison.

      We updated the figures according to the recommendations.

      (3) The x-axis labels in Figure 3A and Supplementary Figure 2A are not aligned with the x-axis ticks.

      We updated the figures so that the alignment in the x-axis is clear.

      (4) Supplementary Figure 5 would be better presented if the data were divided into four separate panels.

      We have divided the figure into four separate panels.

      (6) Figure 5D is not referenced in the main text.

      We missed the mention, which is now fixed in the new version.

      (7) The authors state: "No significant differences in R were found comparing parasite samples from Magude and the rest of the districts." However, Supplementary Figure 3 shows statistically significant relatedness between parasites from Magude and Matutuine. Please clarify this.

      We have clarified this sentence, which was indeed confusing.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: More background info about malaria in Mozambique would be appreciated.

      We included some contextualisation about malaria in Mozambique and our study districts.

      (2) Why were most of the samples collected from children? Is malaria most prevalent in that group? Information could be added in the introduction.

      Children are usually considered an appropriate sentinel group for malaria surveillance for several reasons. First, most malaria cases reported from symptomatic outpatient visits are children, especially in areas with moderate to high burden. Second (and probably the cause of the first reason), their lower immunity, due to a shorter time of exposure and an immature immune system, provides a cleaner picture of the effects of malaria, since the body's response is less shaped by past exposures. Finally, as a vulnerable population, they deserve a stronger focus in surveillance systems. We added a comment in the introduction referring to them as a common sentinel group for surveillance.

      (3) Minor: Check spaces in the text (for example, line 333 and the start of the Discussion).

      Thank you for noticing; we fixed it in the new version.

      (4) Minor: In my case, the micro (u) symbol can be observed in Word, but not in PDF.

      One of the symbols produced an error; we hope that the new version is now correct.

      (5) Were COI calculations with MOIRE performed across provinces and regions, or taking all samples as one population?

      We took all samples as one population. However, we validated that the same results (reaching equivalent numbers and the same conclusions) were obtained when the analysis was run across different populations (regions or provinces). We mention this in the manuscript now.

      (6) Have you tested lower values than 0.04 for PR in Maputo?

      This would not have had any impact on the classification. Only two individuals reported a trip to Maputo city (where we assumed PR=0.04), and neither of them was classified as imported. If lower values of PR were assumed, their probabilities of importation would have been lower still, so we would still obtain no imported cases.

      (7) Map (Supplementary Figure 1): Please, improve the resolution (like in the zoom in) and add a scale and a compass rose.

      We improved the resolution of the map. We did not add a scale and a compass rose, but labelled the coordinates as longitude and latitude to clarify the scale and orientation of the map. We added this in the rest of the maps of the manuscript as well.

      (8) In this work, Pimp values were bimodal to 0 or 1, making the classification easy. I wonder in other scenarios, where Pimp values are more intermediate (0.4-0.6), is the threshold at 0.5 still useful? Is there another way, like having a confidence interval of Pimp, to ensure the final classification? A discussion on this topic may be appreciated.

      In this case, we would recommend doing probabilistic analyses, keeping the probability of being imported as the final outcome, and quantifying the importation rates from the weighted sum of probabilities across individuals. We added this clarification in the Methods section: “In case of obtaining a higher fraction of intermediate values (0.4-0.6), weighted sums of individual probabilities would be more appropriate to better quantify importation rates.”
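
      The difference between the two ways of summarising Pimp can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the probability lists are hypothetical, and the "weighted sum" is read here as the mean of the individual probabilities, i.e. the expected fraction of imported cases.

```python
def rate_by_threshold(p_imp, threshold=0.5):
    """Fraction of cases classified as imported using a hard cut-off."""
    return sum(p > threshold for p in p_imp) / len(p_imp)


def rate_by_weighted_sum(p_imp):
    """Expected fraction of imported cases: mean of individual probabilities."""
    return sum(p_imp) / len(p_imp)


# Hypothetical Pimp values: clearly bimodal versus mostly intermediate.
bimodal = [0.01, 0.02, 0.98, 0.99]
intermediate = [0.41, 0.44, 0.46, 0.58]

for label, values in [("bimodal", bimodal), ("intermediate", intermediate)]:
    print(label, rate_by_threshold(values), rate_by_weighted_sum(values))
```

      When the probabilities sit near 0 or 1, both summaries agree. When they cluster around 0.4-0.6, the hard threshold discards most of the probability mass below 0.5, whereas the weighted sum keeps it, which is the motivation given in the response above.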

      (9) Results: More details per panel, not as the whole figure (Figure 2B, Figure 3A, etc) in the manuscript would be appreciated.

      We appreciate the comment and have added more details.

      (10) Figure 3: Please, add a color legend in panel B (not only in the caption, but in the panel, such as in A, C, D).

      We added a color legend in panel B.

      (11) Do the authors recommend routine surveillance to detect importation in Mozambique, or are these results solid enough to propose strategies? How possible is it that importation rates vary in the future in the south? If so, how feasible is it to implement all this process (including the amplicon sequencing) routinely?

      We added the following text in the discussion: “While these results propose programmatic strategies for the two study districts, routine surveillance to detect importation in Mozambique would allow for identifying new strategies in other districts aiming for elimination, as well as monitoring changes in importation rates in Magude and Matutuine in the future. If scaling molecular surveillance is not feasible, travel reports could be integrated in the routine surveillance to extrapolate the case classification based on the results of this study.”

      (12) Which other proxies of transmission intensity could have been used?

      Better proxies of transmission intensity could be malaria incidence at the monthly level from national surveillance systems, or estimates of force of infection, for example from the use of molecular longitudinal data if available. We added this text in the discussion.

      (13) Can this strategy be applied to P. vivax-endemic areas outside Africa?

      This new method can also be applied to P. vivax-endemic areas outside Africa. Symptomatic P. vivax cases do not necessarily reflect recent infections, so travel reports might need to cover longer time periods, which does not require any essential adaptation of the method. We added this text to the discussion.

    1. Rules for the implementation of this Article shall be adopted by 15 May 2008 in accordance with the regulatory procedure referred to in Article 22(2). These rules shall take account of relevant, existing international standards and user requirements, in particular with relation to validation metadata

      Replaced by: ‘4. The Commission is empowered to adopt implementing acts laying down rules for the application of this Article, taking account of relevant, existing international standards and user requirements, in particular with relation to validation metadata. Those implementing acts shall be adopted in accordance with the procedure referred to in Article 22b(2).’

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths. Colonoscopy and fecal immunochemical testing are among the early diagnostic tools that have significantly enhanced patient survival rates in CRC. Methylation dysregulation has been identified in the earliest stages of CRC, offering a promising avenue for screening, prediction, and diagnosis. The manuscript entitled "Early Diagnosis and Prognostic Prediction of Colorectal Cancer through Plasma Methylation Regions" by Zhu et al. presents a panel of genes with methylation patterns derived from cfDNA (27 DMRs) as a noninvasive detection method for early CRC diagnosis and prognosis.

      Strengths:

      The authors provided evidence that the 27 DMRs pattern worked well in predicting CRC distant metastasis, and the methylation score remarkably increased in stage III-IV.

      Weaknesses:

      The major concerns are the design of DMR screening, the relatively low sensitivity of this DMR pattern in detecting early-stage CRC, the limited size of the cohorts, and the lack of comparison with the traditional diagnosis test.

      We sincerely thank the reviewer for their thorough evaluation and constructive feedback on our manuscript. We are encouraged that the reviewer found our 27-DMR panel promising for predicting distant metastasis and for its performance in late-stage CRC. We have carefully considered the weaknesses pointed out and have made revisions to address these concerns, which we believe have significantly strengthened our paper.

      We agree with the reviewer that achieving high sensitivity for early-stage disease is the ultimate goal for any noninvasive screening test. Detecting the minute quantities of cfDNA shed from early-stage tumors is a well-recognized challenge in the field. Although the sensitivity of our current panel for early-stage CRC is modest, its core strengths lie in its capability to also detect advanced adenomas and its excellent performance in assessing CRC metastasis and prognosis. Furthermore, we have now added a direct comparative analysis of our 27-DMR panel against the most widely used clinical serum biomarker for CRC, carcinoembryonic antigen (CEA), using samples from the same patient cohorts. Our results demonstrate that the 27-DMR methylation score significantly outperforms CEA in diagnostic accuracy for early-stage CRC (64% vs. 18%) (Table s7). In the Discussion section, we have also acknowledged our limitations and suggest that future studies are warranted to combine the cfDNA methylation model with commonly used clinical markers, such as CEA and CA19-9, with the aim of improving the sensitivity for early diagnosis.

      We acknowledge the reviewer's concern regarding the cohort size and agree that validation in larger, prospective, multi-center cohorts is essential before this panel can be considered for clinical application. We have explicitly stated this as a limitation of our study in the Discussion section and have highlighted the need for future large-scale validation studies (Page 18, Lines 367-373). We once again thank the reviewer for their insightful comments, which have allowed us to substantially improve our manuscript. We hope that the revised version is now suitable for publication.

      Reviewer #2 (Public review):

      This work presents a 27-region DMR model for early diagnosis and prognostic prediction of colorectal cancer using plasma methylation markers. While this non-invasive diagnostic and prognostic tool could interest a broad readership, several critical issues require attention.

      Major Concerns:

      (1) Inconsistencies and clarity issues in data presentation

      (a) Sample size discrepancies

      The abstract mentions screening 119 CRC tissue samples, while Figure 1 shows 136 tissues. Please clarify if this represents 119 CRC and 17 normal samples.

      We sincerely thank the reviewer for this careful observation and for pointing out the inconsistency. We apologize for the error and the confusion it caused. Regarding Figure 1: The reviewer is correct. The number 136 in the original Figure 1 was an error. This was due to an inadvertent double-counting of the tumor samples that were used in the differential analysis against adjacent normal tissues. The actual number of tissue samples used in this analysis is 89. We have now corrected this value in the revised Figure 1.

      Regarding the Abstract: The 119 CRC tissue samples mentioned in the abstract represent the total number of unique tumor samples analyzed across all stages of our study. This number is composed of two cohorts: the initial 15 pairs of tissues used for preliminary screening, and the subsequent 89 tissue samples used for validation, totaling 119 samples. We have ensured all sample numbers are now consistent throughout the revised manuscript.

      The plasma sample numbers vary across sections: the abstract cites 161 samples, Figure 1 shows 116 samples, and the Supplementary Methods mentions 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC).

      We sincerely thank the reviewer for their meticulous review and for identifying these inconsistencies in the plasma sample numbers. We apologize for this oversight and the lack of clarity.

      Figure 1 & Supplementary Methods (77 samples): The number 116 in the original Figure 1 was a clerical error. The correct number is 77, which is the cohort used for our differential methylation analysis. This number is now consistent with the Supplementary Methods. This cohort is composed of 13 Normal, 15 NAA, 12 AA, and 37 CRC samples. The figure has been revised accordingly.

      Abstract (161 samples): The total of 161 plasma samples mentioned in the abstract is the sum of two distinct sample sets used for different stages of our analysis: the 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC) used for the differential analysis, and an additional 84 samples (33 Normal, 51 CRC) that served as the training set for the LASSO regression model. We have now clarified these distinctions in the text and ensured consistency across the abstract, figures, and methods sections.
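
      The sample counts quoted in the two responses above can be cross-checked with a short tally. This is a sketch for the reader's convenience; the variable names are hypothetical, and counting each of the 15 tissue pairs as two samples is an assumption made here because it is the reading under which the stated tissue total of 119 is reached.

```python
# Plasma cohorts as stated in the response.
plasma_differential = {"Normal": 13, "NAA": 15, "AA": 12, "CRC": 37}
plasma_training = {"Normal": 33, "CRC": 51}

n_differential = sum(plasma_differential.values())  # differential analysis set
n_training = sum(plasma_training.values())          # LASSO training set
n_plasma_total = n_differential + n_training        # figure quoted in the abstract

# Tissue cohorts; ASSUMPTION: each screening pair contributes two samples.
tissue_screening_pairs = 15
tissue_validation = 89
n_tissue_total = 2 * tissue_screening_pairs + tissue_validation

print(n_differential, n_training, n_plasma_total, n_tissue_total)
```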

      (b) Methodological inconsistencies

      The Supplementary Material reports 477 hypermethylated sites from TCGA data analysis (Δβ>0.20, FDR<0.05), but Figure 1 indicates 499 sites.

      The manuscript states that analyzing TCGA data across six cancer types identified 499 CRC-specific methylation sites, yet Figure 1 shows 477. Please also explain the rationale for selecting these specific cancer types from TCGA.

      We sincerely thank the reviewer for their sharp observation and for highlighting these inconsistencies. We apologize for this clerical error, which occurred when labeling the figure. The numbers 477 and 499 in Figure 1 were inadvertently swapped and the text in Supplementary Material is correct. We have now corrected this error throughout the manuscript to ensure clarity and consistency. We deeply regret the confusion this has caused.

      Regarding the rationale for selecting the cancer types:

      The selection of colorectal, esophageal, gastric, lung, liver, and breast cancers was based on the following strategic criteria to ensure the stringent identification of CRC-specific markers. Firstly, esophageal, gastric, liver, and colorectal cancers all originate from the gastrointestinal tract and share developmental and functional similarities. Comparing CRC against these closely related cancers allowed us to filter out general GI-tract-related methylation patterns and isolate those that are truly unique to colorectal tissue. Secondly, we included lung and breast cancer as they are two of the most common non-GI malignancies worldwide with distinct tissue origins. This helps ensure our identified markers are not just pan-cancer methylation events but are specific to CRC, even when compared against highly prevalent cancers from different lineages. Finally, these six cancer types have some of the largest and most complete datasets available in the TCGA database, including high-quality methylation data. This provided a robust statistical foundation for a reliable cross-cancer comparison. We hope this explanation clarifies our methodology. Thank you again for your valuable feedback.

      "404 CRC-specific DMRs" mentioned in the main text while "404 MCBs" in Figure 1, the authors need to clarify if these terms are interchangeable or how MCBs are defined.

      We sincerely thank the reviewer for pointing out this important inconsistency in terminology. We apologize for the confusion this has caused and for the error in Figure 1. The two terms are closely related in our study. The final 404 markers are technically DMRs that were identified through an analysis of MCBs. To avoid confusion, we have decided to unify the terminology. The manuscript has now been revised to consistently use "DMRs", which is the most accurate final descriptor. The label in Figure 1 has been corrected accordingly.

      (2) Methodological documentation

      The Results section requires a more detailed description of marker identification procedures and justification of methodological choices.

      Figure 3 panels need reordering for sequential citation.

      We thank the reviewer for this valuable suggestion. We agree that the original Results section lacked sufficient detail regarding the marker identification procedures and the justification for our methodological choices. To address this, we have substantially rewritten the "Methylation markers selection" subsection, which now provides a clear, step-by-step narrative of our marker discovery and integrates the specific methodological details and statistical criteria. For instance, we now explicitly describe the three-pronged approach for the initial TCGA data mining and the specific criteria (Δβ, FDR, log2FC) for each, as well as the analysis methodology, such as the Wilcoxon test and LASSO regression analysis. We believe this detailed narrative now provides the necessary description and justification for our methodological choices directly within the Results, significantly improving the clarity and logical flow of our manuscript. This revision can be found on Pages 9-11 (Lines 180-195, 202-213). We hope these changes fully address the reviewer's concerns.

      We thank the reviewer for pointing out the citation order of the panels in Figure 3. This was a helpful suggestion for improving the clarity of our manuscript. We have now reordered the panels in Figure 3 to ensure they are cited sequentially within the text. These adjustments have been made in the "Development and validation of the CRC diagnosis model" subsection of the Results (Page 11, lines 224-230). We appreciate the reviewer's attention to detail.

      (3) Quality control and data transparency

      No quality control metrics are presented for the in-house sequencing data (e.g., sequencing quality, alignment rate, BS conversion rate, coverage, PCA plots for each cohort).

      The analysis code should be publicly available through GitHub or Zenodo.

      At a minimum, processed data should be made publicly accessible to ensure reproducibility.

      We sincerely thank the reviewer for their valuable and constructive feedback regarding quality control and data transparency. We fully agree that these elements are crucial for ensuring the robustness and reproducibility of our research. As the reviewer suggested, we have made all processed data and the key quality control metrics for each sample including sequencing quality scores, bisulfite (BS) conversion rates, and sequencing coverage publicly available to ensure the reproducibility of our findings. The analysis was performed using standard algorithms as detailed in the Methods section. While we are unable to host the code in a public repository at this time, all analysis scripts are available from the corresponding author upon reasonable request. The data has been deposited in the National Genomics Data Center (NGDC) and is accessible under the accession number OMIX009128. This information is now clearly stated in the "Data and Code Availability" section of the manuscript. We thank the reviewer again for pushing us to improve our manuscript in this critical aspect.

      Reviewer #3 (Public review):

      Summary:

      This article provides a model for early diagnosis and prognostic prediction of Colorectal Cancer and demonstrates its accuracy and usability. However, there are still some minor issues that need to be revised and paid attention to.

      Strengths:

      A large number of external datasets were used for verification, thus demonstrating robustness and accuracy. Meanwhile, various influencing factors across multiple samples were taken into account, supporting usability.

      Weaknesses:

      There are notable language issues that hinder readability, and some key conclusions are missing.

      We are very grateful to the reviewer for their positive assessment of our study and for the constructive feedback provided. We are particularly encouraged that the reviewer recognized the strengths of our work, especially the robustness demonstrated through extensive external validation and the practical usability of our model. Regarding the weaknesses, we have taken the comments very seriously and have thoroughly revised the manuscript. We sincerely apologize for the language issues that hindered readability in our initial submission. To address this, the entire manuscript has undergone a comprehensive round of professional language polishing and editing. We have carefully reviewed and revised the text to improve clarity, flow, and grammatical accuracy. Besides, we agree that the conclusions could be stated more explicitly. To rectify this, we have substantially revised the final paragraph of the Discussion and the Conclusion section (Page 14-18, lines 279-305, 319-334, 346-348, 358-360, 367-379). We now more clearly summarize the main findings of our study, emphasize the clinical significance and potential applications of our model, and provide clear take-home messages. We thank you again for your time and insightful comments, which have been invaluable in improving the quality of our paper. We hope the revised manuscript now meets the standards for publication.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments are outlined below:

      (1) In this study, the authors have highlighted methylated cfDNA as a noninvasive approach for CRC early diagnosis. However, the small size of the cohorts for plasma screening, particularly the number of NAA and AA samples, may cause bias in the selection of DMRs. This bias may lead to inappropriate DMRs for early diagnosis. Furthermore, a similar issue affects the training set, which has a high percentage of late-stage CRC and includes no AA or NAA samples. This absence may be the key factor in screening for altered cfDNA methylation that can predict the early stages of CRC.

      We are very grateful to the reviewer for this insightful methodological critique. We agree that cohort composition and sample size are critical factors in the development of robust biomarkers, and we appreciate the opportunity to clarify our study design and the interpretation of our results.

      We agree with the reviewer that the number of precancerous lesion samples (NAA and AA) in our initial plasma screening cohort was limited. This is a valid point. However, it is important to contextualize the role of this step within our overall multi-stage marker selection funnel. The markers evaluated in this plasma cohort were not discovered from this small sample set alone. They were the result of a rigorous pre-selection process based on large-scale public TCGA data and our own tissue-level sequencing. This robust, tissue-based validation ensured that only the most promising CRC-specific markers were advanced for plasma testing. Therefore, while the plasma cohort was modest in size, its purpose was to confirm the circulatory detectability of markers already known to have a strong tissue-of-origin signal, thereby mitigating the potential bias from a smaller discovery set.

      Our primary aim was to first build a model that could robustly and accurately identify a definitive cancer-specific methylation signal. By training the model on clear-cut invasive cancer cases versus healthy controls, we could isolate the most powerful and specific markers for established malignancy. Our working hypothesis was that these strong cancer-specific methylation patterns are initiated during the precursor stages and would therefore be detectable, albeit at lower levels, in precancerous lesions.  Unfortunately, the panel could only identify a limited proportion of precancerous lesions (48.4% in the NAA group and 52.2% in the AA group). We fully agree with the reviewer's sentiment that including a larger and more balanced set of precancerous lesions in future training cohorts could potentially optimize a model specifically for adenoma detection. We have now explicitly added this point to our Discussion section, highlighting it as an important direction for future research (Page 18, lines 367-373).

      (2) The sensitivities of the 27 DMRs in the external validation set (48.4%, 52.2% and 66.7% for NAA, AA and CRC 0-Ⅱ, respectively) were much lower than those in previously published studies, such as the ColonES assay (DOI: 10.1016/j.eclinm.2022.101717) and the ColonSecure test (DOI: 10.1186/s12943-023-01866-z). The 27 DMRs from the layered screening process did not show superior performance in the small population of the external validation cohort. Therefore, it is unlikely that this DMR pattern will be applicable to the general population in the future.

      We sincerely thank the reviewer for their insightful comments and for providing a thorough comparison with the highly relevant ColonES and ColonSecure assays. This has given us an important opportunity to clarify the unique contributions and specific clinical applications of our 27-DMR panel.

      We acknowledge the reviewer's point that the sensitivities of our panel for precancerous lesions (NAA: 48.4%, AA: 52.2%), while substantial, are numerically lower than those reported by the excellent ColonES assay (AA: 79.0%). However, it is important to clarify that while the ColonES and ColonSecure tests are outstanding benchmarks designed primarily for early detection and screening, the primary objective and contribution of our study were slightly different. Our model demonstrated an exceptional ability to predict distant metastasis with an AUC of 0.955 and a strong capacity for predicting overall prognosis with an AUC of 0.867. Our goal was to develop a multi-functional, biologically-rooted biomarker panel that not only contributes to early detection but, more importantly, provides crucial information for post-diagnosis patient management, including staging, risk stratification, and prognostication, from a single preoperative sample. We believe this ability to preoperatively identify high-risk patients who may require more aggressive treatment or intensive surveillance is the key contribution of our work. It provides a distinct clinical utility that complements, rather than directly competes with, pure screening assays.

      We agree with the reviewer that our external validation was performed on a limited cohort, and we have acknowledged this as a limitation in our Discussion section. However, the purpose of this validation was to provide a proof-of-concept for the panel's performance across its multiple functions. The promising and exceptionally high-performing results in the prognostic domain strongly warrant further validation in larger, prospective, multi-center cohorts.

      (3) The 27 DMRs pattern worked well in predicting CRC distant metastasis, and the methylation score remarkably increased in stage III-IV. In contrast, the increase of AA and 0-II groups was very mild in the validation cohort. This observation raises concerns regarding the study design, particularly in the context of the layered screening process and sample assigning.

      We sincerely thank the reviewer for this insightful and critical comment. We agree with the reviewer's observation that the methylation score increased more remarkably in late-stage (III-IV) CRC compared to the milder increase in adenoma (AA) and early-stage (0-II) CRC in the validation cohort. However, the observed pattern is biologically plausible and consistent with the nature of colorectal cancer progression. Carcinogenesis is a multi-step process involving the gradual accumulation of genetic and epigenetic alterations. The methylation changes we identified are likely associated with tumor progression and metastasis. Therefore, it is expected that advanced, metastatic cancers (Stage III-IV), which have undergone significant biological changes, would exhibit a much stronger and more robust methylation signal compared to pre-cancerous lesions (adenomas) or early-stage, non-metastatic cancers (Stage 0-II). The "mild" increase in early stages reflects the initial, more subtle epigenetic alterations, while the "remarkable" increase in late stages reflects the extensive changes required for invasion and metastasis. We believe this graduated increase actually strengthens the validity of our methylation signature, as it mirrors the underlying biological progression of the disease. We hope this response and the corresponding revisions address the reviewer's comments.

      (4) The authors did not provide the 27 DMRs prediction efficacy comparison with other noninvasive CRC assays, like a CEA and a FIT test.

      Thank you for this valuable suggestion. We agree that comparing our model with established non-invasive assays is crucial for demonstrating its clinical potential. Following your advice, we have now included a direct comparison of the diagnostic performance between our model and the traditional tumor marker, carcinoembryonic antigen (CEA), using the external validation cohort. The results show that our model has a significantly higher sensitivity for detecting early-stage colorectal cancer and adenomas compared to CEA. This detailed comparison has been added as Table S7 in the supplementary materials, and the corresponding description has been incorporated into the Results section of our manuscript (Page 12, lines 234-236). Regarding the Fecal Immunochemical Test (FIT), we unfortunately could not perform a direct statistical comparison because very few individuals in our cohort had undergone FIT. A comparison based on such a small sample size would lack statistical power and might not yield meaningful conclusions. We have acknowledged this as a limitation of our study in the Discussion section. We believe these additions and clarifications have substantially strengthened our manuscript. Thank you again for your constructive feedback.

      (5) The authors did not explicitly describe how they assigned the plasma samples to the distinct sets, nor did they specify the criteria for the plasma screen set, training set, and validation set. The detailed information for the patient grouping should be listed.

      Response: Thank you for this essential feedback. We agree that a transparent and detailed description of the sample allocation process is crucial for the manuscript. We apologize for the previous lack of clarity and have now revised the Methods section to address this. Our patient cohorts were assigned to the screening, training, and validation sets based on a chronological splitting strategy. Specifically, samples were allocated consecutively based on the date of collection. This approach was chosen to minimize selection bias and to provide a more realistic, forward-looking assessment of the model's performance, simulating a prospective validation scenario. The screening set comprised 89 tissue samples and 77 plasma samples collected between June and December 2020. The primary purpose of this set was the initial discovery and screening of potential methylation markers. The training set and validation set included 165 plasma samples collected from December 2020 to July 2022. The external validation cohort comprised 166 plasma samples collected from July 2022 to December 2022. All of this detailed information is now provided in the "Study design and samples" subsection of the Methods section of the revised manuscript (Page 6, lines 116-133). We believe this detailed explanation now makes our study design clear and transparent. Thank you again for helping us improve our manuscript.
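
      The chronological splitting strategy described above can be sketched as follows. This is a minimal illustration, not the authors' code: the response gives only month-level collection windows, so the exact cutoff days, the cohort labels, and the sample IDs below are assumptions made for the example.

```python
from datetime import date

# Hypothetical cutoffs: the response states month-level windows only, so the
# exact boundary days used here are assumptions for illustration.
SCREENING_END = date(2020, 12, 1)
TRAINING_END = date(2022, 7, 1)


def assign_cohort(collected: date) -> str:
    """Assign a sample to a cohort purely by its collection date."""
    if collected < SCREENING_END:
        return "screening"
    if collected < TRAINING_END:
        return "training_or_validation"
    return "external_validation"


# Hypothetical sample IDs and collection dates.
samples = {
    "P001": date(2020, 8, 14),
    "P002": date(2021, 3, 2),
    "P003": date(2022, 9, 30),
}
cohorts = {sample_id: assign_cohort(d) for sample_id, d in samples.items()}
```

      Because assignment depends only on the collection date, later cohorts never contribute to marker discovery or model training, which is what lets the last window act as a quasi-prospective test.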

      Reviewer #2 (Recommendations for the authors):

      The manuscript requires significant language editing to improve clarity and readability. We recommend that the authors seek professional editing services for revision.

      Thank you for your constructive comments on the language of our manuscript. We apologize for any lack of clarity in the previous version. To address this, we have performed a thorough revision of the manuscript. The text has been carefully reviewed and edited by a native English-speaking colleague who is an expert in our research field. We have focused on correcting all grammatical errors, improving sentence structure, and refining the phrasing throughout the document to enhance readability. We are confident that these extensive revisions have significantly improved the clarity of the manuscript. We hope you will find the current version much easier to read and understand.

      Reviewer #3 (Recommendations for the authors):

      (1) However, I think the abstract part of the article is too detailed and should be more concise and shortened. It is not necessary to show detailed values but to summarize the results.

      Thank you for this valuable suggestion. We agree that the previous version of the abstract was overly detailed and that a more concise summary would be more effective for the reader. Following your advice, we have substantially revised the abstract. We have removed the specific numerical values (such as detailed statistics) and have instead focused on summarizing the key findings and their broader implications (Page 3, lines 54-60, 64-66, 70-72). The revised abstract is now shorter and provides a clearer, high-level overview of our study's background, methods, main results, and conclusions. We believe these changes have significantly improved its readability and impact. We hope you will find the current version more appropriate.

      (2) Figure 4, the color in the legend and plot are not the same, and should be revised.

      Thank you for your careful attention to detail and for pointing out the color inconsistency in Figure 4. We apologize for this oversight. We have now corrected the figure as you suggested, ensuring that the colors in the legend perfectly match those in the plot. The revised Figure 4 has been updated in the manuscript. We appreciate your help in improving the quality of our figures.

      (3) Please pay attention to the article format, such as the consistency of fonts and punctuation marks. (For example, Lines 75 and Line 230).

      Thank you for your meticulous review and for pointing out the inconsistencies in our manuscript's formatting. We sincerely apologize for these oversights and any inconvenience they may have caused. Following your feedback, we have carefully corrected the specific issues you highlighted. Furthermore, we have conducted a thorough proofread of the entire manuscript to ensure consistency in all fonts, punctuation marks, and overall adherence to the journal's formatting guidelines. We appreciate your help in improving the presentation and professionalism of our paper.

    1. Reviewer #1 (Public review):

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A + P53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally the authors assess the effect of FAM53C deletion in a cortical organoid model, and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      The authors have revised the manuscript, and I respond here point-by-point to indicate which parts of the revision I found compelling, and which parts were less convincing. So the numbering is consistent with the numbering in my first review report.

      (1) The p21 knockdowns are a valuable addition, and the claim that p53 targets other than p21 are involved in the FAM53C RNAi-mediated arrest is now much more solid. Minor detail: if S4D is a quantification of S4C, it is hard to believe that the quantification was done properly (at least for the DYRK1Ai conditions). Perhaps S4C is not the best representative example, or some error was made?

      (2a) I appreciate the decision to remove the cyclin D1 phosphorylation data. A more nuanced model now emerges. It is not clear to me, however, why the Protein Simple immunoassay was used for experiments with RPE cells, and not the cortical organoids. Even though no direct claims are made based on the phospho-cyclin D data in Figure 5E+G, showing these data suggests that FAM53C deletion increases DYRK1A-mediated cyclin D1 phosphorylation. I find it tricky to show these data, while knowing now that this effect could not be shown in the RPE1 cells.

      (2b) The quantifications of the immunoassays are not convincing. In multiple experiments, the HSP90 levels vary wildly, which indicates big differences in protein loading if HSP90 is a proper loading control. This is for example problematic for the interpretation of Figures 3F and S3I. The cyclin D1 "bands" look extremely similar between siCtrl and siFAM53C (Fig S3I); in fact, the two series of 6 samples with different dosages of DYRK1Ai look like an identical repetition of each other. I did not have the option to overlay them, but it would be important to check if a mistake was made here. The cyclin D1 signals aside, the change in cycD1/HSP90 ratios seems to be entirely caused by differences in HSP90 levels. Careful re-analysis of the raw data and more equal loading seem necessary. The same goes (to a lesser extent) for S3J+K.

      (2c) The new model in Fig S4L: what do the arrows to the right of FAM53C and p53, which merge and point straight towards S-phase, mean? They suggest that p53 (and FAM53C) directly promote S-phase progression, but most likely this is not what the authors intended.
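
      The loading-control concern in (2b) can be made concrete with a small numerical sketch. The signal values below are hypothetical and are not taken from the manuscript; they only show how a normalized ratio can shift when the target signal is unchanged and the loading control differs between lanes.

```python
def normalized_ratio(target_signal: float, loading_signal: float) -> float:
    """Target signal divided by the loading-control signal for one lane."""
    return target_signal / loading_signal


# Hypothetical signals: cyclin D1 is identical in both conditions,
# while the HSP90 loading control doubles in the knock-down lane.
ctrl_ratio = normalized_ratio(target_signal=100.0, loading_signal=50.0)
knockdown_ratio = normalized_ratio(target_signal=100.0, loading_signal=100.0)

# The apparent fold change comes entirely from the denominator.
fold_change = knockdown_ratio / ctrl_ratio
```

      In this toy case the normalized cyclin D1 level appears to halve even though the raw cyclin D1 signal is the same in both lanes, which is why comparable loading matters before ratios are interpreted.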

      (3) Clear; nicely addressed.

      (4) Thank you for correcting.

      (5) I appreciate that the authors are now more careful to call the IMPC analysis data preliminary. This is acceptable to me, but nevertheless, I suggest that the authors seriously consider removing this part entirely. The risk of chance findings and the extremely skewed group sizes (as reviewer #2 had pointed out) hamper the credibility of this statistical analysis.

    2. Reviewer #3 (Public review):

      Summary:

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation regulated Kinase 1A (DYRK1) in the G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of the G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro, although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. The authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1, but this was true only in cells lacking functional p53. This is quite confusing, as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild-type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, the authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of IPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids, but whether that reflects an increased activity of DYRK1 or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. The authors did not observe any significant changes in survival or in organ development, but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and the organism. Although some of the data might be of interest, in their current form the data are too preliminary to justify publication.

      Major comments:

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects.

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types.

      (3) Authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203), however this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations when phosphorylations of cyclin D is reduced. It rather suggests that DYRK1 is not inhibited by FAM53C but perhaps FAM53C competes with cyclin D. Further, authors should address if the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      (4) In many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, what statistical test was used in Fig 4C? The impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does the DYRK1 inhibitor prevent modification of cyclin D?

      (5) Validation of SM13797 compound in terms of specificity to DYRK1 was not performed.

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off.
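
      The caveat in point (6) can be illustrated with a simplified steady-state calculation. The sketch assumes an asynchronous population in which the fraction of cells in a phase is proportional to that phase's duration (ignoring the age-distribution skew of exponentially growing cultures), and the phase durations are hypothetical.

```python
def phase_fractions(durations_h: dict[str, float]) -> dict[str, float]:
    """Approximate the fraction of cells per phase as duration / total cycle length."""
    total = sum(durations_h.values())
    return {phase: hours / total for phase, hours in durations_h.items()}


# Hypothetical phase durations in hours; G1 is identical in both scenarios.
baseline = phase_fractions({"G1": 8.0, "S": 8.0, "G2/M": 4.0})
longer_s = phase_fractions({"G1": 8.0, "S": 12.0, "G2/M": 4.0})

# The G1 fraction drops although G1 itself did not get any shorter.
g1_change = longer_s["G1"] - baseline["G1"]
```

      Under these assumptions the G1 fraction falls from 0.40 to about 0.33 solely because S phase was lengthened, so a change in the G1 fraction alone does not establish faster or slower progression through G1.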

      Comments to the revised manuscript:

      In the revised version of the manuscript, the authors addressed most of the critical points. They now include new data with depletion of FAM53C using single siRNAs that show a small but significant enrichment of the G1 cell population. This G1 arrest is likely caused by the combined effects of induction of p21 expression and decreased levels of cyclin D1. The authors observed that inhibition of DYRK1 rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1 did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1 activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase, but unfortunately these experiments were not technically possible. Knock-out of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicated FAM53C in fine-tuning DYRK1 activity in cells, which may to some extent influence progression through G1 phase. In addition, FAM53C may also have DYRK1-independent and cell cycle-independent functions that remain to be addressed by future studies.

    3. Author response:

      (1) General Statements

      We thank the Reviewers for a fair review of our work and helpful suggestions. We have significantly revised the manuscript in response to these suggestions. We provide a point-by-point response to the Reviewers below but wanted to highlight in our response a recurring concern related to the strong cell cycle arrest observed upon the acute FAM53C knock-down being different than the limited phenotypes in other contexts, including the knockout mice and DepMap data.

      First, we now show that we can recapitulate the strong G1 arrest resulting from the FAM53C knock-down using two independent siRNAs in RPE-1 cells, supporting the specificity of the effects.

      Second, the G1 arrest that results from the FAM53C knock-down is also observed in cells with inactive p53, suggesting it is not due to a non-specific stress response due to “toxic” siRNAs. In addition, the arrest is dependent on RB, which fits with the genetic and biochemical data placing FAM53C upstream of RB, further supporting a specific phenotype.

      Third, we have performed experiments in other human cells, including cancer cell lines. As would be expected for cancer cells, the G1 arrest is less pronounced but is still significant, indicating that the G1 arrest is not unique to RPE-1 cells.

      Fourth, it is not unexpected that compensatory mechanisms would be activated upon loss of FAM53C during development or in cancer – which may explain the lack of phenotypes in vivo or upon long-term knockout. This has been true for many cell cycle regulators, either because of compensation by other family members that have overlapping functions, or by a larger scale rewiring of signaling pathways. 

      (2) Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity): 

      Summary: 

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle.

      They did so by screening publicly available data from the Cancer Dependency Map, and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. DYRK1A in turn is known to induce degradation of cyclin D, leading the authors to propose a model in which DYRK1A-dependent cyclin D degradation is inhibited by FAM53C to permit S-phase entry. Finally the authors assess the effect of FAM53C deletion in a cortical organoid model, and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      Major comments: 

      The authors show convincing evidence that FAM53C loss can reduce S-phase entry in cell cultures, and that it can bind to DYRK1A. However, FAM53C has multiple other binding partners and I am not entirely convinced that negative regulation of DYRK1A is the predominant mechanism to explain its effects on S-phase entry. Some of the claims that are made based on the biochemical assays, and on the physiological effects of FAM53C, are overstated. In addition, some choices made in methodology and data representation need further attention.

      (1) The authors do note that P21 levels increase upon FAM53C loss. They show convincing evidence that this is not a P53-dependent response. But the claim that "p21 upregulation alone cannot explain the G1 arrest in FAM53C-deficient cells" (line 138-139) is misleading. A p53-independent p21 response could still be highly relevant. The authors could test if FAM53C knockdown inhibits proliferation after p21 knockdown or p21 deletion in RPE1 cells.

      The Reviewer raises a great point. Our initial statement needed to be clarified and also needed more experimental support. We have performed experiments where we knocked down FAM53C and p21 individually, as well as in combination, in RPE-1 cells. These experiments show that p21 knock-down is not sufficient to negate the cell cycle arrest resulting from the FAM53C knock-down in RPE-1 cells (Figure 4B,C and Figure S4C,D).

      We now extended these experiments to conditions where we inhibited DYRK1A, and we also compared these data to experiments in p53-null RPE-1 cells. Altogether, these experiments point to activation of p53 downstream of DYRK1A activation upon FAM53C knock-down, and indicate that p21 is not the only critical p53 target in the cell cycle arrest observed in FAM53C knock-down cells (Figure 4 and Figure S4).

      (2) The authors do not convincingly show that FAM53C acts as a DYRK1A inhibitor in cells. Figures 4B+C and S4B+C show extremely faint P-CycD1 bands, and tiny differences in ratios. The P values are hovering around 0.05, so n=3 is clearly underpowered here. Total CycD1 levels also correlate with FAM53C levels, which seems to affect the ratios more than the tiny pCycD1 bands. Why is there still a pCycD1 band visible in 4B in the GFP + BTZ + DYRK1Ai condition? And if I look at the data points I honestly don't understand how the authors can conclude from S4C that knockdown of FAM53C causes (DYRK1A-dependent) increases in pCycD1 (relative to total CycD1). In figure 5C, no blot scans are even shown, and again the differences look tiny. So the authors should either find a way to make these assays more robust, or alter their claims appropriately.

      We appreciate these comments from the Reviewer and have significantly revised the manuscript to address them.

      The analysis of Cyclin D phosphorylation and stability is complicated by the upregulation of p21 upon FAM53C knock-down, in particular because p21 can be part of Cyclin D complexes, which may affect its protein levels in cells (as was nicely shown in a previous study from the lab of Tobias Meyer – Chen et al., Mol Cell, 2013). Instead of focusing on Cyclin D levels and stability, we refocused the manuscript on RB and p53 downstream of FAM53C loss.

      We removed the previous panel 4B from the revised manuscript. For panels 4E and S4B (now panels S3J and S3K), we used a true “immunoassay” (as indicated in the legend – not an immunoblot), which is much more quantitative and avoids error-prone steps in standard immunoblots (“Western blots”). Briefly, this system was developed by ProteinSimple. It uses capillary transfer of proteins and ELISA-like quantification with up to 6 logs of dynamic range (see their web site https://www.proteinsimple.com/wes.html). The “bands” we show are just a representation of the luminescence signals in capillaries. We made sure to further clarify the figure legends in the revised manuscript.

      The representative Western blot images for 5C-D (now 5F-G) in the original submission are shown in Figure 5E; we apologize if this was not clear. The differences are small, which we acknowledge in the revised manuscript. Note that several factors can affect Cyclin D levels in cells, including the growth rate and the stage of the cell cycle. Our FACS analysis shows that normal organoids have ~63% of cells in G1 and ~13% in S phase; the overall lower proportion of S-phase cells in organoids may make the immunoblot difference appear smaller, with fewer cycling cells resulting in decreased Cyclin D phosphorylation.

      Nevertheless, the Reviewer brings up a good point, and comments from this Reviewer and the others made us rethink how best to interpret our results. As discussed above, we carefully re-read the Meyer paper and think that FAM53C’s role and DYRK1A activity in cells may be understood when considering levels of both CycD and p21 at the same time in a continuum. While our genetic and biochemical data support a role for FAM53C in DYRK1A inhibition, it is likely that the regulation of cell cycle progression by FAM53C is not exclusively due to this inhibition. As discussed above and below, we noted an upregulation of p21 upon FAM53C knock-down, and activation of p53 and its targets likely contributes significantly to the phenotypes observed. We added new experiments to support this more complex model (Figure 4 and Figure S4, with new model in S4L).

      (3) The experiments to test if DYRK1A inhibition could rescue the G1 arrest observed upon FAM53C knockdown are not entirely convincing either. It would be much more convincing if they also perform cell counting experiments as they have done in Figures 1F and 1G, to complement the flow cytometry assays. I suggest that the authors do these cell counting experiments in RPE1 +/- P53 cells as well as HCT116 cells. In addition, did the authors test if P21 is induced by DYRK1Ai in HCT116 cells? 

      We repeated the experiments with the DYRK1A inhibitor and counted the cells. In p53-null RPE1 cells, we found that cell numbers do not increase in these conditions where we had observed a cell cycle re-entry (Fig. 4E), which was accompanied by apoptotic cell death (Fig. S4I). Thus, cells re-enter the cell cycle but die as they progress through S-phase and G2/M. We note that inhibition of DYRK1A has been shown to decrease expression of G2/M regulators (PMID: 38839871), which may contribute to the inability of cells treated with DYRK1Ai to divide. Because our data in RPE-1 cells showed that p21 knock-down was not sufficient to allow the FAM53C knock-down cells to re-enter the cell cycle, we did not further analyze p21 in HCT-116 cells.

      (4) The data in Figure 5C and 5D are identical, although they are supposed to represent either pCycD1 ratios or p21 levels. This is a problem because at least one of the two cannot be true. Please provide the proper data and show (representative) images of both data types.

      We apologize for these duplicated panels in the original submission. We have now replaced the wrong panel with the correct data (Fig. 5F,G).

      (5) Line 246: "Fam53c knockout mice display developmental and behavioral defects." I don't agree with this claim. The mutant mice are born at almost the expected Mendelian ratios, and the body weight development is not consistently altered. But more importantly, no differences in adult survival or microscopic pathology were seen. The authors put strong emphasis on the IMPC behavioral analysis, but they should be more cautious. The IMPC mouse cohorts are tested for many other phenotypes related to behavior and neurological symptoms and apparently none of these other traits were changed in the IMPC Fam53c-/- cohort. Thus, the decreased exploration in a new environment could very well be a chance finding. The authors need to take away claims about developmental and behavioral defects from the abstract, results and discussion sections; the data are just too weak to justify this.

      We agree with the Reviewer that, although we observed significant p-values, this original statement may not be appropriate in the biological sense. We made sure in the revised manuscript to carefully present these data.

      Minor comments: 

      (6) Can the authors provide a rationale for each of the proteins they chose to generate the list of the 38 proteins in the DepMap analysis? I looked at the list and it seems to me that they do not all have described functions in the G1/S transition. The analysis may thus be biased. 

      To address this point, we updated Table S1 (2nd tab) to provide a better rationale for the 38 factors chosen. Our focus was on the canonical RB pathway and we included RB binding proteins whose function had suggested they may also be playing a role in the G1/S transition. We do agree that there is some bias in this selection (e.g., there are more RB binding factors described) but we hope the Reviewer will agree with us that this list and the subsequent analysis identified expected factors, including FAM53C. Future studies using this approach and others will certainly identify new regulators of cell cycle progression.

      (7) Figure 1B is confusing to me. Are these just some (arbitrarily) chosen examples? Consider leaving this heatmap out altogether, or explain in more detail.

      We agree with the Reviewer that this panel was not necessarily useful and possibly in the wrong place, and we removed it from the manuscript. We replaced it with a cartoon of top hits in the screen.

      (8) The y-axes in Figures 2C, 2D, 2E, and 4D are misleading because they do not start at 0. Please let the axis start at 0, or make axis breaks. 

      We re-graphed these panels.

      (9) Line 229: " Consequences ... brain development." This subheader is misleading, because the in vitro cortical organoid system is a rather simplistic model for brain development, and far away from physiological brain development. Please alter the header. 

      We changed the header to “Consequences of FAM53C inactivation in human cortical organoids in culture”.

      (10) Figure S5F: the gating strategy is not clear to me. In particular, how do the authors know the difference between subG1 and G1 DAPI signals? Do they interpret the subG1 as apoptotic cells? If yes, why are there so many? Are the culturing or harvesting conditions of these organoids suboptimal? Perhaps the authors could consider doing IF stainings on EdU or BrdU on paraffin sections of organoids to obtain cleaner data?

      Thank you for your feedback. The subG1 population in the original Figure S5F represents cells that died during the dissociation step of the organoids for FACS analysis. To address this point, we performed live/dead staining to exclude dead cells and provide clearer data. We refined the gating strategy for better clarity in the new S5F panel.

      (11) Figure S6A; the labeling seems incorrect. I would think that red is heterozygous here, and grey mutant. 

      We fixed this mistake, thank you. 

      Reviewer #1 (Significance): 

      The finding that the poorly studied gene FAM53C controls the G1/S transition in cell lines is novel and interesting for the cell cycle field. However, the lack of phenotypes in Fam53c-/- mice makes this finding less interesting for a broader audience. Furthermore, the mechanisms are incompletely dissected. The importance of a p53-independent induction of p21 is not ruled out. And while the direct inhibitory interaction between FAM53C and DYRK1A is convincing (and also reported by others; PMID: 37802655), the authors do not (yet) convincingly show that DYRK1A inhibition can rescue a cell proliferation defect in FAM53C-deficient cells.

      Altogether, this study can be of interest to basic researchers in the cell cycle field. 

      I am a cell biologist studying cell cycle fate decisions, and adaptation of cancer cells & stem cells to (drug-induced) stress. My technical expertise aligns well with the work presented throughout this paper, although I am not familiar with biolayer interferometry. 

      Reviewer #2 (Evidence, reproducibility and clarity): 

      Summary 

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation regulated Kinase 1A (DYRK1) in G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein FAM53C as a potential regulator of G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells but surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. Authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1 but this was true only in cells lacking functional p53. This is quite confusing as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of IPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids but if that reflects an increased activity of DYRK1 or if it is just an off target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice.
Authors did not observe any significant changes in survival nor in organ development but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism. Although some of the data might be of interest, in its current form the data is too preliminary to justify publication.

      Major points 

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects. 

      We thank the Reviewer for raising this important point. First, we need to clarify that our experiments were performed with a pool of siRNAs (not one siRNA). Second, commercial antibodies against FAM53C are not of the best quality and it has been challenging to detect FAM53C using these antibodies in our hands – the results are often variable. In addition, to better address the Reviewer’s point and control for the phenotypes we have observed, we performed two additional series of experiments: first, we have confirmed G1 arrest in RPE-1 cells with individual siRNAs, providing more confidence for the specificity of this arrest (Fig. S1B); second, we have new data indicating that other cell lines arrest in G1 upon FAM53C knock-down (Fig. S1E,F and Fig. 4F).

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types. 

      As mentioned above, we have new data indicating that other cell lines arrest in G1 upon FAM53C knock-down (three cancer cell lines) (Fig. S1E,F and Fig. 4F).

      (3) Authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203), however this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations when phosphorylation of cyclin D is reduced. It rather suggests that DYRK1 is not inhibited by FAM53C but perhaps FAM53C competes with cyclin D. Further, authors should address if the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      We revised the text of the manuscript to include the possibility that FAM53C could act as a competitive substrate and/or an inhibitor.

      We removed most of the Cyclin D phosphorylation/stability data from the revised manuscript. As the Reviewers pointed out, some of these data were statistically significant but the biological effects were small. As discussed above in our response to Reviewer #1, the analysis of Cyclin D phosphorylation and stability is complicated by the upregulation of p21 upon FAM53C knockdown, in particular because p21 can be part of Cyclin D complexes, which may affect its protein levels in cells (as was nicely shown in a previous study from the lab of Tobias Meyer – Chen et al., Mol Cell, 2013). Instead of focusing on Cyclin D levels and stability, we refocused the manuscript on RB and p53 downstream of FAM53C loss.

      We note, however, that we used specific Thr286 phospho-antibodies, which have been used extensively in the field. Our data in Figure 1 with palbociclib place FAM53C upstream of Cyclin D/CDK4,6. We performed Cyclin D overexpression experiments but RPE-1 cells did not tolerate high expression of Cyclin D1 (T286A mutant) and we have not been able to conduct more ‘genetic’ studies. 

      (4) At many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, what statistics was used in Fig 4C? Impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does DYRK1 inhibitor prevent modification of cyclin D? 

      As discussed above, we removed some of these data and re-focused the manuscript on p53-p21 as a second pathway activated by loss of FAM53C.

      (5) Validation of SM13797 compound in terms of specificity to DYRK1 was not performed. 

      This is an important point. We had cited an abstract from the company (Biosplice) but we agree that providing data is critical. We have now revised the manuscript with a new analysis of the compound’s specificity using kinase assays. These data are shown in Fig. S3F-H.

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off. 

      The Reviewer made a good point. As discussed in our response to Reviewer #1, with p53-null RPE-1 cells, we found that cell numbers do not increase in these conditions where we had observed a cell cycle re-entry (Fig. 4E), which was accompanied by apoptotic cell death (Fig. S4I). Thus, cells re-enter the cell cycle but die as they progress through S-phase and G2/M. We note that inhibition of DYRK1A has been shown to decrease expression of G2/M regulators (PMID: 38839871), which may contribute to the inability of cells treated with DYRK1Ai to divide.

      Because our data in RPE-1 cells showed that p21 knock-down was not sufficient to allow the FAM53C knock-down cells to re-enter the cell cycle, we did not further analyze p21 in HCT-116 cells. These data indicate that G1 entry by flow cytometry will not always translate into proliferation.

      Other points:

      (7) Fig. 2C, 2D, 2E graphs should begin with 0 

      We remade these graphs.

      (8) Fig. 5D shows that the difference in p21 levels is not significant in FAM53C-KO cells but difference is mentioned in the text. 

      We replaced the panel with the correct one; we apologize for this error.

      (9) Fig. 6D comparison of datasets of extremely different sizes does not seem to be appropriate

      We agree and revised the text. We hope that the Reviewer will agree with us that it is worth showing these data, which are clearly preliminary but provide evidence of a possible role for FAM53C in the brain.

      (10) Could there be alternative splicing in mice generating a partially functional protein without exon 4? Did authors confirm that the animal model does not express FAM53C? 

      We performed RNA sequencing of mouse embryonic fibroblasts derived from control and mutant mice. We clearly identified fewer reads in exon 4 in the knockout cells, and no other obvious change in the transcript (data not shown). However, immunoblot with mouse cells for FAM53C never worked well in our hands. We made sure to add this caveat to the revised manuscript.

      Reviewer #2 (Significance): 

      Main problem of this study is that the advanced experimental models in IPSCs and mice did not confirm the observations in the cell lines and thus the whole manuscript does not hold together. Although I acknowledge the effort the authors invested in these experiments, the data do not contribute to the main conclusion of the paper that FAM53C/DYRK1 regulates G1/S transition. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      This paper identifies FAM53C as a novel regulator of cell cycle progression, particularly at the G1/S transition, by inhibiting DYRK1A. Using data from the Cancer Dependency Map, the authors suggest that FAM53C acts upstream of the Cyclin D-CDK4/6-RB axis by inhibiting DYRK1A.  Specifically, their experiments suggest that FAM53C Knockdown induces G1 arrest in cells, reducing proliferation without triggering apoptosis. DYRK1A Inhibition rescues G1 arrest in P53KO cells, suggesting FAM53C normally suppresses DYRK1A activity. Mass Spectrometry and biochemical assays confirm that FAM53C directly interacts with and inhibits DYRK1A. FAM53C Knockout in Human Cortical Organoids and Mice leads to cell cycle defects, growth impairments, and behavioral changes, reinforcing its biological importance. 

      Strength of the paper: 

      The study introduces a novel cell cycle control signalling module upstream of CDK4/6 in G1/S regulation which could have significant impact. The identification of FAM53C using a depmap correlation analysis is a nice example of the power of this dataset. The experiments are carried out mostly in a convincing manner and support the conclusions of the manuscript. 

      Critique: 

      (1) The experiments rely heavily on siRNA transfections without the appropriate controls. There are so many cases of off-target effects of siRNA in the literature, and specifically for a strong phenotype on S-phase as described here, I would expect to see solid results by additional experiments. This is especially important since the ko mice do not show any significant developmental cell cycle phenotypes. Moreover, FAM53C does not show a strong fitness effect in the depmap dataset, suggesting that it is largely non-essential in most cancer cell lines. For this paper to reach publication in a high-standard journal, I would expect that the authors show a rescue of the S-phase phenotype using an siRNA-resistant cDNA, and show similar S-phase defects using an acute knock out approach with lentiviral gRNA/Cas9 delivery. 

      We thank the Reviewer for this comment. Please refer to the initial response to the three Reviewers, where we discuss our use of single siRNAs and our results in multiple cell lines. Briefly, we can recapitulate the G1 arrest upon FAM53C knock-down using two independent siRNAs in RPE-1 cells. We also observe the same G1 arrest in p53 knockout cells, suggesting it is not due to a non-specific stress response. In addition, the arrest is dependent on RB, which fits with the genetic and biochemical data placing FAM53C upstream of RB, further supporting a specific phenotype. Human cancer cell lines also arrest in G1 upon FAM53C knock-down, not just RPE-1 cells. Finally, we hope the Reviewer will agree with us that compensatory mechanisms are very common in the cell cycle – which may explain the lack of phenotypes in vivo or upon long-term knockout of FAM53C.

      (2) The S-phase phenotype following FAM53C should be demonstrated in a larger variety of TP53WT and mutant cell lines. Given that this paper introduces a new G1/S control element, I think this is important for credibility. Ideally, this should be done with acute gRNA/Cas9 gene deletion using a lentiviral delivery system; but if the siRNA rescue experiments work and validate an on-target effect, siRNA would be an appropriate alternative. 

      We now show data with three cancer cell lines (U2OS, A549, and HCT-116 – Fig. S1E,F and Fig. 4F), in addition to our results in RPE-1 cells and in human cortical organoids. We note that the knock-down experiments are complemented by overexpression data (Fig. 1G-I), by genetic data (our original DepMap screen), and our biochemical data (showing direct binding of FAM53C to DYRK1A).

      (3) The western blot images shown in the MS appear heavily over-processed and saturated (See for example S4B, 4A, B, and E). Perhaps the authors should provide the original un-processed data of the entire gels? 

      For several of our panels (e.g., 4E and S4B, now panels S3J and S3K), we used a true “immunoassay” (as indicated in the legend – not an immunoblot), which is much more quantitative and avoids error-prone steps in standard immunoblots (“Western blots”). Briefly, this system was developed by ProteinSimple. It uses capillary transfer of proteins and ELISA-like quantification with up to 6 logs of dynamic range (see their web site https://www.proteinsimple.com/wes.html). The “bands” we show are just a representation of the luminescence signals in capillaries. We made sure to further clarify the figure legends in the revised manuscript.

      Data in 4A are also not a western blot but a radiograph.

      For immunoblots, we will provide all the source data with uncropped blots with the final submission.

      (4) A critical experiment for the proposed mechanism is the rescue of the FAM53C S-phase reduction using DYRK1A inhibition shown in Figure 4. The legend here states that the data were extracted from BrdU incorporation assays, but in Figure S4D only the PI histograms are shown, and the S-phase population is not quantified. The authors should show the BrdU scatterplot and quantify the phenotype using the S-phase population in these plots. G1 measurements from PI histograms are not precise enough to allow for conclusions. Also, why are the intensities of the PI peaks so variable in these plots? Compare, for example, the HCT116 upper and lower panels where the siRNA appears to have caused an increase in ploidy. 

      We apologize for the confusion and have fixed these errors. For most of the analyses, we used PI to measure G1 and S-phase entry. We added relevant flow cytometry plots to the supplemental figures (Fig. S1G, H, I, as well as Fig. S4E and S4K, and Fig. S5F).

      (5) There's an apparent contradiction in how RB deletion rescues the G1 arrest (Figure 2) while p21 seems to maintain the arrest even when DYRK1A is inhibited. Is p21 not induced when FAM53C is depleted in RB ko cells? This should be measured and discussed. 

      This comment and comments from the two other Reviewers made us reconsider our model. We carefully re-read the Meyer paper and think that DYRK1A activity may be understood when considering levels of both CycD and p21 at the same time in a continuum (as was nicely shown in a previous study from the lab of Tobias Meyer – Chen et al., Mol Cell, 2013). While our genetic and biochemical data support a role for FAM53C in DYRK1A inhibition, it is obvious that the regulation of cell cycle progression by FAM53C is not exclusively due to this inhibition. As discussed above and below, we noted an upregulation of p21 upon FAM53C knock-down, and activation of p53 and its targets likely contributes significantly to the phenotypes observed. We added new experiments to support this more complex model (Figure 4 and Figure S4, with new model in S4L).

      Reviewer #3 (Significance): 

      In conclusion, I believe that this MS could potentially be important for the cell cycle field and also provide a new target pathway that could be relevant for cancer therapy. However, the paper has quite a few gaps and inconsistencies that need to be addressed with further experiments. My main worry is that the acute depletion phenotypes appear so strong, while the gene is nonessential in mice and shows only a minor fitness effect in the depmap screens. More convincing controls are necessary to rule out experimental artefacts that misguide the interpretation of the results.

      We appreciate this comment and hope that the Reviewer will agree it is still important to share our data with the field, even if the phenotypes in mice are modest.

    1. As a consequence of the amendments set out above in relation to network services, interoperability and data sharing, it is furthermore proposed to repeal the following related implementing acts, by way of the applicable procedure, and to delete the corresponding empowerments: (1) Commission Regulation (EC) No 976/2009 as regards Network Services (2) Commission Regulation (EU) No 1089/2010 on interoperability of spatial data sets and services, and (3) Commission Regulation (EU) No 268/2010 on data and service sharing. (4) Commission Implementing Decision (EU) 2019/1372 implementing Directive 2007/2/EC as regards monitoring and reporting.

      I read this as taking out all INSPIRE obligations, whereas the HVD reg builds on these pre-existing obligations. (Stating that sharing data / services must be open) - [ ] Crosscheck if HVD states an explicit independent mandate, without reference to INSPIRE mandates. #geonovumtb #10mins #belangrijkeerst

    1. Today's simplification package is composed of six legislative proposals.

      6 legislative proposals (but press release lists 5)

      1. Environmental assessments wrt permits
      2. industrial emissions directive
      3. SCIP database (substances of concern, in the Waste Framework directive) to be replaced with DPP ( #openvraag DPP is not in effect yet, so repeal of SCIP early / protection erosion?)
      4. Extended Producer Responsibility req changed for EU producers.
      5. INSPIRE
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This valuable study examines how mammals descend effectively and securely along vertical substrates. The conclusions from comparative analyses based on behavioral data and morphological measurements collected from 21 species across a wide range of taxa are convincing, making the work of interest to all biologists studying animal locomotion.

      We would like to greatly thank the two reviewers for their time in reviewing this work, and for their valuable comments and suggestions that will help to improve this manuscript.

      Overall, we agree with the weaknesses raised, which are mainly areas for consideration in future studies: to study more species, and in a natural habitat context.

      We will nevertheless add a few modifications to improve the manuscript, notably by making certain figures more readable, and adding definitions and bibliography in the main text concerning gait characteristics.

      We also provide brief comments on each point of weakness raised by the reviewers below, in blue.

      Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).
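
      For readers less familiar with the gait variable mentioned above, duty factor is conventionally the fraction of a stride during which a given limb is in contact with the substrate. A minimal statement of that standard definition follows; the symbols $t_{\mathrm{contact}}$ and $T_{\mathrm{stride}}$ are labels chosen here for illustration and do not come from the manuscript.

      ```latex
      \[
        \mathrm{DF} = \frac{t_{\mathrm{contact}}}{T_{\mathrm{stride}}},
        \qquad 0 \le \mathrm{DF} \le 1
      \]
      ```

      A higher value means longer limb contact per stride, which is the sense in which slower, more deliberate descent goes with higher duty factors.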

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      We analyzed gait patterns using methods commonly found in the literature and discussed our results accordingly. However, the study of limb support polygons was indeed developed specifically for studying locomotion on horizontal supports, and may not be applicable for studying vertical locomotion, which is in fact a type of locomotion shared by all arboreal species. In the future, it would be interesting to consider new methods for analyzing vertical gaits.

      The importance of body mass cannot be overemphasized as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species of their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple measure of forelimb over hindlimb has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data within a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insights into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow for all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      We agree with this statement. In the future, we plan to study other species, particularly large-bodied ones with varied intermembral indices.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

      We completely agree with this, and we would like to emphasize that our intention here was simply to conduct a modest inference test, the purpose of which is to provide food for thought for future studies, and whose results should be considered in light of a comprehensive evolutionary model.

      Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontoglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent in working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Yes, that is indeed the main cost/benefit trade-off with this type of study. Working with captive animals allows for large comparative studies, but locomotor behavior may differ from that of individuals in the natural environment, and the dataset includes only a few individuals per species. That is why we plan, and encourage colleagues, to conduct studies in the natural environment to compare with these results. However, this type of study is very time-consuming and requires focusing on a single species at a time, which limits the comparative aspect.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

      We agree that this figure is dense. One possible solution would be to combine species by phylogenetic groups to reduce the amount of information, as we did with Fig. 3 on the dataset relating to gaits. However, we believe that this would be unfortunate in the case of speed and duty factor because we would have to provide the complete figure in SI anyway, as the species-level information is valuable. We therefore prefer to keep this comprehensive figure here and we will enlarge the data points to improve their visibility, and provide the figure with a sufficiently high resolution to allow zooming in on the details.

      Reviewer #1 (Recommendations for the authors):

      As indicated in the first section above, this is a strong comparative study that addresses important questions, relative to the evolution of arboreal locomotion in primates and close mammal relatives. My recommendations should be taken in the context of improving a manuscript that is already generally acceptable.

      (1) The terms symmetrical and asymmetrical gaits should be briefly defined in the main text (not just in the Methods section) by citing work done by Hildebrand and other relevant studies. To that effect, the statement on lines 96-97 about the convergence of symmetrical gaits is unclear. What does "Symmetrical gaits have evolved convergently in rodents, scandentians, carnivorans, and marsupials" mean? Symmetrical gaits such as the walk, run, trot, etc., are pretty much the norm in most mammals and were likely found in metatherians and basal eutherians. This needs clarification. On line 239, the term "ambling" is used in the context of related asymmetrical gaits. To be clear, the amble is a type of running gait involving no whole-body aerial phase and is therefore a symmetrical gait (see Schmitt et al., 2006).

      We have added a definition of the terms symmetrical and asymmetrical gaits and added references in the introduction such as: “Symmetrical gaits are defined as locomotor patterns in which the footfalls of a girdle (a pair of fore- or hindlimbs) are evenly spaced in time, with the right and left limbs of a pair of limbs being approximately 50% out of phase with each other (Hildebrand, 1966, 1967). Symmetrical gaits can be further divided into two types: diagonal-sequence gaits, in which a hindlimb footfall is followed by that of the contralateral forelimb, and lateral-sequence gaits, in which a hindlimb footfall is followed by that of the ipsilateral forelimb (Hildebrand, 1967; Shapiro and Raichlen, 2005; Cartmill et al., 2007b). In contrast, asymmetrical gaits are characterized by unevenly spaced footfalls within a girdle, with the right and left limbs moving in near synchrony (Hildebrand, 1977).” Now found in lines 87-94.
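      The definitions quoted above can be illustrated with a minimal Python sketch. It is not part of the study's analysis pipeline: the function names, the input format (touchdown times expressed as fractions of one stride cycle), and the 0.1 tolerance around the 50% phase lag are all assumptions made here for illustration, and boundary cases such as the trot and the pace are not treated separately.

```python
def phase_lag(later, earlier):
    """Fraction of the stride cycle separating two touchdowns, in [0, 1)."""
    return (later - earlier) % 1.0


def classify_gait(lh, lf, rh, rf, tolerance=0.1):
    """Classify one stride from touchdown times given as cycle fractions.

    lh, lf, rh, rf: touchdown of the left hind, left fore, right hind and
    right fore limb. `tolerance` is a hypothetical threshold, not a value
    taken from the study.
    """
    hind_lag = phase_lag(rh, lh)
    fore_lag = phase_lag(rf, lf)
    # Symmetrical: left and right limbs of each pair roughly 50% out of phase.
    symmetrical = (abs(hind_lag - 0.5) <= tolerance
                   and abs(fore_lag - 0.5) <= tolerance)
    if not symmetrical:
        return "asymmetrical"
    # Which forelimb lands next after the left hindlimb touches down?
    ipsilateral_lag = phase_lag(lf, lh)
    contralateral_lag = phase_lag(rf, lh)
    if ipsilateral_lag < contralateral_lag:
        return "lateral-sequence"
    return "diagonal-sequence"


# Hypothetical strides (not data from the study).
print(classify_gait(0.0, 0.25, 0.5, 0.75))
print(classify_gait(0.0, 0.75, 0.5, 0.25))
print(classify_gait(0.0, 0.40, 0.05, 0.45))
```

      In this sketch the sequence type is decided only from the left hindlimb; a fuller treatment would average over both sides and over several strides.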

      We corrected the sentence to read: “Symmetrical gaits are also common in rodents, scandentians, etc..” Now found in line 107.

      Thank you for pointing this out. We indeed did not use the right term to mention related asymmetrical gaits with increased duty factors. We removed the term “ambling” and the associated reference here. Now found in line 256.

      (2) Correlations are used in the paper to examine how brain mass scales with body mass. It is correct to assume that a correlation significantly different from 0 is indicative of allometry (in this case, positive). That said, lines are used in Figure S2 that go through the bivariate scatter plot. The vast majority of scaling studies rely on regression techniques to calculate and compare slopes, which are different statistically from correlations. In this case, a slope not significantly different from 1.0 would support the hypothesis of isometry based on geometric similarity (as brain mass and body mass are two volumes). The authors could refer to the work of Bob Martin and the 1985 edited book by Jungers and contributions therein. These studies should also be cited in the paper.

      Thank you for recommending this better-suited method. We replaced the correlations with major axis orthogonal regressions, as recommended by Martin and Barbour 1989. We found a positive slope across all species that was significantly different from 1 (0.36), indicating a negative allometry (we realized we were mistaken about the allometry terminology, initially reporting a “positive allometry” instead of a positive correlation).

      We corrected in the manuscript in the Results and Methods sections, and cited Martin and Barbour 1989 such as:

      “To ensure that the EQs of the different species studied are comparable and meaningful, we tested the allometry between the brain and body masses in our dataset following [84] and found a significant and positive slope for all species (major axis orthogonal regression on log transformed values: slope = 0.36, r<sup>2</sup> = 0.92, p = 5.0 × 10<sup>-12</sup>), indicating a negative allometry (r = 0.97, df = 19, p = 2.0 × 10<sup>-13</sup>), and similar allometric coefficients when restricting the analysis to phylogenetic groups (Fig. S2).” Now found in lines 289-298.

      - “To control that brain allometry is homogeneous among all phylogenetic groups, to be able to compare EQ between species, we computed major axis orthogonal regressions, following the recommendation of Martin and Barbour [84], between the Log transformed brain and body masses, over all species and by phylogenetic group using the sma package in R (Fig. S2).” Now found in lines 336-338.

      We also changed Figure S2 in Supplementary Information accordingly.
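      The scaling test described in this response can be sketched in a few lines of Python. This is an illustration only, not the authors' R code: the function name is hypothetical, the data are synthetic values constructed to follow a slope of 0.36, and no significance test is included. The slope is the direction of the first principal axis of the covariance matrix of the log-transformed values; following the reviewer's framing, a slope of 1.0 would indicate isometry between two volumes, and a slope below 1.0 negative allometry.

```python
import math


def major_axis_slope(x, y):
    """Slope and intercept of the major axis (orthogonal) regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the major axis is undefined")
    # Closed-form slope of the first principal axis of the covariance matrix.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic body masses (g) and brain masses built to scale as mass ** 0.36.
body_mass = [50.0, 200.0, 800.0, 3000.0, 9000.0]
brain_mass = [0.1 * m ** 0.36 for m in body_mass]

log_body = [math.log10(m) for m in body_mass]
log_brain = [math.log10(b) for b in brain_mass]

slope, intercept = major_axis_slope(log_body, log_brain)
print(round(slope, 2))
```

      Unlike ordinary least squares, the major axis treats error in both variables symmetrically, which is the usual reason it is preferred for scaling questions.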

      (3) Trunk length is used as the denominator for many of the indices used in the study. In this way, trunk length is considered to be a proxy for body size. There should be a demonstration that trunk length scales isometrically with body mass in all of the mammals compared. If not the case, some of the indices may not be directly comparable.

      We did not use trunk length as a proxy for body mass, but to compute geometric body proportions in order to test whether intrinsic body proportions could be related to vertical descent behaviors, namely the length of the tail and of the fore- and hindlimbs relative to the animal. We chose those indices to quantify the capability of limbs to act as levers or counterweights to rotate the animals for this specific question of vertical descent behavior. We therefore do not think that body mass allometry with respect to trunk length is relevant to compare these indices across species here. Also, we don’t expect that trunk length (which is a single dimension) would scale isometrically with body mass, which scales more as a volume.
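      The last point of this response can be written out explicitly. As a sketch under the assumptions of geometric similarity and constant body density (the symbols L for trunk length, M for body mass, and k for a constant are introduced here for illustration only):

```latex
M \propto L^{3}
\quad\Longrightarrow\quad
L = k\,M^{1/3}
\quad\Longrightarrow\quad
\log L = \tfrac{1}{3}\,\log M + \log k
```

      Under these assumptions a linear dimension is expected to follow a log-log slope of 1/3 against body mass rather than 1.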

      (4) Given the numerous comparisons done in this study, a Bonferroni correction method should be considered to mitigate type I error (accepting a false positive).

      We had already corrected all our statistical tests using the Benjamini-Hochberg method to control for false positives; see the SuppTables Excel file for the complete results of the statistical analyses. We chose this method over the Bonferroni correction because the more modern and balanced Benjamini-Hochberg procedure is better suited for analyses involving a large number of hypotheses.
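      The difference between the two corrections discussed here can be shown with a minimal Python sketch. The p-values are hypothetical, the function names are chosen for illustration, and this is not the code used for the study's SuppTables. Bonferroni controls the family-wise error rate by multiplying each p-value by the number of tests, whereas Benjamini-Hochberg controls the false discovery rate with a rank-dependent factor.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values (family-wise error rate control)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05

kept_bonferroni = sum(p < alpha for p in bonferroni(raw))
kept_bh = sum(p < alpha for p in benjamini_hochberg(raw))
print(kept_bonferroni, kept_bh)
```

      With these four hypothetical values, Bonferroni keeps two tests below the 0.05 threshold while Benjamini-Hochberg keeps all four, which illustrates why the latter is often chosen when many hypotheses are tested.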

      (5) The terms "arm" and "leg" used in the main text and Table 1 are anatomically incorrect. Instead, the terms "forelimb" and "hindlimb" should be used as they include the length sum of the stylopod, zeugopod, and autopod.

      Indeed, thank you for pointing that out. We have corrected this error within the manuscript as well as in Figures 4 and S3.

      (6) On p. 14, the authors make the statement that the postcranial anatomy of Adapis and Notharctus remains undescribed. The authors should consult the work of Dagosto, Covert, Godinot and others.

      We did not state that the postcranial remains of Adapis and Notharctus have not been described. However, we were unfortunately unable to find published illustrations of the known postcranial elements that could be reliably used in this study. To avoid any misunderstanding, we revised the sentence to read: “However, we could not find suitable illustrations of the known postcranial elements of these species in the literature that could be reliably incorporated into this study. Thus, we only included their reconstructed body mass and EQ,..”. Now found in lines 393-397.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 65/69 - Perchalski et al. 2021 is a single-author publication, so no et al. or w/ colleagues.

      Indeed. This has been corrected in the manuscript, now found in lines 65 and 70.

      (2) Lines 96-98 - Is it appropriate to say that the use of symmetrical gaits are examples of convergent evolution? There's less burden of evidence to state that these are shared behaviors, rather than suggesting they independently evolved across all those groups.

      We agree with this and corrected the sentence to read: “Symmetrical gaits are also common in rodents, scandentians, etc..” Now found in line 107.

      (3) Line 198 - I am confused by how to interpret (-16,36 %) compared to how other numbers are presented in the rest of the paragraph.

      To avoid confusion, we rephrased this sentence as follows: “In contrast, primates did not significantly reduce their speed compared to ascents when descending sideways or tail-first (Fig. 2A, SuppTables B).” Now found in lines 207-209.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand how Rhino, a chromatin protein essential for small RNA production in fruit flies, is initially recruited to specific regions of the genome. They propose that asymmetric arginine methylation of histones, particularly mediated by the enzyme DART4, plays a key role in defining the first genomic sites of Rhino localization. Using a combination of inducible expression systems, chromatin immunoprecipitation, and genetic knockdowns, the authors identify a new class of Rhino-bound loci, termed DART4 clusters, that may represent nascent or transitional piRNA clusters.

      Strengths:

      One of the main strengths of this work lies in its comprehensive use of genomic data to reveal a correlation between ADMA histones and Rhino enrichment at the border of known piRNA clusters. The use of both cultured cells and ovaries adds robustness to this observation. The knockdown of DART4 supports a role for H3R17me2a in shaping Rhino binding at a subset of genomic regions.

      Weaknesses:

      However, Rhino binding at, and piRNA production from, canonical piRNA clusters appears largely unaffected by DART4 depletion, and spreading of Rhino from ADMA-rich boundaries was not directly demonstrated. Therefore, while the correlation is clearly documented, further investigation would be needed to determine the functional requirement of these histone marks in piRNA cluster specification.

      The study identifies piRNA cluster-like regions called DART4 clusters. While the model proposes that DART4 clusters represent evolutionary precursors of mature piRNA clusters, the functional output of these clusters remains limited. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing.

      In summary, the authors present a well-executed study that raises intriguing hypotheses about the early chromatin context of piRNA cluster formation. The work will be of interest to researchers studying genome regulation, small RNA pathways, and the chromatin mechanisms of transposon control. It provides useful resources and new candidate loci for follow-up studies, while also highlighting the need for further functional validation to fully support the proposed model.

      We sincerely thank Reviewer #1 for the thoughtful and constructive summary of our work. We appreciate the reviewer’s recognition that our study provides a comprehensive analysis of the relationship between ADMA-histones and Rhino localization, and that it raises intriguing hypotheses about the early chromatin context of piRNA cluster formation.

      We fully agree with the reviewer that our data primarily demonstrate correlation between ADMA-histones and Rhino localization, rather than direct causation. In response, we have carefully revised the text throughout the manuscript to avoid overstatements implying causality (details provided below).

      We also acknowledge the reviewer’s important point that the functional requirement of ADMA-histones for piRNA cluster specification remains to be further established. We have now added a discussion of our experimental limitations (page 18).

      Overall, we have revised the manuscript to present our findings more cautiously and transparently, emphasizing that our data reveal a correlation between ADMA-histone marks and the initial localization of Rhino, rather than proving a direct mechanistic requirement. We thank the reviewer again for highlighting these important distinctions.

      Reviewer #2 (Public review):

      This study seeks to understand how the Rhino factor knows how to localize to specific transposon loci and to specific piRNA clusters to direct the correct formation of specialized heterochromatin that promotes piRNA biogenesis in the fly germline. In particular, these dual-strand piRNA clusters with names like 42AB, 38C, 80F, and 102F generate the bulk of ovarian piRNAs in the nurse cells of the fly ovary, but the evolutionary significance of these dual-strand piRNA clusters remains mysterious since triple null mutants of these dual-strand piRNA clusters still allow fly ovaries to develop and remain fertile. Nevertheless, mutants of Rhino and its interactors Deadlock, Cutoff, Kipferl and Moonshiner, etc, cause more piRNA loss beyond these dual-strand clusters and exhibit the phenotype of major female infertility, so the impact of proper assembly of Rhino, the RDC, Kipferl etc onto proper piRNA chromatin is an important and interesting biological question that is not fully understood.

      This study tries to first test ectopic expression of Rhino via engineering a Dox-inducible Rhino transgene in the OSC line that only expresses the primary Piwi pathway that reflects the natural single pathway expression in the follicle cells and is quite distinct from the nurse cell germline piRNA pathway that is promoted by Rhino, Moonshiner, etc. The authors present some compelling evidence that this ectopic Rhino expression in OSCs may reveal how Rhino can initiate de novo binding via ADMA histone marks, a feat that would be much more challenging to demonstrate in the germline where this epigenetically naïve state cannot be modeled since germ cell collapse would likely ensue. In the OSC, the authors have tested the knockdown of four of the 11 known Drosophila PRMTs (DARTs), and comparing to ectopic Rhino foci that they observe in HP1a knockdown (KD), they conclude DART1 and DART4 are the prime factors to study further in looking for disruption of ADMA histone marks. The authors also test KD of DART8 and CG17726 in OSCs, but in the fly, they test germline KD (GLKD) of DART4 only. They do not explain why the other DARTs are not tested in GLKD; the UAS-RNAi resources in Drosophila strain repositories should be very complete and should make reagents for these knockdowns accessible.

      The authors only characterize some particular ADMA marks of H3R17me2a as showing a strong decrease after DART4 GLKD, and then they see some small subset of piRNA clusters go down in piRNA production as shown in Figure 6B and Figure 6F and Supplementary Figure 7. This small subset of DART4-dependent piRNA clusters does lose Rhino and Kipferl recruitment, which is an interesting result.

      However, the biggest issue with this study is the mystery that the set of the most prominent dual-strand piRNA clusters, 42AB, 38C, 80F, and 102F, are the prime genomic loci subjected to Rhino regulation, and they do not show any change in piRNA production in the GLKD of DART4. The authors bury this surprising negative result in Supplementary Figure 5E, but this is also evident in no decrease (actually an n.s. increase) in Rhino association in Figure 5D. Since these main piRNA clusters involve the RDC, Kipferl, Moonshiner, etc, and they do not change in ADMA status or lose piRNAs after DART4 GLKD, this poses a problem with the model in Figure 7C. In this study, there is only a GLKD of DART4 and no GLKD of the other DARTs in fly ovaries.

      One way the authors rationalize this peculiar exception is the argument that DART4 is only acting on evolutionarily "young" piRNA clusters like the bx, CG14629, and CG31612, but the lack of any change on the majority of other piRNA clusters in Figure 6F leaves open the unsatisfying concern that there is much functional redundancy remaining with other DARTs not being tested by GLKD in the fly that would have a bigger impact on the other main dual-strand piRNA clusters being regulated by Rhino and ADMA-histone marks.

      Also, the current data does not provide convincing enough support for the model in Figure 7C and the paper title of ADMA-histones being the key determinant in the fly ovary for Rhino recognition of the dual-strand piRNA clusters. Although much of this study's data is well constructed and presented, there remains a large gap in that no other DARTs were tested in GLKD that would show a big loss of piRNAs from the main dual-strand piRNA clusters of 42AB, 38C, 80F, and 102F, where Rhino has prominent spreading in these regions.

      As the manuscript currently stands, I do not think the authors present enough data to conclude that "ADMA-histones [As a Major new histone mark class] does play a crucial role in the initial recognition of dual-strand piRNA cluster regions by Rhino" because the data here mainly just show a small subset of evolutionarily young piRNA clusters have a strong effect from GLKD of DART4. The authors could extensively revise the study to be much more specific in the title and conclusion that they have uncovered this very unique niche of a small subset of DART4-dependent piRNA clusters, but this niche finding may dampen the impact and significance of this study since other major dual-strand piRNA clusters do not change during DART4 GLKD, and the authors do not show GLKD data for any other DARTs. The niche finding of just a small subset of DART4-dependent piRNA clusters might make another specialized genetics forum a more appropriate venue.

      We are deeply grateful to Reviewer #2 for the detailed and insightful review that carefully situates our study in the broader context of Rhino-mediated piRNA cluster regulation. We appreciate the reviewer’s recognition that our inducible Rhino expression system in OSCs provides a valuable model to explore de novo Rhino recruitment under a simplified chromatin environment.

      At the same time, we agree that the current data mainly support a role for DART4 in regulating a subset of evolutionarily young piRNA clusters, and do not demonstrate a requirement for ADMA-histones at the major dual-strand piRNA clusters such as 42AB or 38C. We have therefore revised the title and main conclusions to more accurately reflect the scope of our findings.

      We agree with the reviewer that functional redundancy among DARTs may explain why major dual-strand piRNA clusters are unaffected by DART4 GLKD. Indeed, we attempted GLKD of DART1, whose knockdown causes collapse of Rhino foci in OSCs. For DART1 GLKD, two approaches were possible:

      (1) Crossing the BDSC UAS-RNAi line (ID: 36891) with nos-GAL4.

      (2) Crossing the VDRC UAS-RNAi line (ID: 110391) with nos-GAL4 and UAS-Dcr2.

      The first approach was not feasible because the UAS-RNAi line was dead on arrival (DOA) each time it was shipped and could not be maintained in our laboratory. The second approach did not yield effective and stable knockdown (as follows).

      DART8 and CG17726 did not alter Rhino foci in OSC knockdown experiments; therefore, we did not attempt germline knockdown (GLKD) of these DARTs in the ovary.  We agree with the reviewer’s opinion that there are piRNA source loci where Rhino localization depends on DART1, and that simultaneous depletion of multiple DARTs may indeed reveal additional positive results because ADMA-histones such as H3R8me2a may be completely eliminated by the knockdown of multiple DARTs. At the same time, we note that many evolutionarily conserved piRNA clusters show a loss of ADMA accumulation compared with evolutionarily young piRNA clusters, with levels that are comparable to the background input in ChIP-seq reads. Therefore, conserved clusters such as 42AB and 38C may no longer be regulated by ADMA. Even if multiple DARTs function redundantly to regulate ADMA, it may be difficult to disrupt Rhino localization at such conserved piRNA clusters by depletion of DARTs. While disruption of Rhino localization at conserved clusters like 42AB and 38C may be challenging, we cannot exclude the possibility that DART depletion affects Rhino binding at less conserved piRNA clusters, where ADMA modification remains detectable. We added clarifications in the Discussion to acknowledge the potential redundancy with other DARTs and to note that further knockdown experiments in the germline will be necessary to test this model comprehensively (page 18).

      We appreciate the reviewer’s critical feedback, which has helped us refine the message and strengthen the interpretative balance of the paper.

      Reviewer #1 (Recommendations for the authors):

      In multiple places, the link between ADMA histones and Rhino recruitment is presented in terms that imply causality. Please revise these statements to reflect that, in most cases, the evidence supports correlation rather than direct functional necessity. Similarly, statements suggesting that ADMA histones promote Rhino spreading should be revised unless supported by direct evidence.

      We sincerely thank the reviewer for the insightful comments. We recognize that these suggestions are crucial for improving the manuscript, and we have revised it accordingly to address the concerns. The specific revisions we made are detailed below.

      (1) Page 1, line 14: The original sentence “in establishing the sites” was changed to “may establish the potential sites.”

      (2) Page 4, lines 11-12: The original sentence “genomic regions where Rhino binds at the ends and propagates in the areas in a DART4-dependent manner, but not stably anchored” was changed to “genomic regions that have ADMA-histones at their ends and exhibit broad Rhino spreading across their internal regions in a DART4-dependent manner”

      (3) Page 4, lines 12-15: The original sentence “Kipferl is present at the regions but not sufficient to stabilize Rhino-genomic binding after Rhino propagates.” was changed to “In contrast to authentic piRNA clusters, Kipferl was lost together with Rhino upon DART4 depletion in these regions, suggesting that Kipferl by itself is not sufficient to stabilize Rhino binding; rather, their localization depends on DART4.”

      (4) Page 4, lines 17-18: The original sentence “are considered to be primitive clusters” was changed to “might be nascent dual-strand piRNA source loci”.

      (5) Page 8, line 7: The original sentence “Involvement of ADMA-histones in the genomic localization of Rhino was implicated.” was changed to “Correlation of ADMA-histones in the genomic localization of Rhino was implicated.”

      (6) Page 8, lines 19-21: The original sentence “These results suggest that ADMA-histones, together with H3K9me3, contribute significantly and specifically to the recruitment of Rhino to the ends of dual-strand clusters in OSCs.” was changed to “These results raise the possibility that ADMA-histones, together with H3K9me3, may contribute specifically to the recruitment of Rhino to the ends of dual-strand clusters in OSCs.”

      (7) Page 10, lines 11-13: The original sentence “These results suggest that DART1 and DART4 are involved in Rhino recruitment at distinct genomic sites through the decreases in ADMA-histones in each of their KD conditions (H4R3me2a and H3R17me2a, respectively).” was changed to “These results suggest that DART1 and DART4 could contribute to Rhino recruitment at distinct genomic sites through the decreases in ADMA-histones in each of their KD conditions (H4R3me2a and H3R17me2a, respectively).”

      (8) Page 13, line 2: The original sentence “Genomic regions where Rhino spreads in a DART4-dependent manner, but not stably anchored, produce some piRNAs” was changed to “Genomic regions where Rhino binds broadly in a DART4-dependent manner, but not stably anchored, produce some piRNAs”

      (9) Page 13, lines 21-22: The original sentence “These results support the hypothesis that ADMA-histones are involved in the genomic binding of Rhino both before and after Rhino spreading, resulting in stable genome binding.” was changed to “These results raise the possibility that a subset of Rhino localized to genomic regions correlating with ADMA-histones may serve as origins of spreading.”

      (10) Page 16, lines 6-8: The original sentence “In this study, we took advantage of cultured OSCs for our analysis and found that chromatin marks (i.e., ADMA-histones) play a crucial role in the loading of Rhino onto the genome.” was changed to “In this study, we took advantage of cultured OSCs for our analysis and found that chromatin marks (i.e., bivalent nucleosomes containing H3K9me3 and ADMA-histones) appear to contribute to the initial loading of Rhino onto the genome.”

      (11) Page 16, line 12: The original sentence “We propose that the process of piRNA cluster formation begins with the initial loading of Rhino onto bivalent nucleosomes containing H3K9me3 and ADMA-histones (Fig. 7C). In OSCs, the absence of Kipferl and other necessary factors means that Rhino loading into the genome does not proceed to the next step.” was removed.

      Major points

      (1)  Clarify the limited colocalization between Rhino and H3K9me3 in OSCs. The observation that FLAG-Rhino foci show minimal overlap with H3K9me3 in OSCs appears inconsistent with the proposed model by the authors in the discussion, in which Rhino is initially recruited to bivalent nucleosomes bearing both H3K9me3 and ADMA marks. This discrepancy should be addressed. 

      We thank the reviewer for the insightful comments. Indeed, ChIP-seq shows that Rhino partially overlaps with H3K9me3 (Fig. 1F), but immunofluorescence did not reveal any detectable overlap (Fig. 1A). We interpret this discrepancy as arising from the fact that immunofluorescence primarily visualizes H3K9me3 foci that are localized as broad domains in the genome, such as those at centromeres, pericentromeres, or telomeres (named chromocenters), whereas the sharp and interspersed H3K9me3 signals along chromosome arms are difficult to detect by immunofluorescence. We have added this explanation to the revised text (page 6).

      (2)  Please indicate whether the FLAG-Rhino used in OSCs has been tested for functionality in vivo, for example, by rescuing Rhino mutant phenotypes. This is particularly relevant given that no spreading is observed with this construct.

      We thank the reviewer for raising this important point. We have not directly tested the functionality of the FLAG-Rhino construct used in OSCs in living Drosophila; i.e., it has not been used to rescue Rhino mutant phenotypes in flies. We acknowledge that FLAG-Rhino has not previously been expressed in OSCs, and that its localization pattern in OSCs differs from that observed in ovaries, where Rhino is endogenously expressed. However, several lines of evidence suggest that the addition of the N-terminal FLAG tag is unlikely to compromise Rhino function:

      (1) In previous studies, N-terminally tagged Rhino (e.g., 3xFLAG-V5-Precision-GFP-Rhino) was expressed in a living Drosophila ovary and was shown to localize properly to piRNA clusters, indicating that the tag does not prevent Rhino from binding its genomic targets (Baumgartner et al., 2022; eLife. Fig. 3 supplement 1G).

      (2) In Drosophila S2 cells, a FLAG-tagged tandem Rhino chromodomain construct was shown to bind H3K9me3/H3K27me3 bivalent chromatin, demonstrating that the FLAG tag does not impair this fundamental chromatin interaction (Akkouche et al., 2025; Nat Struct Mol Biol. Fig. 4b).

      (3) GFP-tagged Rhino has been demonstrated to rescue the transposon derepression phenotype of Rhino mutant flies, further supporting that the addition of tags does not abolish its in vivo function (Parhad et al., 2017; Dev Cell. Fig. 1D).

      Therefore, we interpret the partial localization of FLAG-Rhino in OSCs as reflecting the specific chromatin environment and regulatory context of OSCs rather than functional impairment due to the FLAG tag.

      (3) Given the low levels of piRNA production and the absence of measurable effects on transposon expression or fertility upon DART4 knockdown, the rationale for classifying these regions as piRNA clusters should be clearly stated. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing. The authors should also consider and discuss the possibility that some of these differences may reflect background-specific genomic variation rather than DART4-dependent regulation per se.

      We thank the reviewer for the insightful comments. As noted, DART4 knockdown did not measurably affect transposon expression or fertility. piRNAs generated from DART4-associated clusters associate with Piwi but are insufficient for target repression. Although loss of DART4 largely eliminated piRNAs from these clusters, the cluster-derived transcripts themselves were unchanged. To clarify this point, we now refer to these regions as DART4-dependent piRNA-source loci (DART4 piSLs) in the revised text. We also acknowledge that some observed differences may reflect strain-specific genomic variation and have added this caveat on page 16.

      (4)  The authors should describe the genomic context of DART4 clusters in more detail. Specifically, it would be helpful to indicate whether these regions overlap with known transposable elements, gene bodies, or intergenic regions, and to report the typical size range of the clusters. Are any of the piRNAs produced from these clusters predicted to target known transcripts? 

      We thank the reviewer for the insightful comments. The overlap of DART4 piSLs with transposable elements, gene bodies, and intergenic regions is shown in the right panel of Supplementary Fig. 6E (denoted as “Rhino reduced regions in DART4 GLKD” in the figure). The typical size range of these clusters is presented in Supplementary Fig. 6G. The annotation of piRNA reads derived from these piSLs is shown in the right panel of Supplementary Fig. 6F, indicating that most of them appear to target host genes. The specific genes and transposons matched by the piRNAs produced from DART4 piSLs are listed in Supplementary Table 8.

      (5)  While correlations between Rhino and ADMA histone marks (especially H3R8me2a, H3R17me2a, H4R3me2a) are robust, many ADMA-enriched regions do not recruit Rhino. Please discuss this observation and consider the possible involvement of additional factors.

      We thank the reviewer for the insightful comments. As pointed out, not all ADMA-enriched regions recruit Rhino; rather, Rhino is recruited only at sites where ADMAs overlap with H3K9me3. Furthermore, the combination of H3K9me3 and ADMAs alone does not fully account for the specificity of Rhino recruitment, suggesting the involvement of additional co-factors (for example, other ADMA marks such as H3R42me2a, or chromatin-interacting proteins). In addition, because histone modifications, including arginine methylation, may be secondary consequences of modifications on other proteins rather than primary regulatory events, it is possible that DART1/4 contribute to Rhino recruitment not only through histone methylation but also via arginine methylation of non-histone chromatin-interacting factors. However, methylation of HP1a does not appear to be involved (Supplementary Fig. 3G). We have added new sentences about these points in the Discussion section (page 18).

      (6) The manuscript states that Kipferl is present at DART4 clusters but does not stabilize Rhino binding. Please specify which experimental results support this conclusion and explain.

      We apologize for the lack of clarity regarding Kipferl data. Supplementary Fig. 7A and 7B show that Kipferl localizes at major DART4 piSL. This Kipferl localization is lost together with Rhino upon DART4 GLKD, indicating that Rhino localization at DART4 piSL depends on DART4 rather than on Kipferl. From these results, we infer that, unlike at authentic piRNA clusters, Kipferl may not be sufficient to stabilize the association of Rhino with the genome at DART4 piSL. We have added this interpretation on page 14.

      Minor points

      (1) Figure 1D: Please specify which piRNA clusters are included in the metaplot - all clusters, or only the major producers? 

      We thank the reviewer for the question. The metaplot was not generated from a predefined list of “all” piRNA clusters or only the “major producers.” Instead, it was constructed from Rhino ChIP–seq peaks (“Rhino domains”) that are ≥1.5 kb in length. These Rhino domains mainly correspond to the subregions within major dual-strand clusters (e.g., 42AB, 38C) as well as additional clusters such as 80F, 102F, and eyeless, among others. We have provided the full list of domains and their corresponding piRNA clusters (with genomic coordinates) in Supplementary Table 9 and added the additional explanation in the Fig. 1D legend.
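The length criterion described in this response (Rhino ChIP–seq peaks of at least 1.5 kb) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `select_domains`, the tuple layout, and the coordinates are hypothetical, and the sketch assumes half-open, BED-style intervals so that length equals `end - start`.

```python
# Hypothetical peak records: (chromosome, start, end) in base pairs.
# The coordinates are made up for illustration and are not from the study.
MIN_DOMAIN_LENGTH_BP = 1500  # the >=1.5 kb threshold stated in the response


def select_domains(peaks, min_length=MIN_DOMAIN_LENGTH_BP):
    """Keep peaks whose length (end - start) is at least min_length."""
    return [peak for peak in peaks if peak[2] - peak[1] >= min_length]


example_peaks = [
    ("chr2R", 10_000, 10_400),  # 400 bp, dropped
    ("chr2R", 20_000, 21_500),  # exactly 1.5 kb, kept
    ("chr3L", 5_000, 9_200),    # 4.2 kb, kept
]
rhino_domains = select_domains(example_peaks)
```

Only the peaks that pass this filter would contribute to a metaplot built on such a definition; how peaks were called and merged beforehand is outside the scope of this sketch.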

      (2) Supplemental Figure 5E is referred to as 5D in the main text.

      We corrected the figure citations on pages 11-12: the reference to Supplementary Fig. 5E has been changed to 5D, and the reference to Supplementary Fig. 5F has been changed to 5E.

      (3) Supplemental Figure 7C: The color legend does not match the pie chart, which may confuse readers.

      We thank the reviewer for the helpful comment. We are afraid we were not entirely sure what specific aspect of the legend was confusing, but to avoid any possible misunderstanding, we revised Supplemental Fig. 7C so that the color boxes in the legend now exactly match the corresponding colors in the pie chart. We hope this modification improves clarity.

      (4) Since the manuscript focuses on the roles of DART1 and DART4, including their expression profiles in OSCs and ovaries would help contextualize the observed phenotypes. Please consider adding this information if available.

      We thank the reviewer for the suggestion. We have now included a scatter plot comparing RNA-seq expression in OSCs and ovaries (Supplementary Fig. 3H). In these datasets, DART1 is strongly expressed in both tissues, whereas DART4 shows no detectable reads. Notably, ref. 28 reports strong expression of both DART1 and DART4 in ovaries by western blot and northern blot. In our own qPCR analysis in OSCs, DART4 expression is about 3% of DART1, which, although low, may still be sufficient for functional roles such as modification of H3R17me2a (Fig. 3C, Supplementary Fig. 3F and 3I). We have added these new data and additional explanation in the revised manuscript (page 11).

      (5) Several of the genome browser snapshots, particularly scale and genome coordinates, are difficult to read. 

      We apologize for the difficulty in reading several of the genome browser snapshots in the original submission. We have re-generated the relevant figures using IGV, which provides clearer visualization of scale and genome coordinates. The previous images have been replaced with the improved versions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors need to elaborate on what this sentence means, as it is very unclear what they are describing about Rhino residency: "The results show that Rhino in OSCs tends to reside in the genome where Rhino binds locally in the ovary (Fig. 1C)." 

      We apologize for the lack of clarity in the original sentence. The text has been revised as follows:

      “Rhino expressed in OSCs bound predominantly to genomic sites exhibiting sharp and interspersed Rhino localization patterns in the ovary, while showing little localization within broad Rhino domains, including major piRNA clusters.”

      In addition, to clarify the behavior of Rhino at broad domains, we have added the phrase “the terminal regions of broad domains, such as major piRNA clusters” to the subsequent sentence.

      (2) The red correlation line is very confusing in Figure 5F. What sort of line does this mean in this scatter plot? 

      We apologize for the lack of clarity regarding the red line in Fig. 5F. The red line represents the least-squares linear regression fit to the data points, calculated using the lm() function in R, and was added with abline() to illustrate the correlation between ctrl GLKD and DART4 GLKD values. In the revised figure, we have clarified this in the legend by specifying that it is a regression line.
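For readers unfamiliar with what such a line represents, the following is a minimal sketch of an ordinary least-squares fit of one variable on another, which is what `lm(y ~ x)` computes in the simple two-variable case. It is written in Python rather than R, it is not the authors' script, and the paired values are invented placeholders rather than data from Fig. 5F.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired values (control on x, knockdown on y); not real data.
ctrl_values = [1.0, 2.0, 3.0, 4.0]
kd_values = [0.9, 2.1, 2.9, 4.1]
intercept, slope = least_squares_line(ctrl_values, kd_values)
```

A slope near 1 with an intercept near 0 would indicate that the two conditions give similar values overall, so points far from the line are the ones that differ between conditions.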

      (3) There is no confirmation of the successful knockdown of the various DARTs in the OSCs.

      We thank the reviewer for the comment. The knockdown efficiency of the various DARTs in OSCs was confirmed by RT–qPCR. The data are now shown in Supplementary Fig. 3J. 

      (4) What is the purpose of an unnumbered "Method Figure" in the supplementary data file? Why not just give it a number and mention it properly in the text? 

      We thank the reviewer for the suggestion. We have now assigned a number to the previously unnumbered "Method Figure" and have included it as Supplementary Fig. 9.

      The figure is now properly cited in the Methods section.

      (5) For Figure 5A, those fly strain numbers in the labels are better reserved in the Methods, and a more appropriate label is to describe the GAL4 driver and the UAS-RNAi construct by their conventional names.

      We thank the reviewer for the suggestion. The labels in Fig. 5A have been updated to use the conventional names of the GAL4 drivers and UAS-RNAi constructs. Specifically, they now read Ctrl GLKD (nos-GAL4 > UAS-emp) and DART4 GLKD (nos-GAL4 > UAS-DART4). The original fly strain numbers are listed in the Methods section.

    1. Reviewer #2 (Public review):

      Summary:

      In this paper, authors used MEFs expressing the R1441G mutant of leucine-rich repeat kinase 2 (LRRK2), a mutant associated with the early onset of Parkinson's disease. They report that in these cells LAMP2 fluorescence is higher but BMP fluorescence is lower, MVE size is reduced and that MVEs contain less ILVs. They also report that LAMP2-positive EVs are increased in mutant cells in a process sensitive to LRRK2 kinase inhibition but are further increased by glucocerebrosidase (GCase) inhibition, and that total di-22:6-BMP and total di-18:1-BMP are increased in mutant LRRK2 MEFs compared to WT cells by mass spectrometry. They also report that LRRK2 kinase inhibition partially restores cellular BMP levels, and that GCase inhibition further increased BMP levels, and that in EVs from the LRRK2 mutant, LRRK2 inhibition decreases BMP while GCase inhibition has the opposite effect. Moreover, they report that BMP increase is not due to increased BMP synthesis, although authors observe that CLN5 is increased in LRRK2 mutant cells. Finally, they report that GW4869 decreases EV release and exosomal BMP, while bafilomycin A1 increases EV release. They conclude that LRRK2 regulates BMP levels (in cells) and release (via EVs). They also conclude that the process is modulated by GCase in LRRK2 mutant cells, and that these studies may contribute to the use of BMP-positive EVs as a biomarker for Parkinson's disease and associated treatments.

      Strengths:

      This is a potentially interesting paper. However, I had comments that authors needed to address to clarify some aspects of their study.

      Weaknesses:

      (1) The authors seem to have missed the point in their reply to my first comment. They mention the paper by Stuffers et al., who report that endosome biogenesis continues without ESCRT. This is a nice paper, but it is irrelevant to the subject at hand. In my initial comment, I drew the authors' attention to an apparent contradiction: higher LAMP2 staining in R1441G LRRK2 knock-in MEFs and yet smaller MVEs with a reduced surface area. LAMP2 being one of the major glycoproteins of MVE's limiting membrane, one would have expected lower LAMP2 staining if cells contain fewer and smaller MVEs. Authors now state that elevated LAMP2 expression in cells expressing R1441G reflects a cell type-specific effect (differential penetrance of LRRK2 signaling on lysosomal biogenesis), because amounts of LAMP1 and CD63 are similar in cells from LRRK2 G2019S PD patients and control cells (new Fig 7A-F). However, authors still conclude that LRRK2 modulates the lysosomal network, including LAMP2 and CLN5. Does it?

      Similarly, the mass spec analysis of BMP (Fig S1H) does not support the data in Fig 1. Does this Table include all major isoforms found in these cells? If so, the dominant isoform is by far the di-18:1 isoform in wt and R1441G cells (at least 10X more abundant than other isoforms). Now, di-18:1-BMP is roughly 4X more abundant in R1441G cells when compared to wt cells, while BMP is reduced by half in R1441G cells (light microscopy in Fig 1). Authors argue that light microscopy may only detect a so-called antibody-accessible pool. What is this? And why would this pool decrease in R1441G cells when LAMP2 is higher? Alternatively, they argue that the anti-BMP antibody may be less specific and detect other analytes. As I had already mentioned, this makes no sense, since the observed signal is lower and not higher. If authors do not trust their light microscopy analysis, why show the data?

      (2) Cells contain 3 LAMP2 isoforms. Which one is upregulated and/or secreted in exosomes?

      (3) The new Fig S4A is far from convincing. How were cells fractionated and what are the gradients (not described in Methods)? CD63 (presumably endolysosomes) is spread over fractions 8 - 13. LRRK2 (fractions 8-9) does not copurify with CD63. The bulk of LRRK2 is at the bottom (presumably cytosol if this is a floatation gradient), and a minor fraction moves into the gradient. CLN5 is even less clear since the bulk is also at the bottom with a tiny fraction only between LRRK2 and CD63. Also, why do authors conclude that a considerable pool of newly synthesized CLN5 did not reach its final destination at the endolysosome and may instead be retained in the ER? Where is the ER on the gradient?

      (4) Fig S4B shows blots of whole cell lysates from CTRL and LRRK2 mutant-derived fibroblasts: 6 lanes are shown but without captions, containing varying amounts of calnexin and CD63. In addition, the blots look very dirty. Where is CD63? Is it the minor band at ≈37 kD (as in Fig S4A)? Or the major band below the 50kD marker? What are the other bands on these blots? As a result, the quantification shown in the bar graph does not mean much.

      (5) The cell content of 18:1-BMP is increased approx. 5X by BafA1 (Fig 6C) but amounts of 18:1-BMP secreted in EVs hardly change (Fig 6E). Since BMP is mostly present as the 18:1 isoform (22:6-BMP being only a minor species, Fig S1H), does it mean that BafA1 does not increase BMP secretion and/or only a minor fraction of total cellular BMP is secreted in exosomes?

      Comments on revisions:

      How come 0.2 mmol/L of 22:6 and 18:1 fatty acid both correspond to 65 µg/mL (Fig 4A)?
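The arithmetic behind this question can be checked with a short sketch. It assumes that "22:6" and "18:1" refer to the free fatty acids docosahexaenoic acid (C22H32O2) and oleic acid (C18H34O2); if salts, esters, or carrier-bound forms were used, the molar masses would differ. The helper names are hypothetical, and the atomic masses are standard rounded values.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(composition):
    """Molar mass in g/mol from an element -> atom-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def mmol_per_l_to_ug_per_ml(conc_mmol_per_l, molar_mass_g_per_mol):
    """mmol/L times g/mol gives mg/L, and 1 mg/L equals 1 ug/mL."""
    return conc_mmol_per_l * molar_mass_g_per_mol


# Assumed free-acid formulas (see the caveat in the lead-in).
DHA = {"C": 22, "H": 32, "O": 2}    # 22:6, docosahexaenoic acid
OLEIC = {"C": 18, "H": 34, "O": 2}  # 18:1, oleic acid

dha_ug_per_ml = mmol_per_l_to_ug_per_ml(0.2, molar_mass(DHA))
oleic_ug_per_ml = mmol_per_l_to_ug_per_ml(0.2, molar_mass(OLEIC))
```

Under these assumptions, 0.2 mmol/L corresponds to about 65.7 µg/mL for 22:6 but only about 56.5 µg/mL for 18:1, so a single mass concentration of 65 µg/mL cannot match 0.2 mmol/L for both free acids.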

      It is stated in the Legend of Fig4 that long (B-C) and short (D) chase time points are shown as fold change. There is no panel D in the figure.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents the potentially interesting concept that LRRK2 regulates cellular BMP levels and their release via extracellular vesicles, with GCase activity further modulating this process in mutant LRRK2-expressing cells. However, the evidence supporting the conclusions remains incomplete, and certain statistical analyses are inadequate. This work would be of interest to cell biologists working on Parkinson's disease.

      Reviewer #1 (Public review):

      Summary:

      Even though mutations in LRRK2 and GBA1 (which encodes the protein GCase) increase the risk of developing Parkinson's disease (PD), the specific mechanisms driving neurodegeneration remain unclear. Given their known roles in lysosomal function, the authors investigate how LRRK2 and GCase activity influence the exocytosis of the lysosomal lipid BMP via extracellular vesicles (EVs). They use fibroblasts carrying the PD-associated LRRK2-R1441G mutation and pharmacologically modulate LRRK2 and GCase activity.

      Strengths:

      The authors examine both proteins at endogenous levels, using MEFs instead of cancer cells. The study's scope is potentially interesting and could yield relevant insights into PD disease mechanisms.

      Weaknesses:

      Many of the authors' conclusions are overstated and not sufficiently supported by the data. Several statistical errors undermine their claims. Pharmacological treatment is very long, leading to potential off-target effects. Additionally, the authors should be more rigorous when using EV markers.

      We thank the reviewer for these valuable observations. In the revised manuscript, we have addressed each of these points as follows:

      (1) Conclusions and data support – We carefully revised our text throughout the manuscript to ensure that all conclusions are better supported by the presented data. For instance, we now explicitly state that while pharmacological modulation supports the regulatory role of LRRK2 activity in EV-mediated BMP release, we have softened our conclusions concerning the contribution of GCase in this model (see revised Results and Discussion sections).

      (2) Statistical analyses – We reanalyzed experiments involving more than two groups and replaced simple t-tests with non-parametric Kruskal-Wallis tests followed by Dunn’s post hoc comparisons. This approach, described in the updated figure legends (e.g., Figure 2D-F and H-J), provides a more rigorous statistical framework that accounts for small sample sizes and variability typical of EV quantifications.

      (3) Pharmacological treatment duration – Prolonged MLi-2 treatments have been extensively used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have applied long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202). In our study, 48-hour incubations were necessary to sustain full LRRK2 inhibition throughout the extracellular vesicle (EV) collection period. EV biogenesis, BMP biosynthesis, and packaging into EVs are time-dependent processes; therefore, extended incubation and collection periods (48 h) were required to allow downstream effects of LRRK2 inhibition on BMP production and release to manifest, and to obtain sufficient EV material for biochemical and lipidomic analyses. This experimental design also reflects our and others’ previous observations in humans and non-human primates, where urinary BMP changes are associated with chronic or subchronic LRRK2 inhibitor treatment (Baptista MAS, Merchant K, et al. Sci Transl Med. 2020, 12:eaav0820; Jennings D, et al. Sci Transl Med. 2022, 14:eabj2658; Maloney MT, et al. Mol Neurodegener. 2025, 20:89). Importantly, under these conditions, we did not observe significant changes in cell viability or morphology, supporting that the treatment was well tolerated. We have clarified this rationale in the revised Methods section to emphasize that the prolonged incubation reflects the experimental design for EV isolation rather than a requirement for achieving LRRK2 inhibition.

      (4) EV markers – We and others have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022). Moreover, LAMP proteins have been reported to be more enriched in EVs of endolysosomal origin (Mathieu et al., 2021). To further strengthen this point, we performed new experiments using a CD63-pHluorin sensor combined with TIRF microscopy, which allowed real-time visualization of CD63-positive exosome release. These new data (now presented in Figure 7, Panels G-I; Videos 1 and 2) confirm increased CD63-positive EV release in LRRK2 mutant fibroblasts, which was reversed by LRRK2 inhibition with MLi-2. The CD63-positive compartment was also largely BMP-positive (new Figure 7D, F, G), reinforcing our conclusions and providing additional rigor in EV marker validation.
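The statistical approach named in point (2) above, a Kruskal-Wallis test followed by Dunn's pairwise comparisons, can be sketched as follows. This is an illustration of the technique only, not the authors' analysis: the group names and values are invented, the Bonferroni adjustment is an assumed choice (the response does not say which correction was used), and the sketch omits the tie correction that statistical packages normally apply.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_dunn(groups):
    """Return (H, {(a, b): (z, bonferroni_p)}) for a dict of name -> values.

    No tie correction is applied, so results are exact only for untied data.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    size, mean_rank, rank_term, pos = {}, {}, 0.0, 0
    for name in names:
        n = len(groups[name])
        group_ranks = ranks[pos:pos + n]
        pos += n
        size[name] = n
        mean_rank[name] = sum(group_ranks) / n
        rank_term += sum(group_ranks) ** 2 / n
    h_stat = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    pairs = list(combinations(names, 2))
    dunn = {}
    for a, b in pairs:
        se = math.sqrt(n_total * (n_total + 1) / 12 * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        dunn[(a, b)] = (z, min(1.0, p * len(pairs)))
    return h_stat, dunn


# Hypothetical EV measurements for three conditions; not data from the study.
example = {"WT": [1, 2, 3], "mutant": [7, 8, 9], "mutant+MLi-2": [4, 5, 6]}
h_stat, dunn_results = kruskal_dunn(example)
```

The H statistic is compared against a chi-square distribution with (number of groups minus 1) degrees of freedom. With only a few replicates per group, as is common for EV quantifications, that approximation is coarse, so the resulting p-values are best read as approximate.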

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors used MEFs expressing the R1441G mutant of leucine-rich repeat kinase 2 (LRRK2), a mutant associated with the early onset of Parkinson's disease. They report that in these cells LAMP2 fluorescence is higher but BMP fluorescence is lower, MVE size is reduced, and that MVEs contain less ILVs. They also report that LAMP2-positive EVs are increased in mutant cells in a process sensitive to LRRK2 kinase inhibition but are further increased by glucocerebrosidase (GCase) inhibition, and that total di-22:6-BMP and total di-18:1-BMP are increased in mutant LRRK2 MEFs compared to WT cells by mass spectrometry. They also report that LRRK2 kinase inhibition partially restores cellular BMP levels, and that GCase inhibition further increases BMP levels, and that in EVs from the LRRK2 mutant, LRRK2 inhibition decreases BMP while GCase inhibition has the opposite effect. Moreover, they report that the BMP increase is not due to increased BMP synthesis, although the authors observe that CLN5 is increased in LRRK2 mutant cells. Finally, they report that GW4869 decreases EV release and exosomal BMP, while bafilomycin A1 increases EV release. They conclude that LRRK2 regulates BMP levels (in cells) and release (via EVs). They also conclude that the process is modulated by GCase in LRRK2 mutant cells, and that these studies may contribute to the use of BMP-positive EVs as a biomarker for Parkinson's disease and associated treatments.

      Strengths:

      This is an interesting paper, which provides novel insights into the biogenesis of exosomes with exciting biomedical potential. However, I have comments that authors need to address to clarify some aspects of their study.

      Weaknesses:

      (1) The intensity of LAMP2 staining is increased significantly in cells expressing the R1441G mutant of LRRK2 when compared to WT cells (Figure 1C). Yet mutant cells contain significantly smaller MVEs with fewer ILVs, and the MVE surface area is reduced (Figure 1D-F). This is quite surprising since LAMP2 is a major component of the limiting membrane of late endosomes. Are other proteins of endo-lysosomes (eg, LAMP1, CD63, RAB7) or markers (lysotracker) also decreased (see also below)?

      As referenced in our original manuscript, several previous studies have reported endolysosomal morphological and homeostatic defects in cells harboring pathogenic LRRK2 mutations. LAMP2 can be upregulated as part of a lysosomal biogenesis or stress response (e.g., via MiT/TFE transcription factors such as TFEB; Sardiello et al., Science 2009, 325:473-477), whereas ILV biogenesis is primarily controlled by ESCRT- and SMPD3-dependent pathways that are regulated independently of MiT/TFE-driven transcriptional programs. Indeed, Stuffers et al. (Traffic 2009, 10:925-937) demonstrated that depletion of key ESCRT subunits markedly inhibited ILV formation while concomitantly increasing LAMP2 expression, highlighting the mechanistic dissociation between LAMP2 abundance and ILV number. In our study, we observed a similar pattern in R1441G LRRK2 MEFs, in which elevated LAMP2 staining and protein levels occurred despite a reduction in MVE size and ILV number. We interpret this as a compensatory lysosomal biogenesis response.

      Our revised manuscript now includes new immunofluorescence data for BMP, LAMP1 and CD63 (New Figure 7, Panels A-F) together with biochemical analysis of CD63 protein levels (New Supplemental Figure 4, Panel B) in human skin fibroblasts derived from healthy donors and LRRK2 G2019S PD patients. Quantitative analysis of these experiments revealed no statistically significant differences in total cellular levels of either LAMP1 or CD63 between groups. However, we observed a consistent decrease in BMP immunostaining intensity (New Figure 7, Panel A and B), in agreement with our findings in mouse fibroblasts. We therefore propose that the elevated LAMP2 expression observed in the engineered MEF clone expressing R1441G may reflect a cell type-specific effect, potentially linked to differential penetrance of LRRK2 signaling on the lysosomal biogenesis response. We have updated the Results and Discussion section of the manuscript to incorporate and clarify these findings.

      (2) LRRK2 has been reported to interact with endolysosomal membranes. Does the R1441G mutant bind LAMP2- and/or BMP-positive membranes? 

      We agree that LRRK2 has been reported to associate dynamically with endolysosomal membranes, particularly under conditions of endolysosomal stress or damage (Eguchi T, et al. PNAS 2018, 115:E9115-E9124; Bonet-Ponce L, et al. Sci Adv. 2020, 6:eabb2454; Wang X, et al. Elife. 2023, 12:e87255).

      Nevertheless, to explore whether LRRK2 associates with BMP-positive endolysosomes, we performed subcellular fractionation followed by biochemical analysis of endolysosomal fractions, since our available LRRK2 antibodies did not provide reliable immunofluorescence signals. These experiments were carried out using human skin fibroblasts derived from both healthy controls and Parkinson’s disease patients carrying the LRRK2-G2019S mutation. In both control and mutant fibroblasts, a pool of LRRK2 was detected in fractions positive for the BMP synthase CLN5 and the endolysosomal marker CD63 (New Supplementary Figure 4, Panel A), supporting the localization of LRRK2 to endolysosomal membranes that are likely BMP-enriched. Our manuscript’s Results and Methods sections have been updated accordingly.

      Does the mutant affect endolysosomes?

      As referenced in our original manuscript, several studies have reported that pathogenic LRRK2 mutations can lead to endolysosomal defects. Consistent with these reports, we also observed morphological alterations in endolysosomes of cells expressing mutant LRRK2, including reduced MVE size and fewer ILVs, as shown in Figure 1D–F. These observations are in agreement with previously described phenotypes associated with pathogenic LRRK2 variants. Furthermore, in mutant LRRK2 MEFs, and now in human-derived fibroblasts (see new Figure 7, Panel A and B), we observed a decrease in BMP immunostaining signal.

      (3) Immunofluorescence data indicate that BMP is decreased in mutant LRRK2-expressing cells compared to WT (Figure 1A-B), but mass spec data indicate that di-22:6-BMP and di-18:1-BMP are increased (Figure 3). Authors conclude that the BMP pool detected by mass spec in mutant cells is less antibody-accessible than that present in wt cells, or that the anti-BMP antibody is less specific and that it detects other analytes. This is an awkward conclusion, since the IF signal with the antibody is lower (not higher): why would the antibody be less specific? Could it be that the antibody does not see all BMP isoforms equally well? Moreover, the observations that mutant cells contain smaller MVEs (Figure 1D-F) with fewer ILVs are consistent with the IF data and reduced BMP amounts. This needs to be clarified.

      As previously reported by us (Lu et al., J Cell Biol 2022;221:e202105060) and others (Berg AL, et al. Cancer Lett. 2023, 557:216090), discrepancies can occur between BMP levels detected by immunofluorescence and those quantified by mass spectrometry. This is because immunostaining reflects the pool of antibody-accessible BMP, whereas lipidomics measures the total cellular content of all BMP molecular species, irrespective of their distribution or accessibility.

      We agree that the anti-BMP antibody may not detect all BMP isoforms equally well. Differences in acyl chain composition (such as the degree of saturation or chain length) can alter the stereochemistry of BMP and, consequently, epitope accessibility to antibody binding.

      In addition, in a personal communication with Monther Abu-Remaileh (Stanford University), we were informed that the antibody may also cross-react with other lipid species in endolysosomes. Nevertheless, since there is no formal evidence supporting this, we have removed the sentence in the Discussion section stating “Alternatively, the antibody may also detect non-BMP analytes” to avoid any potential misinterpretations. In its place, we have added a short statement noting that “not all BMP isoforms may be detected equally well”.

      Mass spectrometry data are only shown for two BMP species (di-22:6, di-18:1). What are the major BMP isoforms in WT cells? The authors should show the complete analysis for all BMP species if they wish to draw quantitative conclusions about the amounts of BMP in wt and mutant cells. Finally, BMP and PG are isobaric lipids. Fragmentation of BMPs or PGs results in characteristic fingerprints, but the presence of each daughter ion is not absolutely specific for either lipid. This should be clarified, e.g., were BMP and PG separated before mass spec analysis? Was PG affected? The authors should also compare the BMP data with mass spec data obtained with a control lipid, e.g., PC.

      Regarding BMP isoforms, our targeted UPLC-MS/MS analyses revealed that 2,2′-di-22:6-BMP (sn2/sn2′) and 2,2′-di-18:1-BMP (sn2/sn2′) are the predominant BMP isoforms in MEF cells, consistent with previous reports showing docosahexaenoyl (22:6; DHA) and oleoyl (18:1) BMP as the most abundant isoforms. Across diverse mammalian cells and tissues, BMP typically exhibits a fatty acid composition dominated by oleoyl, with polyunsaturated fatty acids (particularly DHA) also contributing substantially. Enrichment of DHA-containing BMP species has been observed in multiple systems, including rat uterine stromal cells, PC12 cells, THP-1 and RAW macrophages, as well as in rat and human liver. This consistent presence of oleoyl- and docosahexaenoyl-containing BMP species across tissues indicates that these acyl chains are conserved features influencing the lipid’s structural and functional characteristics (Kobayashi et al. J Biol Chem, 2002; Hullin-Matsuda et al. Prostaglandins Leukotrienes Essent Fatty Acids, 2009; Thompson et al. Int J Toxicol. 2012; Delton-Vandenbroucke et al. J Lipid Res, 2019).

      Nevertheless, we have included a Table (Panel H in updated Supplemental Figure 1) showing other BMP species that were also detected in our lipidomics analysis. Overall, dioleoyl (18:1)- and di-docosahexaenoyl (22:6)-BMP species were the most abundant in MEF cells, whereas di-arachidonoyl (20:4)- and di-linoleoyl (18:2)-BMP isoforms were present at lower levels. Consistently, R1441G LRRK2 MEFs displayed higher levels of dioleoyl- and di-docosahexaenoyl-BMP compared with WT cells, and these elevations were reduced following LRRK2 kinase inhibition with MLi-2. Data from three independent representative experiments are shown, and the manuscript has been revised accordingly to include these results.

      Regarding the separation of BMP and PG species, we confirm that BMP and PG were chromatographically resolved prior to MS/MS detection using a validated UPLC-MS/MS method developed by Nextcea, Inc. PG exhibits a substantially longer LC retention time than BMP, ensuring complete baseline separation. This approach (established by Nextcea nearly two decades ago and later validated through a multi-year collaboration with the U.S. FDA to clinically qualify di-22:6-BMP as a biomarker) prevents any ambiguity arising from the isobaric nature of BMP and PG species. No changes in PG levels were detected under any experimental conditions.

      Finally, we employed isotope-labeled BMP as an internal standard to ensure robust normalization across samples. These additional details and references cited above have been included in the revised Methods and References sections to further clarify the analytical rigor of our lipidomics workflow.

      (4) It is quite surprising that the amounts of labeled BMP continue to increase for up to 24h after a short 25min pulse with heavy BMP precursors (Figure 4B).

      In these isotope-labeling experiments, it is important to note (as described in our original manuscript) that two distinct pools of metabolically labeled BMP species were detected: semi-labeled BMP (with only one heavy isotope-labeled fatty acyl chain) and fully-labeled BMP (with both fatty acyl chains labeled). We consider the fully-labeled BMP pool to provide the most reliable readout for BMP turnover, as it showed a rapid decline after a 1h chase (decreasing by more than 50% within 8 h in all conditions), reaching its lowest levels at the end of the 48-h chase period.

      The apparent increase in semi-labeled BMP species over time may be explained by continued incorporation of labeled precursors following the initial pulse. Specifically, once existing semi-labeled and fully-labeled BMP molecules are degraded by PLA2G15 (Nyame K, et al. Nature 2025, 642:474-483), the resulting isotope-labeled lysophosphatidylglycerol (LPG) and fatty acids could be recycled and re-enter a new round of BMP biosynthesis, leading to a gradual accumulation of semi-labeled BMP such as di-18:1-BMP. Why would this reasoning not also apply to the fully-labeled species? Once the pulse is completed, newly incorporated non-labeled fatty acyl chains present in the cellular pool can compete with labeled ones during subsequent rounds of lipid remodeling or synthesis. As a result, the probability of generating semi-labeled BMP molecules becomes higher than that of forming fully-labeled species. Consistent with this, our data show an increase in only semi-labeled BMP species (but not in fully-labeled ones) up to 24 hours after the pulse. We have added a clarification regarding this point in the revised manuscript.
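
      The competition argument can be made concrete with a deliberately simple model. Assume (as a simplification, not a measured quantity) that, during the chase, each of the two acyl chains of a newly made or remodeled BMP molecule is drawn independently from a pool in which a fraction $p$ of the available chains carries the heavy label. Then:

$$
P(\text{fully labeled}) = p^{2}, \qquad
P(\text{semi-labeled}) = 2p(1-p), \qquad
\frac{P(\text{semi-labeled})}{P(\text{fully labeled})} = \frac{2(1-p)}{p}
$$

      Under this assumption, semi-labeled molecules outnumber fully labeled ones whenever $p < 2/3$, and the ratio grows as unlabeled chains dilute the pool after the pulse. For example, $p = 0.2$ gives $P(\text{semi-labeled}) = 0.32$ against $P(\text{fully labeled}) = 0.04$.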

      (5) It is argued that upregulation of CLN5 may be due to an overall upregulation of lysosomal enzymes, as LAMP2 levels were also increased (Figure 2A, C, E). Again, this is not consistent with the observed decrease in MVE size and number (Figure 1D-F). As mentioned above, other independent markers of endo-lysosomes should be analyzed (e.g., LAMP1, CD63, RAB7), and/or other lysosomal enzymes (e.g., cathepsin D).

      Our revised manuscript now includes new immunofluorescence data for BMP, LAMP1 and CD63 (New Figure 7, Panels A-F) together with biochemical analysis of CD63 protein levels (New Supplemental Figure 4, Panel B) in human skin fibroblasts derived from healthy controls and LRRK2 G2019S PD patients. Quantitative analysis of these experiments revealed no statistically significant differences in total cellular levels of either LAMP1 or CD63 between groups. However, our results consistently show increased CLN5 protein levels in both mouse and human fibroblast cell lines harboring pathogenic LRRK2 mutations. Upregulation of CLN5 may reflect a compensatory effect from loss of BMP via EV exocytosis. As discussed above, the elevated LAMP2 signal observed in the engineered MEF clone expressing R1441G could represent a cell type-specific effect, potentially linked to differential penetrance of LRRK2 signaling on the lysosomal biogenesis response. Our Results and Discussion sections have been updated accordingly.

      (6) The authors report that the increase in BMP is not due to an increase in BMP synthesis (Figure 4), although they observe a significant increase in CLN5 (Figure 5A) in LRRK2 mutant cells. Some clarification is needed.

      In our original manuscript, we proposed that although CLN5 protein levels are increased in R1441G LRRK2 MEFs, the absence of significant changes in BMP synthesis rates (Figure 4B, C) may reflect either limited substrate availability or that CLN5 is already operating near its maximal enzymatic capacity. Our new subcellular fractionation data (new Figure 7, Panel A) further indicate that, despite a relative increase in total CLN5 levels in G2019S LRRK2 human fibroblasts, the amount of CLN5 associated with endolysosomes remains comparable between mutant LRRK2 and control cells. This suggests that a considerable fraction of upregulated CLN5 may not localize to endolysosomes, potentially accumulating in the endoplasmic reticulum due to enhanced translation or impaired trafficking. Unfortunately, the available anti-CLN5 antibody did not yield reliable immunofluorescence signals, preventing us from directly confirming this possibility. Nevertheless, in light of our new data (new Supplemental Figure 4A), we have included a clarification in the revised manuscript discussing this possibility as well.

      (7) Authors observe that both LAMP2 and BMP are decreased in EVs by GW4869 and increased by bafilomycin (Figure 6). Given my comments above on Figure 1, it would also be nice to illustrate/quantify the effects of these compounds on cells by immunofluorescence.

      We appreciate the reviewer’s suggestion. We have previously published immunofluorescence data showing increased BMP accumulation in endolysosomes following treatment with bafilomycin A1 (Lu A, et al. J Cell Biol. 2009, 184:863-879). However, in the present study, our lipidomics analyses revealed a decrease in both di-22:6-BMP and di-18:1-BMP species in cells treated with this compound. As discussed above, this apparent discrepancy likely reflects methodological differences between immunofluorescence, which detects only antibody-accessible BMP pools, and lipidomics, which quantifies total cellular BMP content.

      Moreover, in a recent study (Andreu Z, et al. Nanotheranostics 2023, 7:1-21), BMP levels were analyzed by immunofluorescence in cells treated with spiroepoxide, a potent and selective irreversible inhibitor of nSMase (different from GW4869) known to block EV release. Spiroepoxide-treated cells showed decreased BMP immunostaining, a result that, again, does not align with mass spectrometry data revealing increased cellular BMP levels upon GW4869 treatment. Notably, in that study, spiroepoxide was used instead of GW4869 because the intrinsic autofluorescence of GW4869 could potentially interfere with the immunofluorescence BMP signal.

      We therefore consider lipidomics measurements to provide a more reliable and quantitative representation of BMP dynamics under these conditions.

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) 48 h for MLi2 treatment seems too long. LRRK2 kinase activity is inhibited with much shorter incubation times. The longer the incubation, the more likely off-target effects are. The authors should repeat these experiments with 1-2 h of MLi2.

      We thank the reviewer for this valuable comment. We acknowledge that MLi-2 is a potent and selective LRRK2 kinase inhibitor that achieves near-complete target engagement within a few hours of treatment. However, prolonged exposure has been widely used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have employed long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202).

      In our study, 48-hour incubations were necessary to sustain full LRRK2 inhibition throughout the extracellular vesicle (EV) collection period. EV biogenesis, BMP biosynthesis, and packaging into EVs are time-dependent processes; therefore, extended incubation and collection periods (48 h) were required to allow downstream effects of LRRK2 inhibition on BMP production and release to manifest, and to obtain sufficient EV material for biochemical and lipidomic analyses. This experimental design also reflects our and others’ previous observations in humans and non-human primates, where urinary BMP changes are associated with chronic or subchronic LRRK2 inhibitor treatment (Baptista MAS, Merchant K, et al. Sci Transl Med. 2020, 12:eaav0820; Jennings D, et al. Sci Transl Med. 2022, 14:eabj2658; Maloney MT, et al. Mol Neurodegener. 2025, 20:89). Importantly, under these conditions, we did not observe significant changes in cell viability or morphology, supporting that the treatment was well tolerated.

      We have clarified this rationale in the revised Methods section to emphasize that the prolonged incubation reflects the experimental design for EV isolation rather than a requirement for achieving LRRK2 inhibition.

      (2) Is there a reason why the authors don't include CD81, CD63, and Syntenin-1 in their study as EV markers? Using solely Flotillin-1 does not seem to be enough to justify their claims.

      We actually used not only Flotillin-1 but also LAMP2 as EV markers in our study. While both Flotillin-1 and LAMP2 detection on EVs may vary depending on the cell type, we and others have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022). In particular, one of these studies reported that “LAMP1-positive subpopulations of EVs represent MVB/lysosome-derived exosomes, which also contain syntenin-1.” Therefore, our choice of EV markers (LAMP2 and Flotillin-1) is consistent with those previously and reliably used to characterize small EVs.

      Nevertheless, to further address the reviewer’s concern, we performed additional experiments using a CD63-based fluorescence sensor (CD63-pHluorin), which, combined with TIRF microscopy, enables real-time visualization of CD63-positive exosome release. These experiments were conducted in control and LRRK2-mutant fibroblasts, and the data are presented in new Figure 7 (Panels G-I; Videos 1 and 2). We have also included all relevant references and clarified this point in the revised manuscript.

      (3) Indeed, to quantify the amount of certain proteins in EVs, the authors should normalize them by CD63 or CD81.

      Protein normalization in isolated EV fractions is indeed challenging. Although tetraspanins such as CD63 and CD81 are commonly enriched in EVs, their abundance can vary considerably across EV subpopulations, cell types, and experimental conditions, making them unreliable as universal normalization markers (Théry et al., J Extracell Vesicles, 2018; Margolis & Sadovsky, Nat Rev Mol Cell Biol, 2019). Current guidelines from the International Society for Extracellular Vesicles (ISEV), as described in the Minimal Information for Studies of Extracellular Vesicles 2018 (MISEV2018; Théry C, et al. J Extracell Vesicles. 2018, 7:1535750) and updated in MISEV2024 (Welsh JA, et al. J Extracell Vesicles. 2024, 13:e12404), recommend reporting multiple EV markers rather than relying on a single protein for normalization. They also suggest ensuring comparable experimental conditions by using the same number of cells at the start of the experiment and normalizing EV data to cell number or whole-cell lysate protein content at the end of the experiment, among other approaches.

      In our study, we normalized EV data to whole-cell lysate (WCL) protein content, as this approach accounts for differences in EV production due to variations in cell number or treatment conditions and is commonly used in the field (Kowal et al., PNAS, 2016; Mathieu et al., Nat Commun, 2021). We also included Flotillin-1 and LAMP2 as EV markers, both of which have been validated as molecular markers of small EV subpopulations.

      (4) Hyper normalization in WB quantification in Figure 2E-G is statistically incorrect, as it assumes that one group (in this case, R1441G ctrl) has no variability at all, which is not biologically possible. The authors should repeat the quantification without hypernormalizing one of their groups. This issue is prevalent across the whole manuscript.

      We understand the concern regarding “hyper-normalization” (i.e., expressing all values relative to one condition set to 1), which may mask variability in the reference group. However, it is standard practice in immunoblotting analysis to express data relative to a control condition for comparison, as variations in membrane transfer, exposure time, and signal development can differ across blots. In our case, the data are expressed as relative levels (arbitrary units) rather than absolute quantitative values. To facilitate comparison between datasets and account for inter-experimental variation, we continued to express values relative to the mutant LRRK2 MEF condition.
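
      The trade-off at issue in this exchange can be illustrated with a short Python sketch. The band intensities below are hypothetical values chosen for illustration, not data from the manuscript, and the function names are illustrative. Dividing each blot by its own reference band removes blot-to-blot scale differences but forces the reference group to exactly 1 with zero spread. Dividing by the mean reference intensity across blots keeps the reference variability visible but leaves the blot-to-blot scale differences in the data.

```python
from statistics import mean, stdev

# Hypothetical band intensities (arbitrary units) from three independent blots.
# Each blot has its own overall scale (transfer efficiency, exposure time).
blots = [
    {"reference": 100.0, "treated": 60.0},
    {"reference": 250.0, "treated": 140.0},
    {"reference": 40.0, "treated": 30.0},
]


def normalize_per_blot(rows, reference="reference"):
    """Divide every band by the reference band of the same blot."""
    return [{k: v / row[reference] for k, v in row.items()} for row in rows]


def normalize_to_reference_mean(rows, reference="reference"):
    """Divide every band by the mean reference intensity across all blots."""
    ref_mean = mean([row[reference] for row in rows])
    return [{k: v / ref_mean for k, v in row.items()} for row in rows]


def summarize(rows):
    """Return {group: (mean, sample standard deviation)} across blots."""
    return {
        group: (mean([r[group] for r in rows]), stdev([r[group] for r in rows]))
        for group in rows[0]
    }


per_blot = summarize(normalize_per_blot(blots))
pooled = summarize(normalize_to_reference_mean(blots))

for label, summary in (("per-blot", per_blot), ("pooled", pooled)):
    for group, (m, s) in summary.items():
        print(f"{label:9s} {group:10s} mean={m:.3f} sd={s:.3f}")
```

      In the per-blot output the reference group has a standard deviation of zero by construction, which is the property the reviewer objects to. In the pooled output the reference group has a mean of 1 and a non-zero standard deviation.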

      On the other hand, in lipidomics experiments, despite using the same number of seeded cells and identical extraction and analysis protocols, minor biological and technical variability was observed across independent replicates. This variability is inherent to the experimental system and is now explicitly represented in the new table included in Supplemental Figure 1F, which compiles three independent representative lipidomics experiments showing quantitative BMP levels across different conditions.

      (5) The authors perform a t-test in Figure 2E-G when comparing more than 2 groups, which is wrong. The authors should use a two-way ANOVA as they are comparing genotype and treatment.

      We appreciate the reviewer’s comment and agree with this observation. The MLi-2 and CBE experiments were performed independently and in separate experimental runs; therefore, we have reanalyzed these datasets separately rather than combining them in a two-way ANOVA. To properly compare more than two groups within each dataset, we have now applied a Kruskal-Wallis test followed by an uncorrected Dunn’s post hoc test (Figure 2 D-F and H-J). This non-parametric approach is more appropriate for our data structure, as EV experiments are usually subject to high variability and immunoblot quantifications involving small sample sizes (n≈6) do not always meet the assumptions of normality or equal variance. The Kruskal-Wallis test does not assume normality or equal variances, making it more robust for small, variable biological datasets. The statistical analyses and figure legend have been updated in the revised manuscript accordingly.
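
      For readers who want to see what this analysis computes, the following Python sketch implements the Kruskal-Wallis H test followed by Dunn's pairwise comparisons without multiplicity correction, using only the standard library. The group values are hypothetical, the function names are illustrative, and the p-values rely on the usual chi-square and normal approximations, which are only approximate at n≈6. In practice a dedicated statistics package would normally be used.

```python
import math
from itertools import combinations


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def kruskal_dunn(groups):
    """Return (H, overall p, {(a, b): unadjusted two-sided Dunn p})."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = midranks(pooled)
    tie_sum = sum(t ** 3 - t for t in tie_sizes)

    mean_rank, start, h = {}, 0, 0.0
    for name in names:
        n = len(groups[name])
        rank_sum = sum(ranks[start:start + n])
        mean_rank[name] = rank_sum / n
        h += rank_sum ** 2 / n
        start += n
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)  # tie correction
    p_overall = chi2_sf(h, len(names) - 1)

    variance = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairwise = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        pairwise[(a, b)] = math.erfc(abs(z) / math.sqrt(2))
    return h, p_overall, pairwise


# Hypothetical relative band intensities, n = 6 per group.
example = {
    "WT": [0.42, 0.55, 0.38, 0.61, 0.47, 0.50],
    "R1441G": [1.00, 1.21, 0.88, 1.35, 0.95, 1.10],
    "R1441G + MLi-2": [0.58, 0.72, 0.49, 0.80, 0.66, 0.53],
}
h_stat, p_value, dunn = kruskal_dunn(example)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
for pair, p in dunn.items():
    print(pair, f"unadjusted p = {p:.4f}")
```

      Because the pairwise p-values are left unadjusted, the chance of at least one false positive grows with the number of comparisons, which is worth keeping in mind when reading the resulting significance marks.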

      In addition, since our CBE treatments yielded statistically non-significant data, we have softened our conclusions throughout the manuscript concerning the contribution of GCase activity to EV-mediated BMP release modulation.

      (6) There is a very strong reduction in flotillin-1 in R1441G cells vs WT (Figure 2G) in the EV fraction. That reduction is further exacerbated with MLi2, which likely means it is not kinase activity dependent. Can the authors comment on that?

      We agree with the reviewer that Flotillin-1 showed a different behavior compared with LAMP2 in these experiments. As recommended by the MISEV guidelines (Théry C, et al. J Extracell Vesicles. 2018, 7:1535750; Welsh JA, et al. J Extracell Vesicles. 2024, 13:e12404), it is important to analyze more than one EV-associated protein marker. We examined LAMP2, which, together with LAMP1, has been reported to be specifically enriched in EVs of endolysosomal origin (exosomes; Mathieu et al., Nat Commun. 2021, 12:4389). In contrast, Flotillin-1 is also associated with small EVs but may represent a distinct EV subpopulation from those positive for LAMP proteins (Kowal J, et al. PNAS 2016, 113:E968-E977).

      Nevertheless, the biochemical analysis of isolated EV fractions was complemented by our lipidomics data and, in the revised version, by TIRF microscopy analysis of exosome release in control and G2019S LRRK2 human fibroblasts (new Figure 7, Panels G-I; Videos 1 and 2). In this analysis, we confirmed increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G). Collectively, these findings further support the regulatory role of LRRK2 activity in EV-mediated BMP secretion.

      (7) In Figure 2C, the authors should express that the LAMP2-EV and flotillin-1 EV fractions from the WB are highly exposed. As presently presented, it is slightly misleading.

      We thank the reviewer for this comment. In EV preparations, the amount of protein recovered is typically very low. Therefore, although we loaded all the EV protein obtained from each sample, the immunoblots for LAMP2 and Flotillin-1 in EV fractions required longer exposure times to visualize clear signals across all conditions. We have now indicated in the corresponding figure legend that these EV blots are long-exposure blots to facilitate signal detection and avoid any potential misunderstanding.

      (8) If Figure 2C and D are from two different experiments, they should not be plotted together in Figure 2E-G. You cannot compare the effect of MLi2 vs CBE if done in completely different experiments.

      We appreciate the reviewer’s comment and agree with this observation. The MLi-2 and CBE experiments were performed independently and in separate experimental runs; therefore, we have reanalyzed these datasets separately rather than combining them in a two-way ANOVA. To properly compare more than two groups within each dataset, we have now applied a Kruskal-Wallis test followed by an uncorrected Dunn’s post hoc test (Figure 2 D-F and H-J). This non-parametric approach is more appropriate for our data structure, as EV experiments are usually subject to high variability and immunoblot quantifications involving small sample sizes (n≈6) do not always meet the assumptions of normality or equal variance. The Kruskal-Wallis test does not assume normality or equal variances, making it more robust for small, variable biological datasets. The revised statistical analyses and figure legends have been updated accordingly in the manuscript.

      (9) The authors state that "For the R1441G MEF cells, MLi-2 decreased EV concentration while CBE increased EV particles per ml, in agreement with the effects observed in our biochemical analysis." As Figure S1D shows no statistical significance, the authors don't have sufficient evidence to make this claim.

      We apologize for this overstatement. We have revised the text to clarify that, although the differences did not reach statistical significance, a consistent trend toward decreased EV concentration upon MLi-2 treatment and increased EV release following CBE treatment was observed in R1441G MEF cells.

      (10) "Altogether, given that BMP is specifically enriched in ILVs (which become exosomes upon release), the data presented above support our biochemical analysis (Figure 2C, D, F) and suggest a role for LRRK2 and GCase in modulating BMP release in association with LAMP2-positive exosomes from MEF cells." As Figure 3E shows no statistical difference of BMP on EVs upon CBE treatment, this sentence is not accurate and should be reframed. Furthermore, the authors claim an increase in EV-LAMP2 in R1441G cells compared to WT, however, the amount of BMP in EVs of R1441G cells vs WT is unchanged with a non-significant reduction. This contradiction does not support the authors' conclusions and really puts into question their whole model.

      We thank the reviewer for this observation. After reanalyzing our biochemical data from isolated EV fractions (see new Panels D-F and H-J) using an improved statistical approach, we found that although EV-associated LAMP2 levels were consistently elevated in untreated R1441G LRRK2 MEFs compared to WT cells, CBE treatment only produced a non-significant trend toward increased EV-associated LAMP2 compared to untreated R1441G LRRK2 cells. Accordingly, we have revised the sentence to read as follows:

      “Altogether, given that BMP is specifically enriched in ILVs (which become exosomes upon release), the data presented above support our biochemical analysis (Figure 2C, E, G, I) and suggest that LRRK2 activity regulates BMP release in association with LAMP2-positive exosomes, whereas GCase activity appears to have a more variable effect under the tested conditions.”

      We also agree with the reviewer that, in our MEF model, the amount of BMP in EVs of R1441G cells vs WT is unchanged with a non-significant reduction. However, pharmacological modulation supports our conclusion that BMP release is modulated by LRRK2 activity. Specifically, treatment with the LRRK2 inhibitor MLi-2 decreased EV-associated BMP and LAMP2 levels in R1441G LRRK2 MEFs, and our new data (new Figure 7, Panels G-I; Videos 1 and 2) show increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G).

      In light of the reviewer’s comment about CBE treatment, we have softened our conclusions throughout the manuscript concerning the contribution of GCase activity in this model.

      (11) In Figure 5, 16 h of MLi2 treatment is too long and can lead to off-target effects. I would advise reducing it to 1-4 h.

      Prolonged MLi-2 treatments have been extensively used in the field without evidence of significant off-target effects. Several studies, including Fell et al. (2015, J Pharmacol Exp Ther 355:397-409), De Wit et al. (2019, Mol Neurobiol 56:5273-5286), Ho et al. (2022, NPJ Parkinson’s Dis 8:115), Tengberg et al. (2024, Neurobiol Dis 202:106728), and Jaimon et al. (2025, Sci Signal 18:eads5761), have applied long-term (24-48 h) MLi-2 treatments at comparable concentrations without detecting toxicity or off-target alterations, including in MEFs (Ho et al., 2022; Dhekne et al., 2018, eLife 7:e40202). Moreover, the data presented in Figure 5 demonstrate a reduction in CLN5 protein levels in both MEFs and human fibroblasts following MLi-2 treatment, confirming the specificity of the observed effects in LRRK2 mutant cells.

      (12) "Our data suggest that BMP is exocytosed in association with EVs and that LRRK2 and GCase activities modulate BMP secretion." Again, cells carrying the R1441G mutation have the same amount of BMP in EVs as WT. This sentence is not factually accurate. Likewise, CBE did not change the amount of BMP in EVs.

      We thank the reviewer for this observation and agree that, in our MEF model, the amount of BMP in EVs from R1441G LRRK2 cells is comparable to that observed in WT cells. However, pharmacological modulation supports our conclusion that BMP release is modulated by LRRK2 activity. Specifically, treatment with the LRRK2 inhibitor MLi-2 decreased EV-associated BMP levels in R1441G LRRK2 MEFs, and our new data (new Figure 7G-I; Videos 1 and 2) show increased exocytosis of CD63-pHluorin–positive endolysosomes in G2019S LRRK2 human fibroblasts compared to controls, an effect that was reversed by MLi-2 treatment. The CD63-pHluorin–positive compartment of these cells was also largely positive for BMP (new Figure 7G). These findings further support the regulatory role of LRRK2 activity in EV-mediated BMP secretion. In addition, in light of the reviewer’s comment about CBE treatment, we have softened our conclusions throughout the paper concerning the contribution of GCase activity in this model.

      (13) Figure 6; EV release should have been monitored by more accurate markers such as CD63 and CD81.

      We thank the reviewer for this comment. We and others (Kowal et al., 2016; Lu et al., 2018; Mathieu et al., 2021; Ferreira et al., 2022) have reported enrichment of Flotillin-1 and LAMP proteins in isolated small EV fractions. In particular, one of these studies (Mathieu et al., Nat Commun. 2021), in which bafilomycin A1 was also used (to boost exosome release), reported that “LAMP1-positive subpopulations of EVs represent MVB/lysosome-derived exosomes, which also contain syntenin-1.” Altogether, our choice of EV markers (LAMP2 and Flotillin-1) is consistent with those previously and accurately used to characterize EVs. We have now included all relevant references in the revised manuscript to further clarify this point.

      (14) Figure 6 suggests that exosomal BMP is controlled by EV release. I would think that is rather obvious.

      We agree that the finding that exosomal BMP release is influenced by EV secretion may appear “obvious.” However, our intention in Figure 6 was to provide direct experimental evidence confirming this relationship using pharmacological modulators of EV release. Specifically, inhibition of EV secretion with GW4869 reduced exosomal BMP levels, whereas stimulation with bafilomycin A1 increased them. These data were important to establish a causal link between EV trafficking and BMP export, thereby validating our model and supporting the interpretation that LRRK2 regulates BMP homeostasis through EV-mediated exocytosis, which is further modulated, to some extent, by GCase activity. 

      Minor concerns:

      (1) Figure 1: Change colors to be color blind friendly.

      We thank the reviewer for this helpful suggestion. We have adjusted the colors in Figure 1 to be color-blind friendly. In addition, we have applied the same color-blind friendly palette to the new immunofluorescence data presented in new Figure 7, Panels A and D.

      (2) More consistency on "Xmin" vs "X min" would be appreciated.

      We thank the reviewer for this observation. We have revised the manuscript to ensure consistent formatting of time indications throughout the text and figures, using the standardized format “X min.”

      Reviewer #2 (Recommendations for the authors):

      (1)  Figure 2C-D. Were equal amounts of protein loaded in each lane?

      Equal protein amounts were loaded in lanes corresponding to whole-cell lysate (WCL) fractions and normalized based on α-Tubulin levels.

      For the extracellular vesicle (EV) fractions, all protein recovered from EV pellets after isolation was loaded. In all EV-related experiments, we seeded the same number of EV-producing cells per condition, and the resulting EV-derived data (from both immunoblotting and lipidomics analyses) were normalized to the corresponding whole-cell lysate (WCL) protein content to ensure comparability across conditions.

      All these technical details have been included in the Materials section of our revised manuscript.

      (2) The authors refer to the papers of Medoh et al (ref 43) and Singh et al. (44) for the key role of CLN5 in the BMP biosynthetic pathway. However, Medoh et al reported that CLN5 is the lysosomal BMP synthase. In contrast, Singh et al. reported that PLD3 and PLD4 mediate the synthesis of SS-BMP, and did not find any role for CLN5. 

      To avoid any confusion or misinterpretation of our findings regarding CLN5 and given that we do not analyze PLD3 or PLD4 in our study, we have decided to replace the reference to Singh et al. with Bulfon D. et al. (Nat. Commun. 2024, 15:9937) instead. This last work, conducted by an independent group distinct from the one that originally described CLN5, also validated CLN5 as the sole BMP synthase in cells.

      Also, authors mention that bafilomycin A1 (B-A1) dramatically boosts EV exocytosis, referring to Kowal et al., 2016 (ref 35) and Lu et al., 2018 (ref 45). However, this is not shown in Kowal et al.

      We thank the reviewer for pointing out this mistake. We apologize for the incorrect citation and have now corrected the reference. The statement regarding the effect of bafilomycin A1 on EV exocytosis now appropriately refers to Mathieu et al., 2021 and Lu et al., 2018.

      (3) Page 7, it is stated that "No statistically significant differences in intracellular BMP levels were observed in WT LRRK2 MEFs upon LRRK2 or GCase inhibition (Supplemental Figure 1D, E)". The authors probably mean "Supplemental Figure 1F, G".

      We thank the reviewer for noting this error. We have corrected the text to refer to panels F and G of Supplemental Figure 1, which correspond to the relevant data. We have also revised the reference to panel I of Supplemental Figure 1 accordingly.

    1. The HTTP response must return the file as a binary object, not as HTML or plain text. Set the Content-Type header to application/octet-stream to indicate this is a binary file download.

      to be changed to:

        1. Have Content-Type: text/plain in the header.
        1. Be externally accessible.
        1. Not be password protected.
        1. Not be behind a proxy or redirect.
    1. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Adaptation of endothelial cells to microenvironment topographical cues through lysyl oxidase like-2-mediated basement membrane scaffolding" by Marchand et al. aims to determine the impact of LOXL2 on the dynamic formation of vascular basement membranes (BMs).

      Strengths:

      This manuscript includes a nice combination of different methods and presents the results in an appropriate manner.

      Furthermore, the results clearly demonstrate an impact of LOXL2 on collagen IV-fibronectin organization and topography. Finally, the proper arrangement of collagen IV-fibronectin impacts cell alignment.

      Weaknesses:

      An open question for this reviewer is what the real take-home message of the present study is. Can the authors deliver novel insight into BM formation transferable to the in vivo situation? Why do the authors not see a "real" BM? Could it be that in vivo endothelial cells do not build the vascular BM alone? Thus, are additional cell types needed? And what will happen then if LOXL2 expression is altered?

      Major comments:

      (1) Can the authors show that LOXL2 cross-links fibronectin and collagen IV?

      (2) The authors stated that LOXL2 depletion affects cytoskeleton arrangements and cell shape. Could it be that this is simply a secondary effect mediated primarily through the altered cross-linking of fibronectin and collagen IV?

      (3) Can the authors perform cell adhesion studies on CDMs derived from wild-type versus LOXL2-deficient cells?

      (4) Line 226-230: Can the authors provide the proliferation data of wildtype and LOXL2-depleted cells supporting their Src and Akt activity findings?

      (5) Line 298-299: The authors made a statement about laminin. Can the authors think of a co-culture of wild-type versus LOXL2-depleted endothelial cells in combination with pericytes or fibroblasts, as these cells contribute to the efficient assembly of a functional vascular basement membrane (doi: 10.1182/blood-2009-05-222364)? Here, the authors could determine the impact of altered fibronectin-collagen IV cross-linking on laminin network formation. This will affect their conclusion in lines 299-304, as these facts are solely based on endothelial cells.

      (6) Suggestion: can the authors supplement recombinant LOXL2 protein in its active version to the LOXL2-depleted endothelial cells to rescue the observed changes? They could further compare the enzymatic function of LOXL2 with that of LOXL2 protein harbouring Zn instead of Cu, which makes it enzymatically inactive. Here, the authors might be able to strengthen their hypothesis that LOXL2 might bridge fibronectin and collagen IV or link both proteins.

      (7) There are grammatical errors in the manuscript that the authors should work on.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3:

      Comments on revised version:

      This revised version is largely improved and the responses to reviewers' comments are generally relevant. However, the response regarding pre-nodes is not satisfactory. I understand that the authors prefer to avoid further experimentation, but I think this is an important point that needs to be clarified. Exploring stages between E12 and E15 is therefore important. When carefully examining some of the figures (Fig. 1E or 2D), I think that at E15 there may well be pre-node formation prior to myelin deposition, on structures the authors considered to be heminodes. To be convincing they should use double or triple labeling with, in addition to the nodal proteins (ankG and/or Nav pan), a good myelin marker such as anti-PLP. The rat monoclonal developed by the late Pr Ikenaka would give a sharper staining than the anti-MAG they used. (I assume the clone must still be available in Okazaki.)

      We appreciate your insightful comment regarding the possible presence of pre-nodal clusters along NM axons and your kind suggestion to use the PLP antibody (clone AA3; Yamamura et al., J Neurochem, 1991). We previously obtained this monoclonal antibody from Dr. Kenji Tanaka in Okazaki and confirmed that it works well in chicken tissues. However, since this clone recognizes both PLP and DM-20 isoforms, it labels not only myelin-forming oligodendrocytes (MFOLs) but also newly formed oligodendrocytes (NFOLs) (Yokoyama et al., J Neurochem, 2025). Therefore, it is not ideal for determining whether nodal protein clusters are formed before myelin deposition.

      Instead, we performed double immunostaining for MAG and AnkG between E12 and E15 to clarify the temporal relationship between myelin maturation and node formation. The results showed that detectable AnkG clusters along NM axons began to appear very sparsely around E13, coinciding with the emergence of MAG signals, and became more prominent with development. This temporal pattern does not match the definition of pre-nodal clusters, which are formed prior to myelination.

      Although we cannot completely rule out the possibility of undetectable pre-nodal clusters or those composed of molecules other than AnkG, our results support the view that pre-nodal clusters are unlikely to play a major role in determining the regional difference in nodal spacing along NM axons. These new data have been added as Figure 2—figure supplement 1, and the relevant sections in the Results, Discussion, and Figure legend have been revised accordingly (page 5, line 4; page 10, line 7; page 29, line 1).

    1. With the recent mass shooting in Germany, some people are again asking why anybody would hate refugees and aliens (i.e. foreigners). If you are an immigrant, particularly a recent refugee or asylum seeker, you may have already asked this question many times after having experienced prejudice, racism, and discrimination. If you are among those who hate refugees, do you know why you feel this way? Is it a vague feeling of hostility or does it stem from specific unpleasant experiences or future worries? For instance, do you worry about foreigners spreading diseases, committing violence, or taking away jobs and depressing wages? In this post, I discuss new research by Helen Landmann and colleagues in Germany, which has examined the reasons people use to justify anti-refugee hostility. The study’s findings are published in the December 2019 issue of the European Journal of Social Psychology.¹

      Why do we find immigrants threatening?

      Whether the threat is true or imagined, immigrants might be perceived as threatening in a number of ways. Refugees, for instance, pose an economic threat because they need jobs, low-cost housing, access to health care, etc. In addition, they pose a health threat because some refugees come from countries with comparably higher rates of certain diseases (e.g., tuberculosis, AIDS). Furthermore, immigrants pose an identity threat, especially if they have a “different cultural identity, religious identity, and value system than members of the host community.” Perceptions of threat, according to previous research, “are one of the most important predictors of attitudes and prejudice toward immigrants and other outgroups” (p. 82).²

      Six reasons for hostility toward refugees

      Landmann and colleagues in Germany conducted a series of four related studies to examine hostility toward refugees. In the first of these investigations, they used a sample of 55 male and 121 female psychology students (average age of 32 years). The participants were initially asked how many refugees Germany could host per year and then asked what would happen if this number was exceeded. Six threat types emerged from the analysis of the responses:¹

      Symbolic threat (the migrants’ culture and religion being threatening to one’s way of life)

      Realistic threat (job availability and pay)

      Safety threat (immigrants committing crimes)

      Social functioning threat (the creation of ghettos)

      Prejudice threat (the potential rise of racist and right-wing views)

      Altruistic threat (the host nation failing to provide needed support for refugees)

      While the first three threats may be considered direct threats, the other three are extended threats. For example, a person who fears he might catch a deadly disease from refugees is reacting to a direct threat, but a person who fears negative changes in politics of the country, such as a significant increase in popular support for extremist right-wing and far-right parties, is reacting to an indirect or extended threat.

      Examining these six threat types, researchers tried to determine if only one or two of them might explain hostility toward refugees just as well or even better than all six factors combined. To answer this question, they conducted a second study using a sample of 289 female and 118 male students (average age of 32 years). They concluded that the six threat types explained the data better than one general threat factor or two factors (i.e. symbolic and realistic). In addition, they found that every threat type—even altruistic threat (concerns about the host country’s ability to care for refugees)—was linked with negative views of immigration and refugees.

      A third study, a replication of the second study, included a sample of 23 male and 108 female students (average age of 33 years) and concluded that, aside from the prejudice threat, every threat type was associated with unfavorable attitudes toward migrants.

      Study 4 used a more representative sample, consisting of 111 women and 140 men (mean age of 50 years). Compared to college students in previous samples, these participants reported perceiving even stronger threats and experiencing more hostility toward refugees. And the results again showed support for the six threat types. Every threat type was correlated with unfavorable views of migration and refugees and with favoring more restrictive control of migration.

      Both direct and indirect threats were related to unfavorable attitudes toward refugees,

      This is relevant to the text because, in "The Wretched and The Beautiful", humans' different attitudes toward the two groups of aliens contribute to the different threats that each group of aliens encounters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors attempted to clarify the impact of N protein mutations on ribonucleoprotein (RNP) assembly and stability using analytical ultracentrifugation (AUC) and mass photometry (MP). These complementary approaches provide a more comprehensive understanding of the underlying processes. Both SV-AUC and MP results consistently showed enhanced RNP assembly and stability due to N protein mutations.

      The overall research design appears well planned, and the experiments were carefully executed.

      Strengths:

      SV-AUC, performed at higher concentrations (3 µM), captured the hydrodynamic properties of bulk assembled complexes, while MP provided crucial information on dissociation rates and complex lifetimes at nanomolar concentrations. Together, the methods offered detailed insights into association states and dissociation kinetics across a broad concentration range. This represents a thorough application of solution physicochemistry.

      We thank the Reviewer for this positive assessment. 

      Weaknesses:

      Unlike AUC, MP observes only a part of the solution. In MP, bound molecules are accumulated on the glass surface (not dissociated), thus the concentration in solution should change as time develops. How does such concentration change impact the result shown here?

      We agree with the Reviewer that the concentration in solution above the surface will change with time; however, the impact of surface adsorption turns out to be negligible. To show this we have added a calculation as Supplementary Methods that is based on the number of imaged adsorption events, the fraction of imaged area to total surface area, and the initial sample volume and concentration. Under our experimental conditions the reduction is less than 1%, which is well within the range of experimental concentration errors.

      This is in line with the observation that surface adsorption of proteins to glass is critical and needs to be prevented when working at picomolar concentrations (Zhao H, Mayer ML, Schuck P. 2014. Analysis of protein interactions with picomolar binding affinity by fluorescence-detected sedimentation velocity. Anal Chem 86:3181–3187. doi:10.1021/ac500093m), but is ordinarily negligible when working in the mid-nanomolar concentration range. The difference in the MP experiments is that surface adsorption to glass and plastic, which is usually invisible, is imaged and quantified. The negligible impact of surface adsorption on solution concentration in typical MP experiments is also in line with the results of several studies that have successfully measured dissociation constants of binding equilibria by MP (Young G et al., Science 360 (2018) 432; Wu & Piszczek, Anal Biochem 592 (2020) 113575; Soltermann et al., Angewandte Chemie 59 (2020) 10774) with samples in the 5-50 nM range and a similar experimental setup. It should be noted that in the MP experiments no surface functionalization is employed, in contrast to optical biosensors that utilize surface-immobilized ligands and polymeric matrices and thereby enhance the surface binding capacity.

      Even though this depletion effect is negligible under ordinary MP conditions, the Reviewer raises a good point and readers may have a similar question about this novel technique. For this reason, we have added in the MP section of the Methods the sentence “In either configuration, the impact of surface binding on the sample concentration is < 1% and negligible, as described in the Supplementary Methods S1.” and added the detailed calculations in the Supplement accordingly. The use of SV as a traditional, orthogonal technique and the observation of consistent results with those of MP should further dispel readers’ methodological concerns on this point.
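
      The depletion estimate described in this response can be sketched numerically. The following is an illustrative sketch only, not the authors' Supplementary Methods calculation: the function name and all input values are hypothetical, and it assumes that adsorption is uniform over the whole wetted surface, so that the imaged events can be scaled up by the ratio of total to imaged area.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fractional_depletion(imaged_events, imaged_area_um2, total_area_um2,
                         volume_ul, conc_nm):
    """Fraction of molecules in the sample lost to surface adsorption."""
    # Scale the imaged landing events up to the whole wetted surface
    # (assumption: adsorption density is the same everywhere).
    adsorbed_total = imaged_events * (total_area_um2 / imaged_area_um2)

    # Molecules initially in solution: concentration (mol/L) x volume (L) x N_A.
    molecules_in_solution = (conc_nm * 1e-9) * (volume_ul * 1e-6) * AVOGADRO

    return adsorbed_total / molecules_in_solution


# Hypothetical inputs chosen only to illustrate the arithmetic.
fraction = fractional_depletion(
    imaged_events=5000,       # adsorption events counted in the field of view
    imaged_area_um2=50.0,     # imaged area
    total_area_um2=7.0e6,     # total wetted glass surface (about 7 mm^2)
    volume_ul=20.0,           # initial sample volume
    conc_nm=10.0,             # initial sample concentration
)
print(f"Estimated depletion: {fraction:.2%}")
```

      With these made-up inputs the estimated depletion stays below 1%. The result scales linearly with the event count and the area ratio, and inversely with sample volume and concentration, which is why depletion becomes a concern only at much lower concentrations.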

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors apply a variety of biophysical and computational techniques to characterize the effects of mutations in the SARS-CoV-2 N protein on the formation of ribonucleoprotein particles (RNPs). They find convergent evolution in multiple repeated independent mutations strengthening binding interfaces, compensating for other mutations that reduce RNP stability but which enhance viral replication.

      Strengths:

      The authors assay the effects of a variety of mutations found in SARS-CoV-2 variants of concern using a variety of approaches, including biophysical characterization of assembly properties of RNPs, combined with computational prediction of the effects of mutations on molecular structures and interactions. The findings of the paper contribute to our increasing understanding of the principles driving viral self-assembly, and increase the foundation for potential future design of therapeutics such as assembly inhibitors.

      Thank you for highlighting the strengths of our paper and the potential impact on future design of therapeutics.

      Weaknesses:

      For the most part, the paper is well-written, the data presented support the claims made, and the arguments are easy to follow. However, I believe that parts of the presentation could be substantially improved. I found portions of the text to be overly long and verbose and likely could be substantially edited; the use of acronyms and initialisms is pervasive, making parts of the exposition laborious to follow; and portions of the figures are too small and difficult to read/understand.

      We are glad the Reviewer concurs that the data support our conclusions, and finds the arguments easy to follow. We appreciate the comment that the work was not optimally presented. To address this point, we have identified multiple opportunities to streamline the text without jeopardizing clarity. We have also rewritten the end of the Introduction.

      As recommended, we have reduced and harmonized the use of acronyms and abbreviations throughout the text to improve readability. Specifically, we have now spelled out nucleic acid (NA), intrinsically disordered regions (IDR), full-length (FL), AlphaFold (AF3), and variants of concern (VOC).

      Finally, we have improved the presentation of most figures, adding labels and new panels, and increased the label font sizes to facilitate more detailed inspections of the data.

      Reviewer #3 (Public Review):

      This manuscript investigates how mutations in the SARS-CoV-2 nucleocapsid protein (N) alter ribonucleoprotein (RNP) assembly, stability, and viral fitness. The authors focus on mutations such as P13L, G214C, and G215C, combining biophysical assays (SV-AUC, mass photometry, CD spectroscopy, EM), VLP formation, and reverse genetics. They propose that SARS-CoV-2 exploits "fuzzy complex" principles, where distributed weak interfaces in disordered regions allow both stability and plasticity, with measurable consequences for viral replication.

      Strengths:

      (1) The paper demonstrates a comprehensive integration of structural biophysics, peptide/protein assays, VLP systems, and reverse genetics.

      (2) Identification of both de novo (P13L) and stabilizing (G214C/G215C) interfaces provides a mechanistic insight into RNP formation.

      (3) Strong application of the "fuzzy complex" framework to viral assembly, showing how weak/disordered interactions support evolvability, is a significant conceptual advance in viral capsid assembly.

      (4) Overall, the study provides a mechanistic context for mutations that have arisen in major SARS-CoV-2 variants (Omicron, Delta, Lambda) and a mechanistic basis for how mutations influence phenotype via altered biomolecular interactions.

      We are grateful for these comments highlighting this work as a significant conceptual advance.

      Weaknesses:

      (1) The arrangement of N dimers around LRS helices is presented in Figure 1C, but the text concedes that "the arrangement sketched in Figure 1C is not unique" (lines 144-146) and that AF3 modeling attempts yielded "only inconsistent results" (line 149).

      The authors should therefore present the models more cautiously as hypotheses instead. Additional alternative arrangements should be included in the Supplementary Information, so the readers do not over-interpret a single schematic model.

      We agree that in the absence of high-resolution structures the RNP models are hypothetical, and have now emphasized this in the Results, following the Reviewer’s recommendation. To present alternative arrangements that satisfy the biophysical constraints upfront, we have promoted the previous Supplementary Figure 11 showing different models to the first Supplementary Figure, and expanded it with examples of different oligomers. In this way it is referenced early on in the Results and in the legend to Figure 1C. We agree this strengthens the manuscript, as one of the take-home messages is the inherent polydispersity of the RNPs.

      The fact that AF3 can only provide inconsistent results will not come as a surprise, given the substantial disordered regions of the complex, and is a drawback of AF3 rather than our structural model. We slightly emphasized this point so as to clarify that the presentation of the AF3-based RNP structure serves solely as supporting evidence that our hypothetical model is sterically reasonable.

      The new Results paragraph reads:

      “As suggested in the cartoon of Figure 1C, this supports the hypothesis of a three-dimensional arrangement with a central LRS oligomer with symmetry properties and dimensions similar to low resolution EM images of model RNPs (Carlson et al., 2022, 2020) and cryo-ET of RNPs in virions (Klein et al., 2020; Yao et al., 2020).  It should be noted, however, that the arrangement sketched in Figure 1C is not unique and other subunit orientations could be envisioned that satisfy all constraints from experimentally observed binding interfaces, including different oligomers and anti-parallel subunits as illustrated in Supplementary Figure S1. Extending previous ColabFold structural predictions that show multiple N-protein dimers self-assembled via the LRS coiled-coils (Zhao et al., 2023), we attempted the AlphaFold modeling of RNPs combining multiple N dimers with SL7 RNA ligands, mimicking our biophysical assembly model. Current AlphaFold restrictions limit the prediction to pentamers of N-protein dimers with 10 copies of SL7 RNA. While only inconsistent results were obtained – which is not surprising given the large intrinsically disordered regions exceed the predictive power of AlphaFold – some models did produce an overall RNP organization similar to Figure 1C, suggesting such an arrangement is at least sterically reasonable with regard to possible N-protein subunit orientations in an RNP (Supplementary Figure S2)”

      (2) Negative-stained EM fibrils (Figure 2A) and CD spectra (Figure 2B) are presented to argue that P13L promotes β-sheet self-association. However, the claim could benefit from more orthogonal validation of β-sheet self-association. Additional confirmation via FTIR spectra or ThT fluorescence could be used to further distinguish structured β-sheets from amorphous aggregation.

      We completely agree that the application of multiple orthogonal biophysical methods can strengthen the conclusions. In addition to EM fibrils and CD spectra (a classical gold standard technique for protein secondary structure in solution), we already have support from ColabFold modeling, as well as NMR results from the Zweckstetter lab showing the potential for β-sheet-like conformations.

      Furthermore, we believe the evidence for the absence of ‘amorphous aggregates’ is very strong, as this would be inconsistent with the long-range order required to create the visibly fibrillar morphology in EM, and amorphous aggregates would be inconsistent with the increased solution viscosity. In this context, it is also highly relevant that the β-sheet-like secondary structure recorded by CD is concentration-dependent and reversible upon dilution. The long-range spatial order of fibrils is consistent with the formation of secondary structure in solution.

      In addition, it must be kept in mind that what we see is specific to N-arm peptides carrying the P13L mutation (in EM, CD, and structural prediction) and does not occur in the other two N-arm peptides (ancestral N-arm and N-arm with deletion of 31-33), linker peptides, or C-arm peptides.

      Most importantly, as elaborated in more detail below, we do not claim that fibril formation is physiologically relevant. At the heart of this – in the context of the evolution of fuzzy complexes – is that the P13L mutation creates additional weak protein-protein interactions. Indeed, the assembly of fibrils geometrically requires at least two interfaces for each subunit. These weak interactions are at play physiologically in the context of the disordered RNP particles, and in macromolecular condensates, but not in the formation of fibrils. Therefore, while we appreciate the suggestion of FTIR spectra and ThT staining, we are afraid further emphasis on the fibril structure might confuse the reader, and therefore we would rather clarify upfront that these fibrillar assemblies are not thought to form in vivo from full-length protein, but merely demonstrate the presence of N-arm self-association interfaces in the model of truncated peptides.

      Accordingly, we have amended the Results paragraph reporting the fibrils:

      “Thus, the N-arm mutation P13L is responsible for the formation of fibrils in N-arm peptides after prolonged storage. Some of these N-arm fibrils exhibit a twisted morphology with width of ≈5 nm (Figure 2A), in some instances exhibiting patterns of strand breaks. Such fibrils are frequently encountered in proteins that can stack β-sheets, such as in amyloids (Paravastu et al., 2008). While we have not observed fibril formation in the context of full-length N, and have no evidence such fibrils are physiologically relevant, their occurrence in solutions of truncated N-arm peptide nonetheless demonstrates the introduction of ordered N-arm self-association interfaces in conformations of P13L mutants.”

      And more completely summarized experimental evidence prior to describing the ColabFold prediction results (which previously did not include mention of the NMR):

      “Finally, confirming the interpretation of the EM images and the CD data, as well as the β-structure propensity reported from NMR data (Zachrdla et al., 2022), the structural prediction of N[10-20]:P13L in ColabFold displayed oligomers with stacking β-sheets …”

      (3) In the main text, the authors alternate between emphasizing non-covalent effects ("a major effect of the cysteines already arises in reduced conditions without any covalent bonds," line 576) and highlighting "oxidized tetrameric N-proteins of N:G214C and N:G215C can be incorporated into RNPs". Therefore, the biological relevance of disulfide redox chemistry in viral assembly in vivo remains unclear. Discussing cellular redox plausibility and whether the authors' oxidizing conditions are meant as a mechanistic stress test rather than physiological mimicry could improve the interpretation of these results.

      The paper could benefit if the authors provide a summary figure or table contrasting reduced vs. oxidized conditions for G214C/G215C mutants (self-association, oligomerization state, RNP stability). Explicitly discuss whether disulfides are likely to form in infected cells.

      We thank the Reviewer for raising this most interesting point. The reason why the biological relevance of N disulfides remains unclear is simply that this is still unknown, unfortunately. Recently, Kubinski et al. have strongly argued for the formation of disulfides in infected cells, but in our view the evidence remains weak since the majority of disulfide bonds in that work presented as post-lysis artifacts, and it appears the non-covalent effects alone could explain the physiological observations. We aimed for a balanced presentation and wrote in the relevant Results section:

      “Covalent disulfide bonds in the LRS in non-reducing conditions were found to further promote LRS oligomerization. However, there is no conclusive data yet whether covalent bonds in the LRS occur in vivo, or any G215C effect is entirely non-covalent due to the significant strengthening of LRS helix oligomerization (see Discussion).”

      Despite the uncertainty regarding physiological disulfide bond formation, we believe it is useful to ask whether covalently crosslinked N dimers would aid or constrain RNP assembly in our biophysical model. We have now better explained this motivation in the Results section describing the RNP experiments:

      “Even though it is still unclear whether disulfide bonds of N cysteine mutants form in vivo, we were curious about the impact of disulfide-linked oligomers of the cysteine mutants on their RNP structure and stability in our biophysical assembly model.”

      The referenced paragraph from the Discussion reads:

      “Regarding the cysteine mutations that have been repeatedly introduced in the LRS prior to the rise of the Omicron VOCs, it is an open question whether they lead to covalent bonds in vivo or in the VLP assay. While examples of disulfide-linked viral nucleocapsid proteins have been reported (Kubinski et al., 2024; Prokudina et al., 2004; Wootton and Yoo, 2003), a methodological difficulty in their detection is artifactual disulfide bond formation post-lysis of infected cells (Kubinski et al., 2024; Wootton and Yoo, 2003).  However, our results clearly show that a major effect of the cysteines already arises in reduced conditions without any covalent bonds, through extension of the LRS helices, and concomitant redirection of the disordered N-terminal sequence. While oxidized tetrameric N-proteins of N:G214C and N:G215C can be incorporated into RNPs, the covalent bonds provided only marginally improved RNP stability.  Interestingly, the introduction of cysteines imposes preferences of RNP oligomeric states dependent on oxidation state, consistent with our MD simulations highlighting the impact of cysteine orientation of 214C versus 215C relative to the hydrophobic surface of the LRS helices. Overall, considering potentially detrimental structural constraints from covalent bonds on LRS clusters seeding RNPs, energetic penalties on RNP disassembly, as well as the required monomeric state of the LRS helix for interaction with the NSP3 Ubl domain (Bessa et al., 2022), at present it is unclear to what extent the formation of disulfide linkages between LRS helices would be beneficial or detrimental in the viral life cycle.”

      We feel that this text addresses the Reviewer’s comment, and that expanding the existing discussion further would conflict with other recommendations to shorten and focus the text.

      Finally, we have addressed the valuable suggestion of a new table summarizing the oligomeric state and self-association of the different cysteine mutants by inserting a new column in the existing Table 1 reporting all species’ oligomeric state at low micromolar concentrations. In this way they can be compared at a glance with the other mutants as well. A more detailed comparison of the concentration-dependent size-distribution is provided in Figure 4.

      (4) VLP assays (Figure 7) show little enhancement for P13L or G215C alone, whereas Figure 8 shows that P13L provides clear fitness advantages. This discrepancy is acknowledged but not reconciled with any mechanistic or systematic rationale. The authors should consider emphasizing the limitations of VLP assays and the sources of the discrepancy with respect to Figure 8.

      We thank the Reviewer for this comment, which highlights a very important point. 

      For clarification and to improve the cohesion of the manuscript we have inserted a reference to the Discussion after the presentation of the VLP results, which provides a natural transition to the following description of the reverse genetics experiments:

      “As expanded on in the Discussion, the failure to observe enhancement by P13L alone may be related to limitations of the VLP assay in sensitivity, including the restriction to a single round of infection, and protein expression levels.”

      This references a paragraph in the Discussion about the limitations of the VLP assay in general and the reasons we believe the enhancement by P13L alone was not picked up:

      “…While this assay has been widely used for rapid assessment of spike protein and N variants (Syed et al., 2021), it has limitations due to the addition of non-genomic RNA and the lack of double membrane vesicles from which gRNA emerges through the NSP3/NSP4 pore complex potentially poised for packaging (Bessa et al., 2022; Ke et al., 2024; Ni et al., 2023). It should also be recognized that the results do not directly reflect the relative efficiency of RNP assembly only, since protein expression levels, their localization, and their posttranslational modifications are not controlled for. Susceptibility to such factors might be exacerbated with mutations that modulate weak protein interactions. For example, as shown previously (Syed et al., 2024; Zhao et al., 2024), a GSK3 inhibitor inhibiting N-protein phosphorylation significantly enhances VLP formation and eliminates the advantage provided by the N:G215C mutation relative to the ancestral N – presumably due to an increase in assembly-competent, non-phosphorylated N-protein erasing an affinity advantage. A similar process may be underlying the absent or marginal improvement in VLP readout from the cysteine LRS mutants and P13L at the achieved transfection level in the present work, and the enhanced signal from R203K/G204R and R203M (the latter being consistent with previous reports (Li et al., 2025; Syed et al., 2021)) modulating protein phosphorylation. Nonetheless, mirroring the results of the biophysical in vitro experiments, the addition of RNP-stabilizing P13L and G214C mutations on top of R203K/G204R led to a significantly larger VLP signal.

      The VLP assay may be limited in sensitivity to mutation effects due to its restriction to a single round of infection. To avoid this and other potential limitations of the VLP assay for the study of viral packaging, for the key mutation N:P13L we carried out reverse genetics experiments. These showed the sole N:P13L mutation significantly increases viral fitness (Figure 8).”

      (5) Figures 5 and 6 are dense, and the several overlays make it hard to read. The authors should consider picking the most extreme results to make a point in the main Figure 5 and move the other overlays to the Supplementary. Additionally, annotating MP peaks directly with "2×, 4×, 6× subunits" can help non-experts.

      We completely agree with the Reviewer – these figures were very dense. To mitigate this problem without requiring the reader to switch back and forth to the supplement, we subdivided the panels of Figure 5 and showed only a subset of curves in each. In this way the data are easier to read while still being readily comparable. It is a large figure, but it contains the key data for the present work and is therefore worthwhile to have in one place. For the MP histogram data we have also inserted the suggested peak labels. Similarly, we have split Figure 6A into two panels for clarity.

      (6) The paper has several names and shorthand notations for the mutants, making it hard to keep up. The authors could include a table that contains mutation keys, with each shorthand (Ancestral, Nο/No, Nλ, etc.) mapped onto exact N mutations (P13L, Δ31-33, R203K/G204R, G214C/G215C, etc.). They could then use the same glyphs (Latin vs Greek) consistently in text and figure labels.

      Yes, we agree this is a problem and we apologize for the confusion. However, it is not possible to refer exclusively to either Latin or Greek terminology, which we feel would be even more detrimental to readability (the former being exhaustively lengthy and the latter being imprecise). Instead, we have used a rational system: if the complete set of mutations of a variant is present, its Greek letter is used as an abbreviation; otherwise we use Latin amino acid/position indicators for individual mutations or combinations thereof. Unfortunately, we previously failed to mention this explicitly, and we are most grateful to the Reviewer for pointing this out.

      We have now rectified this by including upfront the sentence:

      “We will adopt a nomenclature where the complete set of defining mutations of a variant will be referred to by its Greek letter, i.e., N:P13L/R203K/G204R/G214C is N<sub>λ</sub>, and analogously the set of Omicron mutations N:P13L/Δ31-33/R203K/G204R are referred to as N<sub>ο</sub>; see Table 1”

      This defines the two shorthands N<sub>λ</sub> and N<sub>ο</sub> used. Furthermore, as suggested and pointed to in the text, Table 1 provides the key to mutations and variants, including the variants in which each of the other mutations studied here occurs.
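      The naming rule stated above can be illustrated with a minimal sketch. The two mutation sets are taken from the quoted sentence; the function name `label`, the helper `position`, and the output strings such as `N_λ` are hypothetical choices made for illustration only and are not part of the manuscript.

```python
import re

# Defining N-protein mutation sets, as given in the quoted nomenclature sentence.
VARIANT_SETS = {
    "λ": frozenset({"P13L", "R203K", "G204R", "G214C"}),
    "ο": frozenset({"P13L", "Δ31-33", "R203K", "G204R"}),
}


def position(mutation):
    """Hypothetical helper: first residue number in a mutation name, used for sorting."""
    return int(re.search(r"\d+", mutation).group())


def label(mutations):
    """Greek shorthand for a complete variant set, Latin mutation names otherwise."""
    muts = frozenset(mutations)
    if not muts:
        return "N"  # ancestral protein, no mutations
    for greek, defining in VARIANT_SETS.items():
        if muts == defining:
            return f"N_{greek}"
    # Incomplete or mixed sets keep the explicit amino acid/position notation.
    return "N:" + "/".join(sorted(muts, key=position))


print(label(["P13L", "R203K", "G204R", "G214C"]))  # complete Lambda set
print(label(["R203K", "G204R", "P13L"]))           # partial set, Latin notation
```

      A partial set such as P13L/R203K/G204R therefore stays in Latin notation, and only an exact match to a variant's defining set collapses to the Greek letter.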

      (7) The EM fibrils (Figure 2A) and CD spectra (Figure 2B) were collected at mM peptide concentrations. These are far above physiological levels and may encourage non-specific aggregation. Similarly, the authors mention "ultra-weak binding energies that require mM concentrations to significantly populate oligomers". On the other hand, the experiments with full-length protein were performed at concentrations closer to biologically relevant concentrations in the micromolar range. While I appreciate the need to work at high concentrations to detect weak interactions, this raises questions about physiological relevance.

      This is indeed an important point to clarify. We agree that much lower nucleocapsid protein concentrations are present in the cytosol on average, and these were used in our RNP assembly experiments. However, there are at least two important physiologically relevant cases where high local N concentrations do occur:

      (1) Once assembled in RNPs, the disordered N-terminal extensions are locally at a very high concentration within the volume they can explore while tethered to the NTD. A back-of-the-envelope calculation assuming 12 N-protein subunits confining 12 N-terminal extensions to the volume of a single RNP (≈14×14×14 nm<sup>3</sup> by cryoEM; Klein et al., 2020) leads to an effective concentration of 7.4 mM. Obviously the N-arm peptides are not completely free and there will be constraints that would hinder or promote encounter complex probability, but interfaces with mM Kd are clearly strong enough to populate Narm-Narm contacts extending from N-protein in the RNP.

      Additionally, any interaction where N-proteins are brought in close proximity could allow weak N-arm interactions to provide additional stability. Besides the RNP, we demonstrate this in our Results for nucleic-acid liganded N tetramers (Figure 4B), but this might similarly occur in complexes with NSP3 or host proteins. Generally, it is quite common that small additional binding energies play important roles in the modulation of multivalent protein complexes.

      (2) Within the macromolecular condensate the local concentration will be substantially higher than on average within the infected cell.  While we do not know its precise concentration, it is well-established that the sum of many ultra-weak interactions is driving the formation of this dense liquid phase. In our previous eLife paper (Nguyen et al., 2024) we have shown LLPS is suppressed with the R203K/G204R mutation, but it is ‘rescued’ with the additional P13L/del31-33 mutation of the Omicron variant showing strong LLPS. Similarly, LLPS is suppressed by the LRS mutant L222P, but rescued in conjunction with P13L. This is another biologically relevant scenario where weak interactions are critical.

      We have emphasized these points in the revised manuscript as described below.
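      The back-of-the-envelope estimate in point (1) above can be reproduced with a short sketch. It assumes, as stated in the text, 12 N-arm chains confined to a cube with an edge of about 14 nm; the function name and its parameters are hypothetical. With exactly 14 nm the result is about 7.3 mM, close to the ≈7.4 mM quoted, and since the RNP dimension is itself approximate, the small difference is within the precision of such an estimate.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def effective_concentration_mM(n_chains, edge_nm):
    """Concentration (mM) of n_chains molecules confined to a cube of edge edge_nm."""
    volume_litres = edge_nm ** 3 * 1e-24   # 1 nm^3 = 1e-27 m^3 = 1e-24 L
    moles = n_chains / AVOGADRO
    return moles / volume_litres * 1e3     # mol/L -> mmol/L


c = effective_concentration_mM(n_chains=12, edge_nm=14.0)
print(f"{c:.2f} mM")  # about 7.3 mM
```

      Because the volume scales with the cube of the edge length, the estimate is sensitive to the assumed dimension: a 1% change in the edge shifts the concentration by roughly 3%.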

      Specifically:

      (a) Could some of the fibril/β-sheet features attributed to P13L (Figure 2A-C) reflect non-specific aggregation at high concentrations rather than bona fide self-association motifs that could play out in biologically relevant scenarios?

      We understand this concern from the experience with proteins that often have limited solubility and tendencies to aggregate, sometimes accompanied by unfolding and driven by hydrophobic interactions, or clustering on the path to LLPS. However, we are struggling to reconcile the picture of non-specific aggregation with the context of our P13L N-arm peptides. The term ‘non-specific aggregation’ implies the idea of amorphous aggregates, which we would contend is inconsistent with the observed geometry of fibrils, which exhibit long-range order. In addition, non-specific aggregation does not lead to increased solution viscosity, which we describe, but fibril formation does. Another connotation of ‘aggregates’ is irreversibility.  However, we find the beta-sheet-like conformation seen at 1 mM becomes significantly more disordered when the same sample is diluted to 0.4 mM peptide. This is consistent with a reversible self-association driven by a conformational change toward ordered secondary structure.

      To highlight the reversibility, we have clarified the description: “Interestingly, diluting the 1 mM sample (solid) to a concentration of 0.4 mM (dashed) reveals a large shift in the far-UV spectra … both indicative of a significant increase of disorder upon dilution. This is consistent with the stabilization of β-sheets in a reversible, strongly cooperative self-association process with an effective K<sub>D</sub> in the high μM to low mM range.”

      We have also inserted a concentration conversion to mg/ml units, which shows even 1 mM of peptides is only ~5 mg/ml, i.e. not excessively high. “While the ancestral N-arm at ≈1 mM (≈4.6 mg/ml) concentrations exhibits CD spectra with a minimum at ≈200 nm typical of disordered conformations (black)”
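      The unit conversion quoted above is simple arithmetic and can be sketched as follows. The peptide molar mass of about 4600 g/mol is not stated in the text; it is an assumption inferred here from the quoted pair "≈1 mM (≈4.6 mg/ml)", and the function name is hypothetical.

```python
# Assumed molar mass, inferred from the quoted conversion 1 mM ~ 4.6 mg/ml.
ASSUMED_PEPTIDE_MW = 4600.0  # g/mol


def mM_to_mg_per_ml(conc_mM, molar_mass=ASSUMED_PEPTIDE_MW):
    """Convert a molar concentration in mM to a mass concentration in mg/ml."""
    # mM = mmol/L; mmol/L * g/mol = mg/L; divide by 1000 for mg/ml.
    return conc_mM * molar_mass / 1000.0


print(mM_to_mg_per_ml(1.0))  # undiluted sample
print(mM_to_mg_per_ml(0.4))  # diluted sample from the reversibility experiment
```

      Under this assumed molar mass, the 0.4 mM dilution corresponds to a little under 2 mg/ml.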

      With regard to the question of specificity, we have studied similar N-arm peptides without P13L mutations and with the 31-33 deletion under equivalent conditions. But we observe the reversible self-association, conformational change, and fibril formation only for those containing the P13L mutation, consistent with ColabFold predictions. Neither did we observe fibrils with disordered C-arm peptides.

      How these weak self-association motifs in the N-arm can be physiologically relevant in the context of full-length protein modulating the stability of multi-molecular complexes and enhancing LLPS was outlined above, and further clarified in the manuscript as detailed below.

      (b) How do the authors justify extrapolating from the mM-range peptide behaviors to the crowded but far lower effective concentrations in cells?

      As pointed out above, the key to this question is the local preconcentration as the N-arm peptides are tethered to the rest of protein in the context of flexible multi-molecular assemblies. Another mechanism to consider is the formation of condensates. The response to the next comment will expand on this.

      The authors should consider adding a dedicated section (either in Methods or Discussion) justifying the use of high concentrations, with estimation of local concentrations in RNPs and how they compare to the in vitro ranges used here. For concentration-dependent phenomena discussed here, it is vital to ensure that the findings are not artefacts of non-physiological peptide aggregation.

      The use of high concentration in biophysical experiments is quite common, for example, in NMR or crystallography, insofar as they elucidate molecular properties. We believe this is obvious; the Reviewer will certainly agree with us, and this does not require further elaboration. The property observed in this case is the existence of specific, weak protein self-association interfaces in the N-arm.

      Our response to the Reviewer’s point 7(a) addresses the distinction between artefactual aggregation and self-association of N-arm peptides. The relevance of these weak protein self-association interfaces in the context of the full-length protein is the second underlying question.

      As we have previously stated in a dedicated Results paragraph:

      “In contrast to the modulation of the coiled-coil LRS interfaces, the de novo creation of the N-arm self-association interface through beta-sheet interactions enabled by P13L cannot be readily observed in full-length N-protein at low μM concentrations. Similar to the ancestral LRS interface, it provides only ultra-weak binding energies that require mM concentrations to significantly populate oligomers. This is fully consistent with the previous observation by SV-AUC that neither N:P13L,Δ31-33 nor N<sub>ο</sub> with the full set of Omicron mutations show any significant higher-order self-association at low μM concentrations, whereas at high local concentrations – as observed in phase-separated droplets – they can modulate and cooperatively enhance self-association processes (Nguyen et al., 2024). (In fact, P13L can substitute for the LRS promoting LLPS, as observed in the rescue of LLPS by N:P13L,Δ31-33/L222P mutants whereas N:L222P LRS-abrogating mutants are deficient in LLPS.) Another process that increases the local concentration of N-arm chains is the tetramerization of full-length N-protein. As described earlier, occupancy of the NA-binding site in the NTD allosterically promotes self-assembly of the LRS into higher oligomers (Zhao et al., 2021). We hypothesized that these oligomers may be cooperatively stabilized by additional N-arm interactions in P13L mutants.”

      To state completely unambiguously why weak interfaces are important, we have followed the Reviewer’s suggestion and added an additional clarification already earlier, at the end of the P13L Results section:

      “While this self-association interface in the P13L N-arm is weak and its direct observation in biophysical experiments requires mM concentrations, which far exceed average intracellular concentration of N, such  weak interactions can become highly relevant physiologically when high local concentrations are prevailing, for example, when the disordered extension is preconcentrated while tethered within macromolecular assemblies as in the RNP, or in macromolecular condensates.”

      Furthermore, we have added early in the Discussion:

      “Even though the solution affinity of the N-arm P13L interface is ultra-weak, the average local concentration of N-arm chains across the RNP volume (in a back-of-the-envelope calculation assuming a ≈14 nm cube (Klein et al., 2020) with a dodecameric N cluster) is ≈7.4 mM, such that disordered N-arm peptides could well create populations of N-arm clusters stabilizing RNPs through this interface.  However, besides the RNP-stabilizing mutants we have also observed unexpected RNP destabilization by the ubiquitous R203K/G204R double mutation, which may be caused by the introduction of additional charges close to the self-association interface in the LRS. In our experiments, this destabilization is more than compensated for by the P13L mutation. (Another scenario where ultra-weak interactions can have a critical impact is in molecular condensates. We previously reported the suppression of LLPS by the R203K/G204R mutation, which is rescued by the additional P13L/Δ31-33 mutation (Nguyen et al., 2024). This is consistent with compensatory weak stabilizing and destabilizing impacts of weak interactions on the RNP observed here.)”

      Reviewer #1 (Recommendations for the Authors):

      In Figure 1B, it is unclear what the orange lines connecting polypeptides represent, as well as the zig-zag orange lines in the N-arm.

      We thank the Reviewer for this comment. We intended this to represent regions of self-association but recognize the patterned background is confusing. We have changed this now to solid-colored backgrounds, and indicated this in the figure legend:

      “Regions of self-association are indicated by shaded backgrounds.”

      Regarding presentation, in Figure 5 (MP), the relationship between mass and oligomer size should be shown more clearly.

      We agree. To this end we have labeled the peaks in the MP histograms in Figure 5 with the oligomeric state of the 2N/2SL7 subunits.

      Reviewer #2 (Recommendations for the Authors):

      I find the science of the paper to be convincing and compellingly supported.

      Thank you for this positive statement.

      My primary complaints are with presentation or minor technical questions that, honestly, primarily arise due to my own ignorance and unfamiliarity with some of the techniques employed.

      My primary issue is with the figures. I find, generally, the text in axes labels, ticks, and legends to be too small to comfortably read. This is particularly true in the CD spectra and other data presented in Figures 1D, 2B, 4, 5, 6, and 8.

      We agree and have increased the font size of all text and labels of the plots in Figures 1, 2, 4, 5, 6, and 8.

      I also found the use of initialisms to be a bit overbearing and inconsistent. For example, the authors repeatedly switch between spelling out "nucleic acid" and the initialism "NA" (which is also never explicitly spelled out in the text). With the already substantial length of the text, my own personal opinion would be to suggest spelling out all initialisms in the interest of making the reading easier.

      This is a valid criticism. To improve the readability, we have followed this advice and systematically spelled out “nucleic acid” instead of using “NA”.  Similarly, we have now written out full-length instead of the abbreviation FL, and omitted the abbreviation IDR for intrinsically disordered regions, as well as VOC for variant of concern, and AF3 for AlphaFold.

      Regarding the reference to mutants, we have now explained upfront the system of Latin and Greek nomenclature we consistently applied.

      “We will adopt a nomenclature where the complete set of defining mutations of a variant will be referred to by its Greek letter, i.e., N:P13L/R203K/G204R/G214C is N<sub>λ</sub>, and analogously the set of Omicron mutations N:P13L/Δ31-33/R203K/G204R are referred to as N<sub>ο</sub>; see Table 1”

      I found the text to be verbose, bordering on overly so; the Introduction is more than two pages long. The section "Enhanced oligomerization of the leucine-rich sequence through cysteine mutations" has two long paragraphs of introduction before the present results are discussed, et cetera. An (admittedly, very rough) estimation of the length of the paper places it at ~9,000-10,000 words long, and I think that the presentation might benefit from significant editing and shortening.

      We agree the manuscript is longer than would be desirable, and we generally prefer not to insert mini-introductions into Results sections. On the other hand, in order to make a solid contribution to understanding the big picture of fuzzy complexes in molecular evolution of RNA virus proteins it is indispensable to go into the details of RNP assembly and several of the interfaces. Therefore, we feel the length is in the range that it needs to be without losing clarity. In addition, other Reviewer suggestions to extend the discussion, for example, of limitations of VLP assays and the in vivo state of cysteines, conflict with significant shortening.

      In the particular case of the cysteine mutations, cited by the Reviewer, we believe it is important to add detailed background on G215C, because the Results proceed in a comparison of the self-association mode between G215C and G214C. This is of significant interest in the present context not only for the independent introduction of interface-enhancing mutations highlighting the evolution of fuzzy complexes, but also because it illustrates the pleomorphic ability of RNPs.

      Nonetheless, we have slightly shortened this text and merged the background into a single paragraph. More generally, we have critically reread the text to remove tangential sentences where possible and to make it more concise.

      I have a few more specific comments.

      In Figure 1A, I suggest explicitly labeling the location of the LRS, as it comes up repeatedly.

      Yes, we thank the Reviewer for this suggestion and have introduced this label in Figure 1A.

      In Figure 1B, the legend indicates that the red lines indicate "new inter-dimer interactions." However, these red lines are overlayed on a vertical stripe of red squiggles; it is unclear to me and not explicitly described in the legend what these squiggles are meant to illustrate.

      We agree this background was confusing. As mentioned in our Response to Reviewer #1 we have replaced the structured background with a solid background and explained in the figure legend that these areas depict regions of self-association.

      On lines 44-45, the authors state, "The IDRs amount to 45%, ..." 45% of what?

      Thank you, this was unclear.  We have now clarified “The IDRs amount to ≈45% of total residues”

      In lines 244-246, the authors compare the sizes of complexes in reducing versus non-reducing conditions as measured by dynamic light scattering, stating, "However, dynamic light scattering (DLS) revealed the presence of N210-246:G214C complexes with hydrodynamic radii ranging from 6 to 40 nm (in comparison to 1-2 nm for N210-246:G215C (Zhao et al., 2022)) in reducing conditions, and slightly larger in non-reducing conditions (Supplementary Figure S4)." Using this single statistic seems to me to be a less-than-ideal way of characterizing what seems to me to be happening here. In Supplementary Figure 4, it appears to me that what is happening is that in non-reduced conditions, the sample is monodisperse, whereas in reducing conditions, the distribution becomes polydisperse/bimodal, with two clearly separate populations. I feel that this could use a more thorough description rather than just stating the overall range of particle sizes.

      Yes, the Reviewer is correct – it is indeed a good idea to be more precise here. To this end we have carried out cumulant analyses on the autocorrelation functions, as a time-honored method to quantify the polydispersity.  Both samples are polydisperse, but more so in reducing conditions. We have now added “For N210-246:G214C a cumulant analysis results in radii of 8.8 nm and 10.6 nm and polydispersity indices of 0.40 and 0.35 for reducing and non-reducing conditions, respectively”
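      For readers unfamiliar with cumulant analysis, the idea can be sketched in a few lines. This is a generic second-order cumulant fit under stated assumptions, not the software or settings used for the reported values: it assumes the normalized field autocorrelation function g1(τ) is already available, fits ln g1(τ) = a0 − Γτ + (μ2/2)τ², and reports the polydispersity index as μ2/Γ². All names and the synthetic test data are hypothetical.

```python
import math


def fit_quadratic(x, y):
    """Least-squares fit y = a0 + a1*x + a2*x**2 via the 3x3 normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(a)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with the right-hand side
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs


def cumulant_analysis(tau, g1):
    """Return (mean decay rate Gamma, polydispersity index mu2 / Gamma**2)."""
    log_g1 = [math.log(g) for g in g1]
    _, a1, a2 = fit_quadratic(tau, log_g1)
    gamma = -a1        # first cumulant: mean decay rate
    mu2 = 2.0 * a2     # second cumulant: variance of the decay-rate distribution
    return gamma, mu2 / gamma ** 2


# Synthetic, noise-free data (delay times in ms) with known Gamma and PDI.
gamma_true, pdi_true = 5.0, 0.4
tau = [0.01 * i for i in range(1, 21)]
g1 = [math.exp(-gamma_true * t + 0.5 * pdi_true * gamma_true ** 2 * t ** 2) for t in tau]
gamma_fit, pdi_fit = cumulant_analysis(tau, g1)
print(gamma_fit, pdi_fit)
```

      The mean decay rate Γ is what is converted to a hydrodynamic radius through the Stokes-Einstein relation, while the polydispersity index summarizes the width of the size distribution in a single number. A bimodal distribution is only partially described by such an index, which is why it complements rather than replaces inspection of the size distribution.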

      Finally, I have one remaining comment that is a result of my own inexperience with circular dichroism and interpreting the spectra. For me personally, I would appreciate a more thorough description/illustration of the statistics involved in the CD spectra, but perhaps this is not necessary for people who are more familiar with interpreting these kinds of data. For example, in Figure 1D, it is not clear to me what the error bars/confidence intervals for the CD data look like. I see many squiggles, some of which the authors claim are significant (e.g., the differences between ~215 - 230 nm), and others are not worthy of comment. Let's say, for example, that I fit a smoothed spline through these data and then measure the magnitude of the fluctuations from that spline to define/quantify confidence intervals. What does that distribution look like? Or maybe the confidence intervals are so small that all squiggles are significant?

      Thank you, this is a good question. As mentioned in the methods section, the CD spectra shown are averages of triplicate scans. Therefore, it is straightforward to extract the standard deviation at each wavelength from the three measurements (although a spline would probably work just as well). The values are what one would expect for the squiggles to be random noise. In the region 215 – 220 nm characteristic for helical secondary structure the standard deviations are small relative to the separation between curves, which indicates that the differences are highly significant. Naturally, the curves do overlap in other spectral regions, which would make a plot including the wavelength-dependent error bars or confidence bands too crowded. Therefore, we have kept the plot of the averaged triplicate scans, but have now provided the average standard deviations for all species in the figure legend and mentioned their significant separation:

      “Triplicate scans yield average standard deviations of 0.13 (N), 0.17 (N+SL7), 0.16 (N<sub>λ</sub>), and 0.21 (N<sub>λ</sub> +SL7) 10<sup>3</sup> deg cm<sup>2</sup>/dmol, respectively, with non-overlapping confidence bands for the different species, for example, between 215-220 nm.”
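      The statistic described in this response, a standard deviation at each wavelength across replicate scans followed by an average over wavelengths, can be sketched as follows. The function names, the separation criterion, and the three-point example spectra are hypothetical and chosen only for illustration; they are not the measured data.

```python
from statistics import mean, stdev


def per_wavelength_stats(scans):
    """Mean and sample standard deviation at each wavelength across replicate scans.

    scans: list of replicate scans, each a list of CD values on a shared wavelength grid.
    """
    columns = list(zip(*scans))           # regroup values by wavelength
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # n-1 in the denominator
    return means, sds


def bands_separated(mean_a, sd_a, mean_b, sd_b, k=1.0):
    """True where two mean +/- k*SD bands do not overlap (hypothetical criterion)."""
    return [abs(ma - mb) > k * (sa + sb)
            for ma, sa, mb, sb in zip(mean_a, sd_a, mean_b, sd_b)]


# Hypothetical triplicate scans for two species at three wavelengths.
species_a = [[-10.0, -12.0, -8.0], [-10.2, -11.8, -8.1], [-9.8, -12.2, -7.9]]
species_b = [[-13.0, -15.0, -8.0], [-13.1, -15.2, -8.2], [-12.9, -14.8, -7.8]]

mean_a, sd_a = per_wavelength_stats(species_a)
mean_b, sd_b = per_wavelength_stats(species_b)
print(mean(sd_a), mean(sd_b))                        # average SD per species
print(bands_separated(mean_a, sd_a, mean_b, sd_b))   # separation at each wavelength
```

      In this toy example the two species are separated at the first two wavelengths and overlap at the third, mirroring the situation where curves differ clearly in one spectral region and coincide elsewhere.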

      Reviewer #3 (Recommendations for the Authors):

      (1) The Discussion reiterates much of the background (mutational tolerance, fuzziness, SLiMs) already covered in the Introduction, diluting focus on the key new findings. The authors should consider shortening and refocusing the discussion on the main contributions in light of existing knowledge of viral assembly.

      In the Introduction we have provided background on intrinsically disordered proteins in general and their mutational tolerance, as well as the concept of fuzzy complexes. The first several paragraphs of the Discussion have a different focus, which is protein binding interfaces between viral proteins (obviously key in fuzzy complexes), specifically their modulation and the remarkable de novo introduction of binding interfaces. We believe this deserves emphasis, since this highlights a novel aspect of fuzziness, for the mutant spectrum of RNA viruses to encode a range of assembly stabilities and architectures.

      To reduce redundancy between the end of the Introduction and the beginning of the Discussion, we have shortened the last paragraph of the Introduction and removed its preview of the conclusions, as described in the response to the next comment of the Reviewer (see below).

      Unfortunately, the length of the Discussion is dictated in part also by the need to discuss methodological aspects, among them the limitations of VLP assays, and the redox state of the cysteine in the LRS mutants, which were important points recommended by other suggestions of the Reviewers. Similarly, we believe the discussion of other potential functions of Omicron N-arm mutations is warranted, as well as the background of the R203K/G204R double mutation that has attracted significant attention in the field due to its effects on phosphorylation and expression of truncated N species that also form RNPs. Our goal was to integrate the results by us and other laboratories regarding specific mutation effects into a comprehensive picture of molecular evolution of N, which we believe the framework of fuzzy complexes can provide.

      (2) The Abstract and early Introduction set a broad stage (IDPs, fuzziness), but don't explicitly state the concrete hypotheses that the experiments test. Please add 2-3 sentences in the Introduction that enumerate testable hypotheses, e.g.:

      (a) P13L creates a new N-arm interface that increases RNP stability.

      (b) G214C/G215C strengthens LRS oligomerization to stabilize higher-order N assemblies.

      We agree the introduction can be improved.  However, it seems to us that it cannot be neatly framed in the hypothesis – answer dichotomy, without losing a lot of nuances and without requiring an even longer and more detailed introduction.

      One of the main questions is to test whether the framework of fuzzy complexes can be applied to understand molecular evolution of N, and we feel the introduction is already flowing well towards this:

      “ … In fuzzy complexes the total binding energy is distributed into multiple distinct ultra-weak interaction sites (Olsen et al., 2017). Similar to individual RNA virus proteins with loose or absent structure, maintaining disorder and a spatial distribution of low-energy interactions in the protein complexes may increase the tolerance for mutations and improve evolvability of protein complexes.

      The unprecedented worldwide sequencing effort of SARS-CoV-2 genomes during its rapid evolution in humans provides a unique opportunity to examine these concepts. ...”

      To bring this to a more concrete set of questions in the end, we have shortened and rewritten the last paragraph in the Introduction:

      “To examine how architecture and energetics of RNP assemblies can be impacted by N-protein mutations we study a panel of N-proteins derived from ancestral Wuhan-Hu-1 and different VOCs, including Alpha, Delta, Lambda, and Omicron (see Table 1), in biophysical experiments, VLP assays, and mutant virus. Specifically, we ask how the RNP size distribution and life-time is modulated by: (1) the novel binding interface created by the P13L mutation of Omicron; (2) enhancements of other weak self-association interfaces through G215C of Delta and G214C of Lambda; (3) the ubiquitous R203K/G204R double mutation of Alpha, Lambda, and Omicron.  We also test whether the P13L mutation improves viral fitness, similar to G215C and R203K/G204R. The results are discussed in the framework of fuzzy complexes and molecular evolution of N in the course of viral adaptation to the human host. Understanding the salient features of the binding interfaces in viral assembly and their evolution expands our foundation for the design of therapeutics such as assembly inhibitors.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment:

      Glioblastoma is one of the most aggressive cancers without a cure. Glioblastoma cells are known to have high mitochondrial potential. This useful study demonstrates the critical role of the ribosome-associated quality control (RQC) pathway in regulating mitochondrial membrane potential and glioblastoma growth. Some assays are incomplete; further revision will improve the significance of this study.

      For clarity, we propose revising the second sentence to: "It is well-established that certain cancer cells, such as glioblastoma cells, exhibit elevated mitochondrial membrane potential."

      Reviewer #1 (Public Review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Your acknowledgment of our study's pioneering elements is greatly appreciated.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. The conclusions of this paper are mostly well-supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We are grateful for your acknowledgment of our study’s innovative approach and its possible influence on cancer therapy. We sincerely appreciate your valuable feedback. In response, this updated manuscript presents substantial new findings that reinforce our central argument. Moreover, we have broadened our data analysis and interpretation, as well as refined our methodological descriptions.

      Reviewer #2 (Public Review):

      This work explores the connection between glioblastoma, mito-RQC, and msiCAT-tailing. They build upon previous work concluding that ATP5alpha is CAT-tailed and explore how CAT-tailing may affect cell physiology and sensitivity to chemotherapy. The authors conclude that when ATP5alpha is CAT-tailed, it either incorporates into the proton pump or aggregates and that these events dysregulate MPTP opening and mitochondrial membrane potential and that this regulates drug sensitivity. This work includes several intriguing and novel observations connecting cell physiology, RQC, and drug sensitivity. This is also the first time this reviewer has seen an investigation of how a CAT tail may specifically affect the function of a protein. However, some of the conclusions in this work are not well supported. This significantly weakens the work but can be addressed through further experiments or by weakening the text.

      We appreciate the recognition of our study's novelty. To address your concerns about our conclusions, we have revised the manuscript. This revision includes new data and corrections of identified issues. Our detailed responses to your specific points are outlined below.

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1B, please replace the high-exposure blots of ATP5 and COX with representative results. The current results are difficult to interpret clearly. Additionally, it would be helpful if the author could explain the nature of the two different bands in NEMF and ANKZF1. Did the authors also examine other RQC factors and mitochondrial ETC proteins? I'm also curious to understand why CAT-tailing is specific to C-I30, ATP5, and COX-V, and why the authors did not show the significance of COX-V.

      We appreciate your inquiry regarding the data. Additional attempts were made using new patient-derived samples; however, these results did not improve upon the existing ATP5α, NDUS3 (C-I30), and COX4 signals presented in the figure. This is possibly because CAT-tail-modified mitochondrial proteins represent only a small fraction of the total proteins in these cells. It is acknowledged that the small tails visible above the prominent main bands are not particularly distinct. To address this, the revised version includes updated images to better illustrate the differences. We believe the assertion that GBM/GSCs possess CAT-tailed proteins is substantiated by a combination of subsequent experimental findings. The figure (refer to new Fig. 1B) serves primarily as an introduction. It is important to note that the CAT-tailed ATP5α plays a vital role in modulating mitochondrial potential and glioma phenotypes, a function which has been demonstrated through subsequent experiments.

      It is acknowledged that the CAT-tail modification is not exclusive to the ATP5⍺ protein. ATP5⍺ was selected as the primary focus of this study due to its prevalence in mitochondria and its specific involvement in cancer development, as noted by Chang YW et al. Future research will explore the possibility of CAT tails on other mitochondrial ETC proteins. Currently, NDUS3 (C-I30), ATP5⍺, and COX4 serve as examples confirming the existence of these modifications. It remains challenging to detect endogenous CAT-tailing, and bulk proteomics is not yet feasible for this purpose. COX4 is considered significant. We hypothesize that CAT-tailed COX4 may function similarly to the previously studied C-I30 (Wu Z, et al.), potentially causing substantial mitochondrial proteostasis stress.

      Concerning RQC proteins, our blotting analysis of GBM cell lines now includes additional RQC-related factors. The primary, more prominent bands (indicated by arrowheads) are, in our assessment, the intended bands for NEMF and ANKZF1.  Subsequent blotting analyses showed only single bands for both ANKZF1 and NEMF, respectively. The additional, larger molecular weight band of NEMF, which was initially considered for property analysis (phosphorylation, ubiquitination, etc.), was not examined further as it did not appear in subsequent experiments (refer to new Fig. S1C).

      References:

      Chang YW, et al. Spatial and temporal dynamics of ATP synthase from mitochondria toward the cell surface. Communications biology. 2023;6(1).

      Wu Z, et al. MISTERMINATE Mechanistically Links Mitochondrial Dysfunction With Proteostasis Failure. Molecular cell. 2019;75(4).

      (2) In addition to Figure 1B, it would be interesting to explore CAT-tailed mETC proteins in cancer tissue samples.

      This is an excellent point, and we appreciate the question. We conducted staining for ATP5⍺ and key RQC proteins in both tumor and normal mouse tissues. Notably, ATP5⍺ in GBM exhibited a greater tendency to form clustered punctate patterns compared to normal brain tissue, and not all of it co-localized with the mitochondrial marker TOM20 (refer to new Fig. S3C-E). Crucially, we observed a significant increase in NEMF expression within mouse xenograft tumor tissues, alongside a decrease in ANKZF1 expression (refer to new Fig. S1A, B). These findings align with our observations in human samples.

      (3) Please knock down ATP5 in the patient's cells and check whether both the upper band and lower band of ATP5 have disappeared or not.

      This control was essential and has been executed now. To validate the antibody's specificity, siRNA knockdown was performed. The simultaneous elimination of both upper and lower bands upon siRNA treatment (refer to new Fig. S2A) confirms they represent genuine signals recognized by the antibody.

      (4) In Figure 1C and 1D, add long exposure to spot aggregation and oligomer. Figure 1D, please add the blots where control and ATP5 are also shown in NHA and SF (similar to SVG and GSC827).

      New data are included in the revised manuscript to address the queries. Specifically, the new Fig 1D now displays the full queue as requested, featuring blots for Control, ATP5α, AT3, and AT20. Our analysis reveals that AT20 aggregates exhibit higher expression and accumulation rates in GSC and SF cells.

      Fig. 1C has been updated to include experimental groups treated with cycloheximide and sgNEMF. Our results show that sgNEMF effectively inhibits CAT-tailing in GBM cell lines, whereas cycloheximide has no impact. After consulting with the Reporter's original creator and optimizing expression conditions, we observed no significant aggregates with β-globin-non-stop protein, potentially due to the length of endogenous CAT-tail formation (as noted by Inada, 2020, in Cell Reports). Our analysis focused on the ratio of CAT-tailed (red box blots) and non-CAT-tailed proteins (green box blots). Comparing these ratios revealed that both anisomycin treatment and sgNEMF effectively hinder the CAT-tailing process, while cycloheximide has no effect.

      (5) In Figure 1E, please double-check the results with the figure legend. ATP5A aggregated should be shown endogenously. The number of aggregates shown in the bar graph is not represented in micrographs. Please replace the images. For Figure 1E, to confirm the ATP5-specific aggregates, it would be better if the authors would show endogenous immunostaining of C-I30 and Cox-IV.

      Labels in Fig. 1E were corrected to reflect that the bar graph in Fig. 1F indicates the number of cells with aggregates, not the quantity of aggregates per cell. The presence

      (6) Figure 3A. Please add representative images in the anisomycin sections. It is difficult to address the difference.

      We appreciate your feedback. Upon re-examining the Calcein fluorescence intensity data in Fig. 3A, we believe the images accurately represent the statistical variations presented in Fig. 3B. To address your concerns more effectively, please specify which signals in Fig. 3A you find potentially misleading. We are prepared to revise or substitute those images accordingly.

      (7) Figure 3D. If NEMF is overexpressed, is the CAT-tailing of ATP 5 reversed?

      Thank you. Your prediction aligns with our findings. We've added data to the revised Fig. S6A, B, which demonstrates that both NEMF overexpression and ANKZF1 knockdown lead to elevated levels of CRC. This increase, however, was not statistically significant in GSC cells. A plausible explanation for this discrepancy is that the MPTP of GSC cells is already closed, thus any additional increase in CAT-tailing activity does not result in further amplification.

      (8) Figure 3G. Why on the BN page are AT20 aggregates not the same as shown in Figure 2E?

      We appreciate your inquiry regarding the ATP5⍺ blots, specifically those in the original Fig. 3G (left) and 2E (right). Careful observation of the ATP5⍺ band placement in these figures reveals a high degree of similarity. Notably, there are aggregates present at the top, and the diffuse signals extend downwards. Given that this is a gradient polyacrylamide native PAGE, the concentration diminishes towards the top. Consequently, the non-rigid nature of the Blue Native PAGE gel may lead to slight variations in the aggregate signals; however, the overall patterns are very much alike. To mitigate potential misinterpretations, we have rearranged the blot order in the new Fig. 3M.

      (9) Figure 4D. The amount of aggregation mediated by AT20 is more compared to AT3. Why are there no such drastic effects observed between AT3 and AT20 in the TUNEL assay?

      The previous Figure 4D presents the quantification of cell migration from the experiment depicted in Figure 4C. But this is a good point. TUNEL staining results are directly influenced by mitochondrial membrane potential and the state of mitochondrial permeability transition pores (MPTP), not by the degree of protein aggregation. Our previous experiments showed comparable effects of AT3 and AT20 on mitochondria (Fig. 2E, 3K), which aligns with the expected similar outcomes on TUNEL staining. As for its biological nature, this could be very complicated. We hope to explore it in future studies.

      (10) Figure 5C: The role of NEMF and ANKZF1 can be further clarified by conducting Annexin-PI assays using FACS. The inclusion of these additional data points will provide more robust evidence for CAT-tailing's role in cancer cells.

      In response to your suggestion, we have incorporated additional data into the revised version. Using the Annexin-PI kit, we labeled apoptotic cells and detected them using flow cytometry (FACS). Our findings indicate that anisomycin pretreatment, NEMF knockdown (sgNEMF), and ANKZF1 upregulation (oeANKZF1) significantly increase the rate of STS-induced apoptosis compared to the control group (refer to new Fig. S9D-G).

      (11) Figure 5F: STS is a known apoptosis inhibitor. Why it is not showing PARP cleavage? Also, cell death analysis would be more pronounced, if it could be shown at a later time point. What is the STS and Anisomycin at 24h or 48h time-point? Since PARP is cleaved, it would also be better if the authors could include caspase blots.

      I guess what you meant to say here is "Staurosporine is a protein kinase inhibitor that can induce apoptosis in multiple mammalian cell lines." Our study observed PARP cleavage even in GSCs, which are typically more resistant to staurosporine-induced apoptosis (C-PARP in Fig. S9B). The ratio of C-PARP to total PARP increased. We selected a 180-minute treatment duration because longer treatments with STS + anisomycin led to a late stage of apoptosis and non-specific protein degradation (e.g., at 24 or 48 hours), making PARP comparisons less meaningful. Following your suggestion, we also examined caspase 3/7 activity in GSC cells treated with DMSO, CHX, and anisomycin. We found that anisomycin treatment also activated caspases (Fig. S9A).

      (12) In Figure 5, the addition of an explanation, how CAT-tailing can induce cell death, would add more information such as BAX-BCL2 ratio, and cytochrome-c release from the mitochondria.

      Thank you for your suggestion. In this study, we state that specific CAT-tails inhibit GSC cell death/apoptosis rather than inducing it. Therefore, we do not expect that examining BAX-BCL2 and mitochondrial cytochrome c release would offer additional insights.

      (13) To confirm the STS resistance, it would be better if the author could do the experiments in the STS-resistant cell line and then perform the Anisomycin experiments.

      Thank you. We should emphasize that our data primarily originates from GSC cells. These cells already exhibit STS-resistance when compared to the control cells (Fig. S8A-C).

      (14) It would be more advantageous if the author could show ATP5 CATailed status under standard chemotherapy conditions in either cell lines or in vivo conditions.

      This is an interesting question and worth exploring; however, GSC cells exhibit strong resistance to standard chemotherapy treatments like temozolomide (TMZ). Additionally, we couldn't detect changes in CAT-tailed ATP5⍺ and thus did not include that data.

      (15) In vivo (cancer mouse model or cancer fly model) data will add more weight to the story.

      We appreciate your intriguing question. An effective approach would be to test the RQC pathway's function using the Drosophila Notch overexpression-induced brain tumor model. However, Khaket et al. have conducted similar studies, stating, "The RNAi of Clbn, VCP, and Listerin (Ltn), homologs of key components of the yeast RQC machinery, all attenuated NSC over-proliferation induced by Notch OE (Figs. 5A and S5A–D, G)." This data supports our theory, and we have incorporated it into the Discussion. While the mouse model more closely resembles the clinical setting, it is not covered by our current IACUC proposal. We intend to verify this hypothesis in a future study.

      Reference:

      Khaket TP, Rimal S, Wang X, Bhurtel S, Wu YC, Lu B. Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus. 2024 Aug 13;3(8):pgae321.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1B, C: To demonstrate that Globin, ATP5alpha, and C-I30 are CAT-tailed, it is necessary to show that the high mobility band disappears after NEMF deletion or mutagenesis of the NFACT domain of NEMF. This can be done in a cell line. The anisomycin experiment is not convincing because the intensity of the bands drops and because no control is done to show that the effects are not due to translation inhibition (e.g. cycloheximide, which inhibits translation but not CAT tailing). Establishing ATP5alpha as a bona fide RQC substrate and CAT-tailed protein is critical to the relevance of the rest of the paper.

      Thank you for suggesting this crucial control experiment. To confirm the observed signal is indeed a bona fide CAT-tail, it's essential to demonstrate that NEMF is necessary for the CAT-tailing process. We have incorporated data from NEMF knockdown (sgNEMF) and cycloheximide treatment into the revised manuscript. Our findings show that both sgNEMF and anisomycin treatment effectively inhibit the formation of CAT-tailing signals on the reporter protein (Fig. 1C). Similarly, NEMF knockdown in a GSC cell line also effectively eliminated CAT-tails on overexpressed ATP5⍺ (Fig. S2B).

      In general, the text should be weakened to reflect that conclusions were largely gleaned from artificial CAT tails made of AT repeats rather than endogenously CAT-tailed ATP5alpha. CAT tails could have other sequences or be made of pure alanine, as has been suggested by some studies.

      Thank you for your reminder. We have reviewed the recent studies by Khan et al. and Chang et al., and we found their analysis of CAT tail components to be highly insightful. We concur with your suggestion regarding the design of the CAT tail sequence. We aimed to design a tail that maintained stability and resisted rapid degradation, regardless of its length. In the revised version, we clarify that our conclusions are based on artificial CAT tails, specifically those composed of AT repeat sequences (p. 9). We acknowledge that the presence of other sequence components may lead to different outcomes (p. 19).

      Reference:

      Khan D, Vinayak AA, Sitron CS, Brandman O. Mechanochemical forces regulate the composition and fate of stalled nascent chains. bioRxiv [Preprint]. 2024 Oct 14:2024.08.02.606406.

      Chang WD, Yoon MJ, Yeo KH, Choe YJ. Threonine-rich carboxyl-terminal extension drives aggregation of stalled polypeptides. Mol Cell. 2024 Nov 21;84(22):4334-4349.e7.

      Throughout the work (e.g. 3B, C), anisomycin effects should be compared to those with cycloheximide to observe if the effects are specific to a CAT tail inhibitor rather than a translation inhibitor.

      We agree that including cycloheximide control experiments is crucial. The revised version now incorporates new data, as depicted in Fig. S5A, B, illustrating alterations in the on/off state of MPTP following cycloheximide treatment. Furthermore, Fig. S6A, B present changes in Calcium Retention Capacity (CRC) under cycloheximide treatment. The consistency of results across these experiments, despite cycloheximide treatment, suggests that anisomycin's role is specifically as a CAT tail inhibitor, rather than a translation inhibitor.

      Line 110, it is unclear what "short-tailed ATP5" is. Do you mean ATP5alpha-AT3? If so this needs to be introduced properly. Line 132: should say "may indicate accumulation of CAT-tailed protein" rather than "imply".

      We acknowledge your points. We have clarified that the "short-tailed ATP5α" refers to ATP5α-AT3 and incorporated the requested changes into the revised manuscript.

      Figure 1C: how big are those potential CAT-tails (need to be verified as mentioned earlier)? They look gigantic. Include a ladder.

      In the revised Fig. 1D, molecular weight markers have been included to denote signal sizes. The aggregates in the previous Fig. 1C, also present in the control plasmid, are likely a result of signal overexposure. The CAT-tailed protein is observed just above the intended band in these blots. These aggregates have been re-presented in the updated figures, and their signal intensities quantified.

      Line 170: "indicating that GBM cells have more capability to deal with protein aggregation". This logic is unclear. Please explain.

      We appreciate your question and have thoroughly re-evaluated our conclusion. We offer several potential explanations for the data presented in Fig. 1D: (1) ATP5α-AT20 may demonstrate superior stability. (2) GSC (GBM) cells might lack adequate mechanisms to monitor protein accumulation. (3) GSC (GBM) cells could possess an increased adaptive capacity to the toxicity arising from protein accumulation. This discussion has been incorporated into the revised manuscript (lines 166-169).

      Line 177: how do you know the endogenous ATP5alpha forms aggregates due to CAT-tailing? Need to measure in a NEMF hypomorph.

      We understand your concern and have addressed it. Revised Fig. 3G, H demonstrates that a reduction in NEMF levels, achieved through sgNEMF in GSC cells, significantly diminishes ATP5α aggregation. This, in conjunction with the Anisomycin treatment data presented in revised Fig. 3E, F, confirms the substantial impact of the CAT-tailing process on this aggregation.

      Line 218: really need a cycloheximide or NEMF hypomorph control to show this specific to CAT-tailing.

      We have revised the manuscript to include data from sgNEMF and cycloheximide treatments, specifically Fig. 3G, H, and Fig. S5C, D, as detailed in our response above.

      Lines 249,266, Figure 5A: The mentioned experiments would benefit from controls including an extension of ATP5alpha that was not alanine and threonine, perhaps a gly-ser linker, as well as an NEMF hypomorph.

      We sincerely appreciate your insightful comments. In response, the revised manuscript now incorporates control data for ATP5α featuring a poly-glycine-serine (GS) tail. This data is specifically presented in Figs. S2E-G, S4E, S7A, D, E, and S8F, G. Our experimental findings consistently demonstrate that the overexpression of ATP5α, when modified with GS tails, had no discernible impact on protein aggregation, mitochondrial membrane potential, GSC cell mobility, or any other indicators assessed in our study.

      Figure S5A should be part of the main figures and not in the supplement.

      This has been moved to the main figure (Fig. 5C).

    1. Reviewer #2 (Public review):

      Summary:

      The authors decomposed response times into component processes and manipulated the duration of these processes in opposing directions by varying contrast, and overall by manipulating speed-accuracy tradeoffs. They identify different processes and their durations by identifying neural states in time and validate their functional significance by showing that their properties vary selectively as expected with predicted effects of the contrast manipulation. They identify 4 processes: stimulus encoding, attention orienting, decision and motor execution. These map onto 5 classical event related potentials. The decision-making component matched the CPP and its properties varied with contrast and predicted decision-accuracy.

      Strengths:

      The design of the experiment is remarkable and offers crucial insights. The analysis techniques are beyond state-of-the-art and the analyses are well motivated and offer clear insights.

      Weaknesses:

      The number of identified events depends on the parameter setting of the analysis. While the authors discuss weaknesses of the approach this needs to be made explicit as well. It is also unclear to what extent topographies map onto processes since e.g., different combinations of sources can lead to the same scalp topography.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      From my reading, this study aimed to achieve two things:  

      (1) A neurally-informed account of how Pieron's and Fechner's laws can apply in concert at distinct processing levels.  

      (2) A comprehensive map in time and space of all neural events intervening between stimulus and response in an immediately-reported perceptual decision.  

      I believe that the authors achieved the first point, mainly owing to a clever contrast comparison paradigm, but with good help also from a new topographic parsing algorithm they created. With this, they found that the time intervening between an early initial sensory evoked potential and an "N2" type process associated with launching the decision process varies inversely with contrast according to Pieron's law. Meanwhile, the interval from that second event up to a neural event peaking just before response increases with contrast, fitting Fechner's law, and a very nice finding is that a diffusion model whose drift rates are scaled by Fechner's law, fit to RT, predicts the observed proportion of correct responses very well. These are all strengths of the study.   

      We thank the reviewer for their comments that added context to the events we detected in relation to previous findings. We also believe that the change in the HMP algorithm suggested by the reviewer improved the precision of our analyses and the manuscript. We respond to the reviewer’s specific comments below.

      (1) The second, generally stated aim above is, in the opinion of this reviewer, unconvincing and ill-defined. Presumably, the full sequence of neural events is massively task-dependent, and surely it is more in number than just three. Even the sensory evoked potential typically observed for average ERPs, even for passive viewing, would include a series of 3 or more components - C1, P1, N1, etc. So are some events being missed? Perhaps the authors are identifying key events that impressively demarcate Pieron- and Fechner-adherent sections of the RT, but they might want to temper the claim that they are finding ALL events. In addition, the propensity for topographic parsing algorithms to potentially lump together distinct processes that partially co-evolve should be acknowledged.  

      We agree with the reviewer that the topographical solutions found by HMP will be dependent on the task and the quality and type of data. We address this point in the last section of the discussion (see also response to R3.5). We would also like to add that the events detected by HMP are, by construction, those that contribute to the RT and not necessarily all ERPs elicited by a stimulus.

      In addition to the new last section of the discussion we also make these points clear in the revised manuscript at the discussion start: 

      “By modeling the recorded single-trial EEG signal between stimulus onset and response as a sequence of multivariate events with varying by-trial peak times, we  aimed to detect recurrent events that contribute to the duration of the reaction time in the present perceptual decision-making task”.

      Regarding the typical visual ERPs, in response to this comment but also comments R1.2, R1.3 and R2.1, we aimed for a more precise description of the topographies and thus reduced the width of the HMP expected events to 25ms. This ensures that we do not miss events shorter than the initial expectations of 50ms (see Appendix B of Weindel et al., 2024 and also response to  R1.3). This new estimation provides evidence for at least two of the visual ERPs that, based on their timings and topographies (in relation with the spatial frequency of the stimulus), we interpret as the N40 and the P100 (see response to R1.5 for the justification of this categorization). We provide a description and justification of the interpretations in the result section “Five trial-recurrent sequential events occur in the EEG during decisions” and the discussion section “Visual encoding time”.

      (2) To take a salient example, the last neural event seems to blend the centroparietal positivity with a more frontal midline negativity, some of which would capture the CNV and some motor-execution related components that are more tightly time-locked to, of course, the response. If the authors plotted the traditional single-electrode ERP at the frontal focus and centroparietal focus separately, they are likely to see very different dynamics and contrast- and SAT-dependency. What does this mean for the validity of the multivariate method? If two or more components are being lumped into one neural event, wouldn't it mean that properties of one (e.g., frontal burstiness at response) are being misattributed to the other (centroparietal signal that also peaks but less sharply at response)?

      Using the new HMP parameterization described above we show that the reviewer's intuition was correct. Using an expected pattern duration of 25ms, the last event in the original manuscript splits into two events. The before-last event, now referred to as the lateralized readiness potential (LRP), presents a strong lateralization (Figure 3) with an increased negativity over the motor cortex contralateral to the right hand. The effect of contrast is mostly on the last event, which we interpret as the CPP (Figure 5). Despite the improved precision of the topographies of the identified events, it should be noted that some components will overlap. If the LRP is generated when a certain amount of evidence is accumulated (e.g. when the CPP crosses a certain value), then a time-based topography will necessarily include that CPP activity in addition to the lateralized potential. We discuss this in the section “Motor execution” of the discussion:

      “Adding the abrupt onset of this potential, we believe that this event is the start of motor execution, engaged after a certain amount of evidence. The evidence for this interpretation is manifest in the fact that the event's topography shares some activity with the CPP event that follows, an expected result if the LRP is triggered at a certain amount of evidence, indexed by the CPP”.

      (3) Also related to the method, why must the neural events all be 50 ms wide, and what happens if that is changed? Is it realistic that these neural events would be the same duration on every trial, even if their duration was a free parameter? This might be reasonable for sensory and motor components, but unlikely for cognitive.  

      The HMP method is sensitive to the event's duration as shown in the manuscript about the method (Appendix B of Weindel et al., 2024). Nevertheless as long as the topography in the real data is longer than the expected one it shouldn't be missed (i.e. same goes for by-trial variations in the event width). For this reason we halved the expected event width of 50ms (introduced by the original HsMM-MVPA paper by Anderson and colleagues) in the revision. This new estimation with 25ms thus is much less likely to miss events as evidenced by the new visual and motor events. In the revised manuscript this is addressed at the start of the Results section:

      “Contrary to previous applications (Anderson et al.,2016; Berberyan et al., 2021; Zhang et al., 2018; Krause et al., 2024) we assumed that the multivariate pattern was represented by a 25ms half-sine as our previous research showed that a shorter expected pattern width increases the likelihood of detecting cognitive events (see Appendix B of Weindel et al., 2024)”.
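
      As an illustration of what halving the expected pattern width means in samples, the following is a minimal sketch of a half-sine template. It is not the HMP package's implementation: the function name `half_sine_template` and the 1000 Hz sampling rate are assumptions chosen only for the example.

```python
import math


def half_sine_template(width_ms, sfreq_hz):
    """Return one half-cycle of a sine (rise, peak, fall) spanning width_ms.

    Hypothetical helper for illustration; not taken from the HMP code base.
    """
    n_samples = max(1, round(width_ms / 1000.0 * sfreq_hz))
    # Sample at bin centres so the template is symmetric and strictly positive.
    return [math.sin(math.pi * (i + 0.5) / n_samples) for i in range(n_samples)]


SFREQ_HZ = 1000.0  # assumed sampling rate, not stated in the response

wide = half_sine_template(50, SFREQ_HZ)    # width used in earlier applications
narrow = half_sine_template(25, SFREQ_HZ)  # width used in the revision

print(len(wide), len(narrow))
```

      Under these assumptions the narrower template covers half as many samples, which is why it can match brief events that a 50ms template would smear over.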

      Regarding the event width as a free parameter this is both technically and statistically difficult to implement as the amount of computing capacity, flexibility and trade-offs among the HMP parameters would, given the current implementation, render the model unfit for most computers and statistically unidentifiable.

      (4) In general, I wonder about the analytic advantage of the parsing method - the paradigm itself is so well-designed that the story may be clear from standard average event-related potential analysis, and this might sidestep the doubts around whether the algorithm is correctly parsing all neural events.  

      Average ERP analysis cannot differentiate between an effect of an experimental factor on the amplitude and an effect on the timing of the underlying components (Luck, 2005). Furthermore, the overlap of components across trials blurs the distinction between them. For both reasons we would not be able to reach the same level of certainty and precision using ERP analyses. In addition, the relatively low number of trials per experimental cell (contrast level X SAT X participant = 6 trials) makes the analyses hard to perform on ERPs, which typically require more trials per modality. From the reviewer’s comment we understand that this point was not clear. We therefore discuss this in the revision, Section “Functional interpretation of the events” of the results:

      “Nevertheless identifying neural dynamics on these ERPs centered on stimulus is complicated by the time variation of the underlying single-trial events (see probabilities displayed in Figure 3 for an illustration and Burle et al., 2008, for a discussion). The likely impact of contrast on both amplitude and time on the underlying single-trial event does not allow one to interpret the average ERP traces as showing an effect in one or the other dimension without strong assumptions (Luck, 2005)”.
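
      The amplitude/timing ambiguity can be shown with a toy simulation. This is a sketch under stated assumptions, not the authors' data or analysis: single-trial events are modeled as Gaussian bumps of identical amplitude, and every parameter (bump width, jitter, trial count) is hypothetical.

```python
import math
import random


def gaussian_bump(t_ms, peak_ms, amplitude=1.0, sd_ms=20.0):
    """Single-trial event modeled as a Gaussian bump (illustrative assumption)."""
    return amplitude * math.exp(-0.5 * ((t_ms - peak_ms) / sd_ms) ** 2)


def average_erp(peak_times_ms, times_ms, amplitude=1.0):
    """Average the single-trial bumps at each time point, as in a stimulus-locked ERP."""
    n_trials = len(peak_times_ms)
    return [
        sum(gaussian_bump(t, p, amplitude) for p in peak_times_ms) / n_trials
        for t in times_ms
    ]


rng = random.Random(0)               # fixed seed for reproducibility
times_ms = list(range(0, 601, 2))    # 0 to 600 ms in 2 ms steps
n_trials = 200

fixed = [300.0] * n_trials                                    # no timing variability
jittered = [rng.gauss(300.0, 40.0) for _ in range(n_trials)]  # 40 ms by-trial jitter

peak_fixed = max(average_erp(fixed, times_ms))
peak_jittered = max(average_erp(jittered, times_ms))

print(f"peak without jitter: {peak_fixed:.2f}, with jitter: {peak_jittered:.2f}")
```

      Both conditions have the same single-trial amplitude, yet the jittered average peaks lower, so a smaller average ERP could reflect either a weaker component or a more variable one.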

      (5) In particular, would the authors consider plotting CPP waveforms in the traditional way, across contrast levels? The elegant design is such that the C1 component (which has similar topography) will show up negative and early, giving way to the CPP, and these two components will show opposite amplitude variations (not just temporal intervals as is this paper's main focus), because the brighter the two gratings, the stronger the aggregate early sensory response but the weaker the decision evidence due to Fechner. I believe this would provide a simple, helpful corroborating analysis to back up the main functional interpretation in the paper.  

      We agree with the suggestion and have introduced the representation on top of Figure 5 for sets of three electrodes in the occipital, posterior and frontal regions. The new panels clearly show an inversion of the contrast effect dependent on the time and locus of the electrodes. We discuss this in Section “Functional interpretation of the events” of the results:

      “This representation shows that there is an inversion of the contrast effect with higher contrasts having a higher amplitude on the electrodes associated with visual potentials in the first couple of deciseconds (left panel of Figure 5A) while parietal and frontal electrodes shows a higher amplitude for lower contrasts in later portions of the ERPs (middle and right panel of Figure 5A)”.

      To us, this crucially shows that we cannot achieve the same decomposition using traditional ERP analyses. In these plots it appears that while, as described by the reviewer, there is an inversion, the timing and amplitude of the changes due to contrast can hardly be interpreted.

      (6) The first component is picking up on the C1 component (which is negative for these stimulus locations), not a "P100". Please consult any visual evoked potential study (e.g., Luck, Hillyard, etc). It is unexpected that this does not vary in latency with contrast - see, for example, Gebodh et al (2017, Brain Topography) - and there is little discussion of this. Could it be that nonlinear trends were not correctly tested for?

      We disagree with the reviewer on the interpretation of the ERP. The timing of the detected component is later than the one usually associated with a C1. Furthermore, the central display does not create optimal conditions to detect a C1.

      We do agree that the topography may cause confusion, but we believe that this is due to the spatial frequency of the stimulus, which generates a high posterior positivity (see references in the following extract). The new HMP solution now also shows an effect of contrast on the P100 latencies; we believe this is due to the increased precision in the time location of the component. We discuss this in the “Visual encoding time” section of the discussion:

      “The following event, the P100, is expressed around 70ms after the N40, its topography is congruent with reports for stimuli with low spatial frequencies as used in the current study (Kenemans et al., 2002, 2000; Proverbio et al., 1996). The timing of this P100 component is changed by the contrast of the stimulus in the direction expected by the Piéron law (Figure 4A)”. 
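
      For readers less familiar with the two laws referenced throughout this exchange, here is a minimal sketch of their textbook forms. It is not the authors' fitted model: all parameter values and contrast pairs are hypothetical, and the accuracy formula assumes an unbiased two-boundary diffusion process with constant drift.

```python
import math


def pieron_time(intensity, alpha, beta, gamma):
    """Piéron's law: duration = alpha * intensity**(-beta) + gamma."""
    return alpha * intensity ** (-beta) + gamma


def fechner_drift(contrast_high, contrast_low, k):
    """Drift rate proportional to the difference in log contrast (Fechner's law)."""
    return k * (math.log(contrast_high) - math.log(contrast_low))


def ddm_accuracy(drift, boundary, noise_sd=1.0):
    """P(correct) for a diffusion between 0 and `boundary`, starting at the midpoint."""
    return 1.0 / (1.0 + math.exp(-drift * boundary / noise_sd ** 2))


# Two hypothetical grating pairs with the same absolute contrast difference (0.05).
dim_pair = (0.15, 0.10)
bright_pair = (0.85, 0.80)

# Encoding time shrinks as mean contrast rises (Piéron).
t_dim = pieron_time(sum(dim_pair) / 2, alpha=0.05, beta=0.5, gamma=0.10)
t_bright = pieron_time(sum(bright_pair) / 2, alpha=0.05, beta=0.5, gamma=0.10)

# Decision evidence shrinks as mean contrast rises (Fechner).
v_dim = fechner_drift(*dim_pair, k=2.0)
v_bright = fechner_drift(*bright_pair, k=2.0)

acc_dim = ddm_accuracy(v_dim, boundary=2.0)
acc_bright = ddm_accuracy(v_bright, boundary=2.0)

print(t_dim > t_bright, v_dim > v_bright, acc_dim > acc_bright)
```

      With a fixed absolute difference, raising the contrast of both gratings shortens the Piéron-governed interval but lowers the Fechner-scaled drift and hence the predicted accuracy, which is the opposing pattern the design exploits.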

      (7) There is very little analysis or discussion of the second stage linked to attention orientation - what would the role of attention orientation be in this task? Is it spatial attention directed to the higher contrast grating (and if so, should it lateralise accordingly?), or is it more of an alerting function the authors have in mind here?  

      We agree that we were not specific enough on the interpretation of this attention stage. We now discuss our hypothesis in the section “Attention orientation” of the discussion:  

      “We do however observe an asymmetry in the topographical map Figure 3. This asymmetry might point to an attentional bias with participants (or at least some participants) allocating attention to one side over the other in the same way as the N2pc component (Luck and Hillyard, 1994, Luck et al., 1997). Based on this collection of observations, we conclude that this third event represents an attention orientation process. In line with the finding of Philiastides et al. (2006), this attention orientation event might also relate to the allocation of resources. Other designs varying the expected cognitive load or spatial attention could help in further interpreting the functional role of this third event”.

      We would like to add that it is unlikely that the asymmetry we mention in the discussion stems from a redirection of attention towards the higher contrast, as the experimental design balanced the side of presentation. We therefore believe that this is a behavioral bias rather than a bias toward the highest contrast stimulus as suggested by the reviewer. We hope that, while more could be tested and discussed, this discussion is sufficient given the current manuscript's goal.

      Reviewer #2 (Public review):  

      Summary:  

      The authors decomposed response times into component processes and manipulated the duration of these processes in opposing directions by varying contrast, and overall by manipulating speed-accuracy tradeoffs. They identify different processes and their durations by identifying neural states in time and validate their functional significance by showing that their properties vary selectively as expected with the predicted effects of the contrast manipulation. They identify 3 processes: stimulus encoding, attention orienting, and decision. These map onto classical event-related potentials. The decision-making component matched the CPP, and its properties varied with contrast and predicted decision-accuracy, while also exhibiting a burst not characteristic of evidence accumulation.  

      Strengths:  

      The design of the experiment is remarkable and offers crucial insights. The analysis techniques are beyond state-of-the-art, and the analyses are well motivated and offer clear insights.  

      Weaknesses:  

      It is not clear to me that the results confirm that there are only 3 processes, since e.g., motor preparation and execution were not captured. While the authors discuss this, this is a clear weakness of the approach, as other components may also have been missed. It is also unclear to what extent topographies map onto processes, since, e.g., different combinations of sources can lead to the same scalp topography.  

      We thank the reviewer for their kind words and for drawing attention to the question of the missing motor preparation event. In light of this comment (and also R1.1, R3.3), the revised manuscript uses a finer-grained approach for the multivariate event detection. This more precise estimation comes from the use of a shorter expected pattern, in which the initial expectation of a 50 ms half-sine was halved, thereby ensuring that we do not miss events shorter than the initial expectation (see Appendix B of Weindel et al., 2024 and also the response to R1.3). In the new solution, the motor component that the reviewer expected is found, as evidenced by the topography of the event, its lateralization, and a time-to-response congruent with a response execution event. This is now described in the section “Motor execution” of the revised manuscript:

      “The before last event, identified as the LRP, shows a strong hemispheric asymmetry congruent with a right hand response. The peak of this event is approximately 100 ms before the response which is congruent with reports that the LRP peaks at the onset of electromyographical activity in the effector muscle (Burle et al., 2004), typically happening 100ms before the response in such decision-making tasks (Weindel et al., 2021). Furthermore, while its peak time is dependent on contrast, its expression in the EEG is less clearly related to the contrast manipulation than the following CPP event”.

      Reviewer #3 (Public review):  

      Summary:  

      In this manuscript, the authors examine the processing stages involved in perceptual decision-making using a new approach to analysing EEG data, combined with a critical stimulus manipulation. This new EEG analysis method enables single-trial estimates of the timing and amplitude of transient changes in EEG time-series, recurrent across trials in a behavioural task. The authors find evidence for three events between stimulus onset and the response in a two-spatial-interval visual discrimination task. By analysing the timing and amplitude of these events in relation to behaviour and the stimulus manipulation, the authors interpret these events as related to separable processing stages for stimulus encoding, attention orientation, and decision (deliberation). This is largely consistent with previous findings from both event-related potentials (across trials) and single-trial estimates using decoding techniques and neural network approaches.  

      Strengths:  

      This work is not only important for the conceptual advance, but also in promoting this new analysis technique, which will likely prove useful in future research. For the broader picture, this work is an excellent example of the utility of neural measures for mental chronometry.  

      We appreciate the very positive review and thank the reviewer for pointing out important weaknesses in our original manuscript and also providing resources to address them in the recommendations to authors. Below we comment on each identified weakness and how we addressed them.   

      Weaknesses:  

      (1) The manuscript would benefit from some conceptual clarifications, which are important for readers to understand this manuscript as a stand-alone work. This includes clearer definitions of Piéron's and Fechner's laws, and a fuller description of the EEG analysis technique.

      We agree that the descriptions of both laws were insufficient. We therefore added the following text in the last paragraph of the introduction:

      “Piéron’s law predicts that the time to perceive the two stimuli (and thus the choice situation) should follow a negative power law with the stimulus intensity (Figure 1, green curve). In contradistinction, Fechner’s law states that the perceived difference between the two patches follows the logarithm of the absolute contrast of the two patches (Figure 1, yellow curve). As the task of our participants is to judge the contrast difference, Piéron’s law should predict the time at which the comparison starts (i.e. the stimuli become perceptible), while Fechner’s law should implement the comparison, and thus decision, difficulty”.
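
      As an illustration of the two laws quoted above (not taken from the manuscript), the sketch below writes Piéron's law as a negative power function of intensity and Fechner's law as a logarithmic mapping from physical to perceived contrast. The function names, the parameter values, the additive constant `gamma`, and the fixed physical difference `delta` between the two patches are all assumptions chosen for illustration only.

```python
import math


def pieron_time(intensity, alpha=1.0, beta=0.5, gamma=0.2):
    """Piéron's law: time decreases as a negative power of stimulus intensity.

    alpha, beta, gamma are hypothetical values, not fitted parameters.
    """
    return alpha * intensity ** (-beta) + gamma


def fechner_difference(contrast, delta=0.05, k=1.0):
    """Fechner's law: perceived difference between two patches whose physical
    contrasts are `contrast` and `contrast + delta` (delta is assumed fixed)."""
    return k * (math.log(contrast + delta) - math.log(contrast))


contrasts = [0.1, 0.2, 0.4, 0.8]
encoding_times = [pieron_time(c) for c in contrasts]
perceived_differences = [fechner_difference(c) for c in contrasts]

for c, t, d in zip(contrasts, encoding_times, perceived_differences):
    print(f"contrast={c:.1f}  Pieron time={t:.3f}  perceived difference={d:.3f}")
```

      Under these assumptions, higher contrast shortens the Piéron-type time while shrinking the perceived difference, which is the opposing pattern the two laws are expected to produce for encoding and decision difficulty respectively.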

      Regarding the EEG analysis technique, we added a few elements at the start of the Results section:

      “The hidden multivariate pattern model (HMP) implemented assumed that a task-related multivariate pattern event is represented by a half-sine whose timing varies from trial to trial based on a gamma distribution with a shape parameter of 2 and a scale, controlling the average latency of the event, free-to-vary per event (Weindel et al., 2024)”.
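
      The two ingredients named in this description can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' HMP implementation: the 1000 Hz sampling rate, the 35 ms scale value, and the function names are assumptions. The 50 ms and 25 ms widths correspond to the initial and halved expected pattern mentioned in the response to Reviewer #2.

```python
import math
import random


def half_sine(width_ms, sfreq_hz=1000):
    """Return one half-period of a sine wave spanning `width_ms` milliseconds.

    sfreq_hz is an assumed sampling rate, used only for illustration.
    """
    n_samples = max(2, round(width_ms * sfreq_hz / 1000))
    # Sample at bin centres so the template is symmetric and strictly positive.
    return [math.sin(math.pi * (i + 0.5) / n_samples) for i in range(n_samples)]


def sample_peak_intervals(scale_ms, n_trials, shape=2.0, seed=0):
    """Draw by-trial inter-event durations from a gamma distribution.

    With the shape fixed at 2, the scale alone sets the mean (shape * scale).
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale_ms) for _ in range(n_trials)]


template_50 = half_sine(50)   # initial expected pattern width
template_25 = half_sine(25)   # halved expected pattern width
intervals = sample_peak_intervals(scale_ms=35.0, n_trials=20000)
mean_interval = sum(intervals) / len(intervals)
print(len(template_50), len(template_25), round(mean_interval, 1))
```

      Because the shape parameter is fixed, estimating one scale per event is equivalent to estimating that event's average latency relative to the previous one.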

      We also made the technique clearer at the start of the discussion:

      “By modeling the recorded single-trial EEG signal between stimulus onset and response as a sequence of multivariate events with varying by-trial peak times, we aimed to detect recurrent events that contribute to the duration of the reaction time in the present perceptual decision-making task. In addition to the number of events, using this hidden multivariate pattern approach (Weindel et al., 2024) we estimated the trial-by-trial probability of each event’s peak, therefore accessing at which time sample each event was the most likely to occur”.

      Additionally, we added a proper description in the Methods section (see the new first paragraph of the “Hidden multivariate pattern” subsection).

      (2) The manuscript, broadly, but the introduction especially, may be improved by clearly delineating the multiple aims of this project: examining the processes for decision-making, obtaining single-trial estimates of meaningful EEG-events, and whether central parietal positivity reflects ramping activity or steps averaged across trials.

      For the sake of clarity we removed the question of the ramping activity vs steps in the introduction and focused on the processes in decision-making and their single-trial measurement as this is the main topic of the paper. Furthermore the references provided by the reviewer allowed us to write a more comprehensive review of previous studies and how the current study is in line with those. These changes are mainly manifested in these new sentences:

      “As an example Philiastides et al. (2006) used a classifier on the EEG activity of several conditions to show that the strength of an early EEG component was proportional to the strength of the stimulus while a later component was related to decision difficulty and behavioral performance (see also Salvador et al., 2022; Philiastides and Sajda, 2006). Furthermore the authors interpreted that a third EEG component was indicative of the resource allocated to the upcoming decision given the perceived decision difficulty. In their study, they showed that it is possible to use single-trial information to separate cognitive processes within decision-making. Nevertheless, their method requires a decoding approach, which requires separate classifiers for each component of interest and restrains the detection of the components to those with decodable discriminating features (e.g. stimuli with strong neural generators such as face stimuli, see Philiastides et al., 2006)”.

      (3) A fuller discussion of the limitations of the work, in particular, the absence of motor contributions to reaction time, would also be appreciated. 

      As laid out in the responses to comments R1.1 and R2, the new estimates now include evidence for a motor preparation component. We discuss this in the new “motor execution” paragraph in the discussion section. Additionally, we discuss the limitations of the study and the method in the last two paragraphs of the discussion (in the new section “Generalization and limitation”).

      (4) At times, the novelty of the work is perhaps overstated. Rather, readers may appreciate a more comprehensive discussion of the distinctions between the current work and previous techniques to gauge single-trial estimates of decision-related activity, as well as previous findings concerning distinct processing stages in decision-making. Moreover, a discussion of how the events described in this study might generalise to different decision-making tasks in different contexts (for example, in auditory perception, or even value-based decision-making) would also be appreciated.  

      We agree that the original text could be read as overstating the novelty of the work. In addition to the changes linked to R3.2, we now also discuss the link with previous studies in the penultimate paragraph of the discussion, before the conclusion, in the new “Generalization and limitations” section:

      “The present study showed what cognitive processes are contributing to the reaction time and estimated single-trial times of these processes for this specific perceptual decision-making task. The identified processes and topographies ought to be dependent on the task and even the stimuli (e.g. sensory events will change with the sensory modality). More complex designs might generate a higher number of cognitive processes (e.g. memory retrieval from a cue, Anderson et al., 2016) and so could more natural stimuli which might trigger other processes in the EEG (e.g. appraisal vs. choice as shown by Frömer et al., 2024). Nevertheless, the observation of early sensory vs. late decision EEG components is likely to generalize across many stimuli and tasks as it has been observed in other designs and methods (Philiastides et al., 2006; Salvador et al., 2022). To these studies we add that we can evaluate the trial-level contribution, as already done for specific processes (e.g. Si et al., 2020; Sturm et al., 2016), for the collection of events detected in the current study”.

      Reviewing Editor Comments:  

      As you will see, all three reviewers agree that the paper makes a valuable contribution and has many strengths. You will also see that they have provided a range of constructive comments highlighting potential issues with the interpretation of the outcomes of your signal decomposition method. In particular, all three reviewers point out that your results do not identify separate motor preparation signals, which we know must be operating on this type of task. The reviewers suggest further discussion of this issue and the potential limitations of your analysis approach, as well as suggesting some additional analyses that could be run to explore this further. While making these changes would undoubtedly enhance the paper and the final public reviews, I should note that my sense is that they are unlikely to change the reviewers' ratings of the significance of the findings and the strength of evidence in the final eLife assessment  

      Reviewer #1 (Recommendations for the authors):  

      (1) Abstract: "choice onset" is ill-defined and not the label most would give the start of the RT interval. Do you mean stimulus onset?  

      We replaced "choice onset" with "stimulus onset" in the abstract.

      (2) Similarly "choice elements" in the introduction seem to refer to sensory attributes/objects being decided about?  

      We replaced "choice-elements" with "choice-relevant features of the stimuli".

      (3) "how the RT emerges from these putative components" - it would be helpful to specify more what level of answer you're looking for, as one could simply answer "when they're done."  

      We replaced this with "how the variability in RTs emerges from these putative components".

      (4) Line 61-62: I'm not sure this is a fully correct characterisation of Frömer et al. It was not similar in invoking a step function - it did not invoke any particular mechanism or function, and in that respect does not compare well to Latimer et al. Also, I believe it was the overlap of stimulus-locked components, not response-locked, that they argued could falsely generate accumulator-like buildup in the response-locked ERP.  

      We indeed mischaracterized Frömer et al. The sentence now reads: "In human EEG data, the classical observation of a slowly evolving centro-parietal positivity, scaling with evidence accumulation, was suggested to result from the overlap of time-varying stimulus-related activity in the response-locked event related potential".

      (5) Line 78: Should this be single-trial *latency*?  

      This referred to location in time but we agree that the term is confusing and thus replaced it with latencies.

      (6) The caption of Figure 1 should state what is meant by the y-axis "time"  

      We added the sentence "The y-axis refers to the time predicted by each law given a contrast value (x-axis) and the chosen set of parameters." in the caption of Figure 1.

      (7) Line 107: Is this the correct description of Fechner's law? If the perceived difference follows the log of the physical difference, then a constant physical difference should mean a constant perceived difference. Perhaps a typo here.  

      This was indeed a typo; we replaced the corresponding part of the sentence with "the perceived difference between the two patches follows the logarithm of the absolute contrast of the two patches".

      (8) Line 128: By scale, do you mean magnitude/amplitude?  

      No, this refers to the scale parameter of a gamma distribution. To clarify, we edited the sentence: "based on a gamma distribution with a shape parameter of 2 and a scale parameter, controlling the average latency of the event, free-to-vary per event".

      (9) The caption of Figure 3 is insufficient to make sense of the top panel. What does the inter-event interval mean, and why is it important to show? What is the "response" event?  

      We agree that the top panel was insufficiently described. To keep the paper short, and because these panels provided relatively little information, we replaced them with a figure showing only the average topographies as well as the asymmetry tests for each event.

      (10) Figure 4: caption should say what the top vs bottom row represents (presumably, accuracy vs speed emphasis?), and what the individual dots represent, given the caption says these are "trial and participant averaged". A legend should be provided for the rightmost panels.  

      We agree and therefore edited Figure 4. The beginning of the caption mentioned by the reviewer now reads: “A) The panels represent the average duration between events for each contrast level, averaged across participants and trials (stimulus and response respectively as first and last events) for accuracy (top) and speed instructions (bottom).” Additionally, we added legends for the SAT instructions and the model fits.

      (11) Line 189: argued for a decision-making role of what?  

      Stafford and Gurney (2004) proposed that Piéron’s law could reflect a non-linear transformation from sensory input to action outcomes, which they argued reflected a response mechanism. We (Van Maanen et al., 2012) made this result more specific by showing that a Bayesian Observer Model, in which evidence for two alternative options is accumulated following Bayes' rule, indeed predicts a power relation between the difference in sensory input of the two alternatives and mean RT. However, the current data suggest that such an explanation cannot be the full story, as also noted by R3. To clarify this point, we replaced the comment with the following sentence:

      “Note that this observation is not necessarily incongruent with theoretical work that argued that Piéron’s law could also be a result of a response selection mechanism (Stafford and Gurney, 2004; Van Maanen et al., 2012; Palmer et al., 2005). It could be that differences in stimulus intensity between the two options also contribute to a Piéron-like relationship in the later intervals, that is convoluted with Fechner’s law (see Donkin and Van Maanen, 2014 for a similar argument). Unfortunately, our data do not allow us to discriminate between a pure logarithmic growth function and one that is mediated by a decreasing power function”.

      (12) Table 2: There is an SAT effect even on the first interval, which is quite remarkable and could be discussed more - does this mean that the C1 component occurs earlier under speed pressure? This would be the first such finding.  

      The original event we qualified as a P100 was sensitive to SAT, but the earliest event is now the N40, which is not statistically sensitive to speed pressure in these data. We believe that the P100 remaining sensitive to SAT is not surprising and therefore do not highlight it.

      (13) Line 221: "decrease of activation when contrast (and thus difficulty) increases" - is this shown somewhere in the paper?  

      The whole section for this analysis was rewritten (see the response to the next comment).

      (14) I find the analysis of Figure 5 interesting, but the interpretation odd. What is found is that the peak of the decision signal aligns with the response, consistent with previous work, but the authors choose to interpret this as the decision signal "occurring as a short-lived burst." Where is the quantitative analysis of its duration across trials? It can at least be visually appraised in the surface plot, and this shows that the signal has a stimulus-locked onset and, apart from the slowest RTs, remains present and for the most part building, until response. What about this is burst-like? A peak is not a burst.  

      This was a remnant of a previous version of the paper, in which an analysis reported that no evidence accumulation trace was found. After proper simulations, this analysis turned out to be incorrect because of a poor statistical test. We therefore removed this paragraph from the revised manuscript, and Figure 5 has now been extended to include surface plots for all the events.

      Reviewer #2 (Recommendations for the authors):  

      Overall, I really enjoyed reading this paper. However, in some places the approach is a bit opaque or the results are difficult to follow. As I read the paper, I noted:  

      Did you do a simple DDM, or did you do a collapsing bound for speed?  

      The fitted DDM was an adaptation of the proportional rate diffusion model. We make this clearer at the end of the introduction: “Given that Fechner’s law is expected to capture decision difficulty we connected this law to the classical diffusion decision models by replacing the rate of accumulation with Fechner’s law in the proportional rate diffusion model of Palmer et al. (2005)”.
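
      To make the substitution concrete, the sketch below uses the standard closed-form predictions of a symmetric two-bound diffusion process, as used in the proportional rate model: accuracy 1 / (1 + exp(-2·A·v)) and mean decision time (A / v)·tanh(A·v), with bound A and drift v. The Fechner-type drift function, the parameter values, and the function names are assumptions for illustration and are not the parametrization fitted in the manuscript.

```python
import math


def fechner_drift(contrast, delta=0.05, k=10.0):
    """Hypothetical drift rate: log-scaled perceived difference between
    patches of contrast `contrast` and `contrast + delta`."""
    return k * math.log((contrast + delta) / contrast)


def diffusion_predictions(drift, bound=1.0, non_decision=0.3):
    """Closed-form accuracy and mean RT of a symmetric two-bound diffusion
    process with unit noise; bound and non_decision are assumed values."""
    p_correct = 1.0 / (1.0 + math.exp(-2.0 * bound * drift))
    mean_rt = (bound / drift) * math.tanh(bound * drift) + non_decision
    return p_correct, mean_rt


contrasts = [0.1, 0.2, 0.4, 0.8]
predictions = [diffusion_predictions(fechner_drift(c)) for c in contrasts]

for c, (p, rt) in zip(contrasts, predictions):
    print(f"contrast={c:.1f}  P(correct)={p:.3f}  mean RT={rt:.3f} s")
```

      With a fixed physical difference, the drift shrinks as contrast grows, so this sketch predicts slower and less accurate decisions at higher contrast, which is the direction expected when Fechner's law captures decision difficulty.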

      It is confusing that the order of intervals in the text doesn't match the order in the table. It might be better to say what events the interval is between rather than assuming that the reader reconstructs.  

      We agree and adapted the order in both the text and the table. The table is now also more explicit (e.g. RT instead of S-R).

      Otherwise, I do wonder to what extent the method is able to differentiate processes that yield similar scalp topographies and find it a bit concerning that no motor component was identified.  

      We believe that the new version with the LRP/CPP demonstrates that the method can handle similar topographies. The method can handle events with close topographies as long as they are separate in time; however, if they are not sequential to one another, the method cannot capture both events. We now discuss this, in relation to the C1/P100 overlap, in the discussion section “Visual encoding time”:

      “Nevertheless this event, seemingly overlapping with the P100 even at the trial level (Figure 5C), cannot be recovered by the method we applied. The fact that the P100 was recovered instead of the C1 could indicate that only the timing of the P100 contributes to the RT (see Section 3 of Weindel et al., 2024)”.

      And we more generally address the question of overlap in the new section “Generalization and limitation”.

      Reviewer #3 (Recommendations for the authors):  

      Major Comments:  

      (1) If we agree on one thing, it is that motor processes contribute to response time. Line 364: "In the case of decision-making, these discrete neural events are visual encoding, attention-orientation, and decision commitment, and their latency make up the reaction time." Does the third event, "decision commitment", capture both central parietal positivity (decision deliberation) and motor components? If so, how can the authors attribute the effects to decision deliberation as opposed to motor preparation?  

      Thanks to the suggestions, including those in the public review, this main problem is now addressed, as we capture both a motor component and a decision commitment event.

      Line 351 suggests that the third event may contain two components.  

      This was indeed our initial, badly written, hypothesis. Nevertheless the new solution again addresses this problem.

      The time series in Figure 6 shows an additional peak that is not evident in the simulated ramp of Appendix 1.  

      This was probably due to the overlap of both the CPP and the LRP. It is now much clearer that the CPP looks mostly like a ramp while the LRP looks much more like a burst-like/peaked activity. We make this clear in the “Decision event” paragraph of the discussion section:

      “Regarding the build-up of this component, the CPP is seen as originating from single-trial ramping EEG activities but other work (Latimer et al., 2015; Zoltowski et al., 2019) have found support for a discrete event at the trial-level. The ERPs on the trial-by-trial centered event in Figure 5 show support for both accounts. As outlined above, the LRP is indeed a short burst-like activity but the build-up of the CPP between high vs low contrast diverges much earlier than its peak”.

      Previous analyses (Weindel et al., 2024) found motor-related activity from central parietal topographies close to the response by comparing the difference in single-trial events on left- vs right-hand response trials. The authors suggest at line 315 that the use of only the right hand for responding prevented them from identifying a motor event.  

      The use of only the right hand should have made the event more identifiable because the topography would be consistent across trials (rather than inverting on left vs right hand response trials).  

      The reviewer is correct. In the original manuscript we did not test for lateralization, but the reviewer's comment gave us the idea to explicitly test for the asymmetry (Figure 3). This test now clearly shows what would be expected for a motor event, with a strong negativity over the left motor cortex.

      The authors state on line 422 that the EEG data were truncated at the time of the response.  

      Could this have prevented the authors from identifying a motor event that might overlap with the timing of the response?  

      We thank the reviewer for this suggestion. This would have been a possibility but the problem is that adding samples after the response also adds the post-response processes (error monitoring, button release, stimulus disappearance, etc.). While increasing the samples after the response is definitely something that we need to inspect, we think that the separation we achieved in this revision doesn’t call for this supplementary analysis.

      The largest effects of contrast on the third event amplitude appear around the peak as opposed to the ramp. If the peak is caused by the motor component, how does this affect the conclusions that this third event shows a decision-deliberation parietal processes as opposed to a motor process (a number of studies suggest a causal role for motor processes in decision-making e.g. Purcell et al., 2010 Psych Rev; Jun et al., 2021 Nat Neuro; Donner et al., 2009 Curr Bio).  

      This result has now changed: it no longer appears that the peak captures most of the effect. We do, however, think that there might be some link to theories of motor-related accumulation. We therefore added this to the discussion in the Motor execution section:

      “Based on all these observations, it is therefore very likely that this LRP event signs the first passage of a two-step decision process as suggested by recent decision-making models (Servant et al., 2021; Verdonck et al., 2021; Balsdon et al., 2023)”.

      I would suggest further investigation into the motor component (perhaps by extending the time window of analysed EEG to a few hundred ms after the response) and at least some discussion of the potential contribution of motor processes, in relation to the previous literature.  

      We believe that the absence of a motor component is sufficiently addressed in the revised manuscript and in the responses to the other comments.    

      (2) What do we learn from this work? Readers would appreciate more attention to previous findings and a clearer outline of how this work differs. Two points stand out, outlined below. I believe the authors can address these potential complaints in the introduction and discussion, and perhaps provide some clarification in the presentation of the results.  

      In the introduction, the authors state that "... to date, no study has been able to provide single-trial evidence of multiple EEG components involved in decision-making..." (line 64). Many readers would disagree with this. For example, Philiastides, Ratcliff, & Sajda (2006) use a single-trial analysis to unravel early and late EEG components relating to decision difficulty and accuracy (across different perceptual decisions), which could be related to the components in the current work. Other, network-based single-trial EEG analyses (e.g., Si et al., 2020, NeuroImage, Sturm et al., 2016 J Neurosci Methods) could also be related to the current component approach. Yet other approaches have used inverse encoding models to examine EEG components related to separable decision processes within trials (e.g., Salvador et al., 2022, Nat Comms). The results of the current work are consistent with this previous work - the two components from Philiastides et al., 2006 can be mapped onto the components in the current work, and Salvador et al., 2022 also uncover stimulus- and decision-deliberation related components.

      We completely agree with the reviewer that the link to previous work was insufficient. We now include all references that the reviewer points out both in the introduction (see response R3.2) and in the discussion (see response R3.4). We wish to thank the reviewer for bringing these papers to our attention as they are important for the manuscript.

      The authors relate their components to ERPs. This prompts the question of whether we would get the same results with ERP analyses (and, on the whole, the results of the current work are consistent with conclusions based on ERP analyses, with the exception of the missing motor component). It's nice that this analysis is single-trial, but many of the follow-up analyses are based on grouping by condition anyway. Even the single-trial analysis presented in Figure 4 could be obtained by median splits (given the hypotheses propose opposite directions of effects, except for the linear model). 

      We do not agree with the reviewer, in the sense that classical ERP analyses would require many more data points. The strength of the method here is that it uses the information shared across all contrast levels to model the processing time of a single contrast level (6 trials per participant). Furthermore, as stated in the response to R1.4 and R1.5, the aim of the paper is to obtain the timing of information processing components, which cannot be achieved with classical ERPs without strong, and likely false, assumptions.

      Medium Comments:  

      (1) The presentation of Piéron's law for the behavioural analysis is confusing. First, both laws should be clearly defined for readers who may be unfamiliar with this work. I found the proposal that Piéron's law predicts decreasing RT for increasing pedestal contrast in a contrast discrimination paradigm task surprising, especially given the last author's previous work. For example, Donkin and van Maanen (2014) write "However, the commonality of Piéron's Law across so many paradigms has lead researchers (e.g., Stafford & Gurney, 2004; Van Maanen et al., 2012) to propose that Piéron's Law is unrelated to stimulus scaling, but is a result of the architecture of the response selection (or decision making) process." The pedestal contrast is unrelated to the difficulty of the contrast discrimination task (except for the consideration of Fechner's law). Instead, Piéron's law would apply to the subjective difference in contrast in this task, as opposed to the pedestal contrast. The EEG results are consistent with these intuitions about Piéron's law (or more generally, that contrast is accumulated over time, so a later EEG component for lower pedestal contrast makes sense): pedestal contrast should lead to faster detection, but not necessarily faster discrimination. Perhaps, given the complexity of the manuscript as a whole, the predictions for the behavioural results could be simplified?

      We agree that the initial version was confusing. We have now clarified the presentation of Piéron's law at the end of the introduction (see also response to R2).

      Once Fechner's law is applied, decision difficulty increases with increasing contrast, so Piéron's law on the decision-relevant intensity (perceived difference in contrast) would also predict increasing RT with increasing pedestal contrast. It is unlikely that the data are of sufficient resolution to distinguish a log function from a power of a log function, but perhaps the claim on line 189 could be weakened (the EEG results demonstrate Piéron's law for detection, but do not provide evidence against Piéron's law in discrimination decisions).  

      This is an excellent observation; thank you for bringing it to our attention. Indeed, the data support the notion that Piéron's law is related to detection, but do not rule out that it is also related to decision or discrimination. In earlier work (Donkin & Van Maanen, 2014), we addressed this question as well and reached a similar conclusion. After fitting evidence accumulation models to data, we found no linear relationship between drift rates and stimulus difficulty, as would have been the case if Piéron's law could be fully explained by the decision process (as indirectly argued by Stafford & Gurney, 2004; Van Maanen et al., 2012). The fact that we observed evidence for a non-linear relationship between drift rates and stimulus difficulty led us to the same conclusion, that Piéron's law could be reflected in both discrimination and decision processes. We added the following comment to the discussion about the functional locus of Piéron's law to clarify this point:

      “Note that this observation is not necessarily incongruent with theoretical work that argued that Piéron’s law could also be a result of a response selection mechanism (Stafford and Gurney, 2004; Van Maanen et al., 2012; Palmer et al., 2005). It could be that differences in stimulus intensity between the two options also contribute to a Piéron like relationship in the later intervals, that is convoluted with Fechner’s law (see Donkin and Van Maanen, 2014, for a similar argument). Unfortunately, our data do not allow us to discriminate between a pure logarithmic growth function and one that is mediated by a decreasing power function”.

      (2) Appendix 1 shows that the event detection of the HMP method will also pick up on ramping activity. The description of the problem in the introduction is that event-like activity could look like ramping when averaged across trials. To address this problem, the authors should simulate events (with some reasonable dispersion in timing such that they look like ramping when averaged) and show that the HMP method would not pull out something that looked like ramping. In other words, the evidence for ramping in this work is not affected by the previously identified confounds.  

      We agree that this demonstration was necessary and have added the suggested simulation to Appendix 1. As can be seen in Figure 1 of the appendix, when we simulate a half-sine, the average ERP based on the timing of the event looks like a half-sine.

      (3) Some readers may be interested in a fuller discussion of the failure of the Fechner diffusion model in the speed condition.  

      We are unsure which failure the reviewer refers to but assumed it was in relation to the behavioral results and thus added: 

      It is unlikely that neither Piéron nor Fechner law impact the RT in the speed condition. Instead this result is likely due to the composite nature of the RT where both laws co-exist in the RT but cancel each other out due to their opposite prediction.
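
      To make the opposing predictions concrete, here is a minimal Python sketch. It assumes Fechner's law in the form p = β·ln(C1/C0) with C1 = C0 + δ, as described in the response to minor comment (19), and the standard power form of Piéron's law, t = α·I^(−β) + γ. The function names, parameter values and contrast levels are hypothetical and are not taken from the manuscript.

```python
import math

def fechner_perceived_difference(c0, delta, beta=1.0):
    """Perceived difference p = beta * ln(C1 / C0), with C1 = C0 + delta."""
    return beta * math.log((c0 + delta) / c0)

def pieron_time(intensity, alpha=1.0, beta=0.5, gamma=0.0):
    """Pieron's law: time = alpha * intensity**(-beta) + gamma."""
    return alpha * intensity ** (-beta) + gamma

# Hypothetical pedestal contrasts and a fixed contrast increment (illustrative only).
pedestals = [0.1, 0.2, 0.4, 0.8]
delta = 0.05

perceived = [fechner_perceived_difference(c, delta) for c in pedestals]
detection = [pieron_time(c) for c in pedestals]

for c, p, t in zip(pedestals, perceived, detection):
    print(f"pedestal={c:.1f}  perceived difference={p:.3f}  Pieron term={t:.3f}")
```

      Under these assumptions, a higher pedestal shortens the Piéron term but shrinks the perceived difference, which is the sense in which the two laws pull RT in opposite directions.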

      Minor Comments:  

      (1) "By-trial" is used throughout. Normally, it is "trial-by-trial" or "single-trial" or "trial-wise".

      We replaced all occurrences of “by-trial” with one of the three suggested terms, where appropriate.

      (2) Line 22: "The sum of the times required for the completion of each of these precessing steps is the reaction time (RT)." The total time required. Processing.  

      Corrected for both.

      (3) Line 26/27: "Despite being an almost two century old problem (von Helmholtz, 2021)." Perhaps the citation with the original year would make this point clearer.  

      We agree and replaced the citation.

      (4) Line 73: "accounted by estimating". Accounted for by estimating.  

      Corrected.

      (5) Line 77 "provides an estimation on the." Of the.  

      Corrected.

      (6) Line 86: "The task of the participants was to answer which of two sinusoidal gratings." The picture looks like Gabor's? Is there a 2d Gaussian filter on top of the grating? Clarify in the methods, too.  

      We had incorrectly described the stimuli, which were indeed Gabors. This is now corrected in both the main text and the Methods section.

      (7) Figure 1 legend: "The Fechner diffusion law" Fechner's law or your Fechner diffusion model?  

      “Law” was incorrect, so we changed it to “model” as suggested.

      (8) Line 115: "further allows to connects the..." Allows connecting the.  

      Corrected.

      (9) Line 123: "lower than 100 ms or higher than..." Faster/slower.  

      Corrected.

      (10) Line 131: "To test what law." Which law.?  

      Corrected to model.

      (11) Figure 2 legend: "Left: Mean RT (dot) and average fit (line) over trials and participants for each contrast level used." The fit is over trials and participants? Each dot is? Average trials for each contrast level in each participant?  

      This sentence was corrected to “Mean RT (dot) for each contrast level and averaged predictions of the individual fits (line) with Accuracy (Top) and Speed (Bottom) instructions.”.

      (12) Line 231: "A comprehensive analysis of contrast effect on". The effect of contrast on.  

      This title was changed to “functional interpretation of the events”.

      (13) Line 23: "the three HMP event with". Three HMP events.

      The sentence no longer exists in the revised manuscript.

      (14) Line 270: "Secondly, we computed the Pearson correlation coefficient between the contrast averaged proportion of correct." Pearson is for continuous variables. Proportion correct is not continuous. Use Spearman, Kendall, or compute d'.  

      The reviewer rightly pointed out our error; we corrected it by computing the Spearman correlation instead.

      (15)  Line 377: "trial 𝑛 + 1 was randomly sampled from a uniform distribution between 0.5 and 1.25 seconds." It's just confusing why post-response activity in Figure 5 does look so consistent. Throughout methods: "model was fitted" should be "was fit", and line 448, "were split".  

      We do not have a specific hypothesis of why the post-response activity in the previous Figure 5 was so consistent. Maybe the Gaussian window (same as in other manuscripts with a similar figure, e.g. O’Connell et al. 2012) generated this consistency. We also corrected the errors mentioned in the methods.

      (16) The linear mixed models paragraph is a bit confusing. Can it clearly state which data/ table is being referred to and then explain the model? "The general linear mixed model on proportion of correct responses was performed using a logit link. The linear mixed models were performed on the raw milliseconds scale for the interval durations and on the standardized values for the electrode match." We go directly from proportion correct to raw milliseconds...  

      The confusion was indeed due to the initial inclusion of a general linear mixed model on proportion correct which was removed as it was not very informative. The new revision should be clearer on the linear mixed models (see first sentence of subsection ‘linear mixed models' in the method section).

      (17) A fuller description of the HMP model would be appreciated.  

      We agree that this was necessary and added the description of the HMP model in the corresponding method section “Hidden multivariate pattern” in addition to a more comprehensive presentation of HMP in the first paragraph of the Result and Discussion sections.

      (18) Line 458: "Fechner's law (Fechner, 1860) states that the perceived difference (𝑝) between the two patches follows the logarithm of the difference in physical intensity between..." ratio of physical intensity.  

      Corrected.

      (19) P is defined in equations 2 and 4. I would include the beta in equation 4, like in equation 2, then remove the beta from equations 3 and 5 (makes it more readable). I would also just include the delta in equation 2, state that in this case, c1 = c+delta/2 or whatever.  

      This indeed makes the equation more readable so we applied the suggestions for equations 2, 3, 4 and 5. The delta was not added in equation 2 but instead in the text that follows:

      “Where 𝐶1 = 𝐶0 + 𝛿, again with a modality and individual specific adjustment slope (𝛽).” 

      (20) The appendix suggests comparing the amplitudes with those in Figure 3, but the colour bar legend is missing, so the reader can only assume the same scale is used?  

      We added the color bar, which was indeed missing. Note, though, that the previous version displayed the estimation for the simulated data, whereas this plot in the revised manuscript shows the solution on real data obtained after downsampling (and therefore looking for a larger pattern than in the main text). We believe that this representation is more useful, given that the solution for the downsampled data is no longer the same as the one in the main text (due to the difference in pattern width).
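
      For minor comment (14) above, the switch from Pearson to Spearman amounts to correlating ranks rather than raw values. The following is a minimal sketch in plain Python, not the authors' analysis code; the helper names are hypothetical, and tied values receive their average rank.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

      Because only ranks enter the computation, any monotonic relationship yields a coefficient of ±1 regardless of its shape, which is why the measure suits bounded quantities such as proportion correct.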

    1. Mental Health and Addictions: From the Intimate to the Population Level

      Executive Summary

      This briefing document analyzes the central themes of the inaugural lecture by Maria Melchior, epidemiologist and holder of the 2025-2026 Public Health chair at the Collège de France.

      Mental health, designated a major national cause (grande cause nationale) for 2025 and 2026, is presented as a major challenge that requires a dual approach: an empathetic understanding of intimate suffering and a rigorous analysis of population dynamics.

      Epidemiology offers a detached but essential perspective for quantifying the scale of the phenomenon, identifying risk factors, and informing public policy.

      The data reveal a high prevalence in France: one adult in ten suffers from depression or anxiety, and a significant share of the population, including young people, is affected by addictive behaviors (tobacco, alcohol, cannabis, but also gambling and the internet).

      A central finding is that of "massive" social inequalities that appear from childhood onward, widening the gap between disadvantaged populations, who are at higher risk and have less access to care, and the most privileged.

      The study of mental health faces sizable challenges, notably strong and persistent stigma in society and measurement difficulties due to the absence of objective biological markers.

      According to Geoffrey Rose's "prevention paradox", the most effective public health strategy is not solely to target the individuals most at risk, but to improve the mental health of the whole population by acting on social determinants.

      The concept of "proportionate universalism" refines this approach by combining universal actions with reinforced support for the most vulnerable groups.

      In conclusion, improving collective mental health requires interventions that go beyond the care system to address the roots of distress: isolation, social inequalities, and living and working conditions.

      --------------------------------------------------------------------------------

      1. The Dual Perspective on Mental Health: Intimate and Population-Level

      Analyzing mental health requires constantly linking individual suffering with collective dynamics. Epidemiology, although centered on the study of populations, cannot ignore the subjective and intimate dimension of psychological distress.

      The Imperative of Empathy: The Intimate Behind the Numbers

      Maria Melchior insists on the need never to forget that "behind the concepts, the theories, and the numbers, there are real people and singular stories".

      This realization, which stems from a personal experience during her psychology studies, underscores that any research on mental health must retain a form of empathy and consider the lived experience of the people concerned.

      Taking an interest in mental health, even at large scale, requires imagining a real person and what is going on inside them.

      The Epidemiological Approach: Generalizing

      Epidemiology is distinguished by its observational and integrative approach.

      It is not limited to biological mechanisms but encompasses a wide range of risk factors: psychological, medical, behavioral, social, and economic.

      Objective: Identify the factors that increase or decrease the risk of mental disorders and addictions at the population level.

      Method: Conduct large-scale surveys to reveal trends in how risk varies over time, across space, and between subgroups.

      Purpose: Move from particular situations to common features in order to generalize ("monter en généralité") and identify the forces that govern human behavior. The resulting figures can thus inform public policy and, in turn, help to better understand individual situations.

      2. Overview of Mental Health and Addictions in France

      The major epidemiological surveys conducted in France, notably by Santé publique France and the Observatoire français des drogues et des tendances addictives (OFDT), make it possible to draw a precise picture of the prevalence of mental disorders and addictions.

      | Target population | Disorder / addiction | Key statistic and source |
      | --- | --- | --- |
      | Adults | Major depressive episode | 1 person in 10 (Baromètre SPF, 2021) |
      | | Anxiety states | 1 person in 10 (Baromètre SPF, 2021) |
      | | At-risk alcohol consumption | More than 1 person in 5 |
      | | Cannabis use (past year) | 1 person in 10 |
      | | Daily smoking | 1 person in 4 (rate declining) |
      | Whole population | Behavioral addiction (gambling) | 1 person in 10 shows problematic behavior (OFDT, 2023) |
      | Adolescents | Risk of depression (moderate to severe) | 14% of middle school students (collégiens), 15% of high school students (lycéens) |
      | (age 17) | Excessive use of social networks | 1 young person in 5 (ESCAPADE, 2017) |
      | (age 17) | Gambling (past year) | 1/3 of 17-year-olds, although prohibited for minors (ESCAPADE) |
      | Children | Probable mental health disorder | 13% of children (Étude Enabee, 2002) |

      Behavioral addictions, notably those linked to internet use (social networks, video games) and online gambling, are on the rise, particularly among young people.

      3. Risk Factors and Massive Social Inequalities

      Epidemiology makes it possible to identify more vulnerable groups and specific risk factors.

      Gender differences: Girls and women show higher levels of depression and anxiety, while boys and men are more affected by conduct disorders, hyperactivity/inattention, and addictive behaviors.

      Social inequalities: Described as "massive", they appear from childhood and widen over time. Children from the most disadvantaged families and neighborhoods have the highest risks while having the lowest access to care.

      A 2023 report by the Cour des comptes illustrates this disparity: use of child psychiatric care is twice as high in Paris as in Seine-Saint-Denis.

      Environmental factors: New research is exploring the impact on mental health of factors such as the absence of green spaces or exposure to noise pollution.

      4. The Challenges of Studying Mental Health

      Studying mental health presents unique obstacles, social as well as ethical and methodological.

      Stigma and Fear

      Mental disorders continue to frighten people and to be associated with negative representations.

      Perceived dangerousness: 74% of people surveyed in 2014 believed that the "mentally ill" are dangerous.

      Discrimination: In a 2023 poll, 80% of people believed that having a mental disorder reduces opportunities to find a job or housing, and 63% thought that the people concerned are treated less well in the education system or at work.

      The Ethical Issues of Research

      The intimate nature of mental health frequently raises ethical questions in research.

      The main fear is that asking questions about psychological suffering, and in particular about suicidal thoughts, could prompt people to act on them.

      However, the science invalidates this fear:

      "Meta-analyses [...] show that asking people [...] about their suicidal thoughts or intentions not only does not lead to acting on them but is also not perceived negatively, and might even sometimes be associated with a slight decrease in suicidal behaviors."

      The Example of the Tempo Cohort

      The Tempo cohort study, which has followed more than 1,000 people from childhood into adulthood, illustrates the feasibility and richness of longitudinal research in mental health.

      Originality: It is one of the few studies in the world with data on three generations (the participants, their parents via the Gazel cohort, and soon their own children), making it possible to study intergenerational transmission.

      Key results:

      ◦ Childhood hyperactivity/inattention disorder (ADHD) persists over nearly 30 years and is associated with addictive behaviors, academic difficulties, and an increased risk of unemployment.

      ◦ Cannabis use in adolescence has harmful effects on educational and occupational trajectories 20 years later.

      ◦ Heavy episodic drinking in adolescence predicts an alcohol use disorder in adulthood in 25% of cases.

      5. Measurement in Mental Health: From Subjectivity to Categorization

      One of the greatest challenges in psychiatric epidemiology is measuring disorders.

      The Absence of a Biological "Gold Standard"

      Unlike many diseases, there is no biological test (blood, brain) for diagnosing a mental disorder.

      Assessment relies entirely on what people say and on the behavior they report, which introduces a degree of uncertainty.

      The Evolution of Classifications (DSM/ICD)

      To standardize assessment, classifications were developed.

      History: The first nosographies (Pinel, Kraepelin) focused on the most severe pathologies observed in asylums.

      The DSM turning point: The need to assess American conscripts during the world wars accelerated the development of standardized manuals.

      A revolution took place in the 1970s under the leadership of Robert Spitzer: the Diagnostic and Statistical Manual (DSM) moved from an approach based on psychoanalytic causes (difficult to observe) to a definition based on observable symptoms and their repercussions on people's lives.

      Consequence: This approach made possible the creation of standardized questionnaires, the cornerstone of modern psychiatric epidemiology.

      Defining the "Normal" and the "Pathological"

      Following the philosopher Georges Canguilhem, a state is not pathological simply because it is statistically rare or judged negatively by society (homosexuality, formerly listed as a mental disorder, is a striking illustration).

      The modern definition of a pathological state centers on the psychological suffering expressed by the person and the negative impact of symptoms on their life.

      6. The Public Health Perspective: Strategies and Paradoxes

      Public health holds that the characteristics of a population in turn influence the health of each individual within it.

      The Prevention Paradox and Proportionate Universalism

      Geoffrey Rose's paradox: Diseases and their risk factors are distributed along a continuum in the population.

      Consequently, the most effective prevention strategy is not to target only the few individuals at very high risk, but to shift the distribution of the whole population slightly.

      In other words, a small improvement in everyone's mental health has a greater collective impact than a large improvement for a few.
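
      One facet of Rose's argument, that a small shift of the whole distribution noticeably reduces the share of people above a clinical threshold, can be illustrated with a short Python sketch. It assumes, purely for illustration, that symptom scores are normally distributed; the threshold and the size of the shift are hypothetical values, not figures from the lecture.

```python
from statistics import NormalDist

def share_above(threshold, mean, sd=1.0):
    """Share of a normally distributed population scoring above a threshold."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)

THRESHOLD = 2.0  # hypothetical cut-off for a "case", in standard-deviation units
SHIFT = 0.2      # hypothetical small improvement applied to everyone

baseline = share_above(THRESHOLD, mean=0.0)
shifted = share_above(THRESHOLD, mean=-SHIFT)
relative_reduction = 1.0 - shifted / baseline

print(f"share above threshold before the shift: {baseline:.4f}")
print(f"share above threshold after the shift:  {shifted:.4f}")
print(f"relative reduction in cases:            {relative_reduction:.1%}")
```

      In this toy setting, moving everyone by a fifth of a standard deviation removes more than a third of the cases above the threshold, although the size of the effect depends entirely on the assumed distribution and cut-off.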

      Michael Marmot's proportionate universalism: This modern approach combines Rose's population-level view with particular attention to the most vulnerable.

      It involves putting in place universal actions that benefit everyone while adjusting the intensity of support according to need. The Improva program for promoting mental health in middle schools (collèges) is one example.

      The Importance of "Intermediate" Symptoms

      The heaviest societal burden comes not from the most severe cases (which are few), but from the large number of people with intermediate or "subclinical" symptoms.

      Even without meeting a formal diagnosis, these symptoms cause suffering and significantly impair quality of life and the ability to work or form relationships.

      7. Conclusion and Prospects for Action

      To improve the population's mental health, it is imperative to act on its determinants, which lie largely outside the health system.

      Acting on social determinants: Following the work of Émile Durkheim on isolation and of Lisa Berkman on social networks, it is crucial to improve the density and quality of social ties.

      This requires acting on their root causes: social inequalities, working conditions, access to housing, and family protection policies.

      The 2025-2026 Grande Cause Nationale: This political commitment aims to improve collective perceptions of mental disorders in order to facilitate access to care and reduce stigma.

      Improving mental health literacy: Large-scale dissemination of knowledge from epidemiological research is fundamental so that everyone can better recognize signs of distress (in themselves or in others) and accept people who are suffering.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We are grateful to the reviewers for their thoughtful and constructive evaluations of our manuscript. Their comments helped us clarify key aspects of the study and strengthen both the presentation and interpretation of our findings. The central goal of this work is to dissect how the opposing activities of GATA4 and CTCF coordinate chromatin topology and transcriptional timing during human cardiomyogenesis. The reviewers’ feedback has allowed us to refine this message and better contextualize our results within the broader framework of chromatin regulation and cardiac development.

      In response to the reviews, in our preliminary revision we have already implemented substantial improvements to the manuscript, including additional analyses, clearer data visualization, and revisions to the text to avoid overinterpretation. These refinements enhance the robustness of our conclusions without altering the overall scope of the study. A small number of additional analyses and experiments are ongoing and will be added to the full revision, as detailed below.

      We believe that the revised manuscript, together with the planned updates, fully addresses the reviewers’ concerns and substantially strengthens the contribution of this work to the field.

      Reviewer 1 – Point 1:

      In the datasets you are examining, what are the relative percentages in each of the four groups relating compartmentalization change to expression change (A→B, expression up; A→B, down; B→A, up; B→A, down)?

      We quantified compartment–expression relationships using Hi-C and bulk RNA-seq from H9 ESCs and CMs. The percentages for each category are shown below and incorporated into updated Figure S2H.

      | Group | Downregulated in CM | Upregulated in CM |
      | --- | --- | --- |
      | A-to-A | 11.92% | 8.44% |
      | A-to-B | 18.20% | 2.79% |
      | B-to-A | 7.96% | 18.07% |
      | B-to-B | 14.36% | 6.44% |

      A chi-squared test comparing observed vs. expected distributions (based on gene density across bins) confirmed a strong association between compartment dynamics and transcriptional behavior. B-to-A genes are significantly enriched among genes upregulated in CMs, while A-to-B genes are enriched among those downregulated (updated Figure S2H).
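
      As a quick reading aid for the percentages above, the following Python sketch computes, for each compartment class, the upregulated share of the two reported values. It is only a descriptive ratio under the assumption that the two columns are comparable within a row; it is not the chi-squared test described in the response, and the variable and function names are hypothetical.

```python
# Percentages copied from the table above: (downregulated in CM, upregulated in CM).
compartment_table = {
    "A-to-A": (11.92, 8.44),
    "A-to-B": (18.20, 2.79),
    "B-to-A": (7.96, 18.07),
    "B-to-B": (14.36, 6.44),
}

def upregulated_share(down, up):
    """Upregulated value as a fraction of the two reported values for a group."""
    return up / (down + up)

up_share = {
    group: upregulated_share(down, up)
    for group, (down, up) in compartment_table.items()
}

for group, share in up_share.items():
    print(f"{group}: {share:.1%} of the reported changes are upregulation")
```

      On these numbers, B-to-A has the highest upregulated share and A-to-B the lowest, in line with the direction of the enrichment reported in the response.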

      We next assessed with GSEA how these gene classes respond to GATA4 and CTCF knockdown. In 2D CMs, GATA4 knockdown reduces expression of CM-upregulated B-to-A genes and increases expression of CM-downregulated A-to-B genes, whereas CTCF knockdown produces the opposite pattern (updated Figure 2F).

      Applying the same analysis to cardioid bulk RNA-seq (updated Figure 4E) revealed the strongest effects in SHF-RV organoids, consistent with monolayer data. In SHF-A organoids, only GATA4 knockdown had a measurable impact on CM-upregulated B-to-A and CM-downregulated A-to-B genes. Because the subsets of CM-downregulated B-to-A and CM-upregulated A-to-B genes were very small and showed no consistent trends, Figure 4 focuses on the two informative categories only. The full classification is provided in Reviewer Figure 1 below.

      (The figure cannot be rendered in this text-only format)

      Reviewer Figure 1. GSEA for CM-upregulated B-to-A and CM-downregulated A-to-B genes. p-values by Adaptive Monte-Carlo Permutation test.

      Reviewer 1 – Point 2

      This phrase in the abstract is imprecise: ‘whereas premature CTCF depletion accelerates yet confounds cardiomyocyte maturation.’


      The abstract has been revised to: “whereas premature CTCF depletion accelerates yet alters cardiomyocyte maturation.” (lines 29-30).

      Reviewer 1 – Point 3

      Regarding this statement: "Disruption of [3D chromatin architecture] has been linked to genetic dilated cardiomyopathy (DCM) caused by lamin A/C mutations8,9, and mutations in chromatin regulators are strongly enriched in de novo congenital heart defects (CHD)10, underscoring their pathogenic relevance11." The first studies to implicate chromatin structural changes in heart disease, including the role of CTCF in that process, were PMID: 28802249, a model of acquired, rather than genetic, disease.

      We added the following sentence to the paragraph introducing CTCF: “Moreover, depletion of CTCF in the adult cardiomyocytes leads to heart failure28,29.” (line 72)

      Reviewer 1 – Point 4

      Can you quantify this statement: ‘the compartment switch coincided with progressive reduction of promoter–gene body interactions’?

      We quantified promoter–gene body contacts by calculating the area under the curve (AUC) of the virtual 4C signal derived from H9 Hi-C data across differentiation. As a result of this analysis we added the following sentence: “Quantitatively, interactions between the TTN promoter and its gene body decreased by ~55% from the pluripotent stage to day 80 cardiomyocytes.” (lines 89-91).
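
      For readers unfamiliar with this kind of quantification, the sketch below shows how an area under the curve and a percentage change between two stages could be computed with the trapezoidal rule. It is a minimal illustration, not the authors' pipeline: the positions and signal values are invented toy numbers, and the function names are hypothetical.

```python
def trapezoid_auc(positions, signal):
    """Area under a sampled curve, computed with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (signal[i] + signal[i - 1]) / 2.0
    return area

def percent_change(before, after):
    """Signed percentage change from 'before' to 'after'."""
    return 100.0 * (after - before) / before

# Toy virtual 4C profiles (hypothetical values, arbitrary units).
positions_kb = [0, 10, 20, 30]
signal_pluripotent = [4.0, 3.0, 2.0, 1.0]
signal_cardiomyocyte = [2.0, 1.5, 1.0, 0.5]

auc_pluripotent = trapezoid_auc(positions_kb, signal_pluripotent)
auc_cardiomyocyte = trapezoid_auc(positions_kb, signal_cardiomyocyte)
change = percent_change(auc_pluripotent, auc_cardiomyocyte)

print(f"AUC change: {change:.1f}%")
```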


      Reviewer 1 – Point 5

      Regarding this statement: "six regions became less accessible in CMs, correlating with ChIP-seq signal for the ubiquitous architectural protein CTCF." I don't see 6 ATAC peaks in either TTN trace in Figure 1A.

      We corrected the text as follows: “TTN experienced clear changes in chromatin accessibility during CM differentiation: ATAC-seq identified two CM-specific peaks that correlated with ChIP-seq signal for the cardiac pioneer TF GATA4 at the two promoters, one driving full length titin and the other the shorter cronos isoform. In contrast, two regions became less accessible in CMs, correlating with two of the six ChIP-seq peaks for the ubiquitous architectural protein CTCF” (lines 93-97). We attribute the differences between ChIP-seq and ATAC-seq profiles to methodological sensitivity and/or biological variability between datasets generated in different laboratories and cell batches.

      Reviewer 1 – Point 6

      Western blots need molecular weight markers.

      We edited the relevant panels accordingly (updated Figures 1E and 2B).

      Reviewer 1 – Point 7

      Regarding this statement: "The decrease in CTCF protein levels may explain its selective detachment from TTN during cardiomyogenesis." At face value, these findings suggest the opposite: i.e. that a massive downregulation of CTCF at protein level should affect its binding across the genome, which is not tested and is hard to evaluate between ChIP-seq studies from different groups and from different developmental timeframes.

      We revised the text to avoid implying selective detachment and performed a genome-wide analysis of CTCF occupancy using ENCODE ChIP-seq datasets generated by the same laboratory with matched protocols in hESCs and hESC-derived CMs. This analysis shows that 43.2% of CTCF sites present in ESCs are lost in CMs, whereas only 5.7% are gained, confirming a broad reduction in CTCF binding during differentiation. These results are now included in updated Figure 1B.

      Reviewer 1 – Point 8a

      A couple thoughts on the FISH experiments in Figure 2. A claim of 'impaired B-A transition' would be more convincing if you show, by FISH, that the relative distance of TTN from lamin B increases with differentiation.

      Although prior work from us and others has established that TTN transitions from the nuclear periphery in hESCs to a more internal position during cardiomyogenesis (Poleshko et al. 2017; Bertero et al. 2019a), we are reproducing this trajectory in WTC11 hiPSCs as part of the FISH experiments for the full revision.

      Reviewer 1 – Point 8b

      In the [FISH] images: are you showing a total projection of all z planes? One assumes the quantitation is relative to a 3D reconstruction in which the lamin B signal is restricted to the periphery. Have you shown this?

      Quantification was performed on full 3D reconstructions from Z-stacks, as detailed in the Methods (lines 721-727). While the original submission displayed maximum-intensity projections, updated Figure 2D and Figure S2E now show representative single optical sections, which more clearly highlight the spatial relationship between the TTN locus and the nuclear lamina.

      Reviewer 1 – Point 8c

      Lastly, these data are very interesting and important, provoking reexamination of your interpretation of the results in Figure 1. Figure 1 was interpreted to show that less CTCF binding led to decreased lamina (and thus B compartment) association during development. Figure 2 shows that depleting CTCF does not change association of TTN with lamina.

      Our interpretation is that by day 25 of hiPSC-CM differentiation the TTN locus may have reached its maximal radial repositioning even in control cells, limiting the ability to detect earlier effects of CTCF depletion. To test whether CTCF knockdown accelerates lamina detachment at earlier stages, we are repeating the FISH analysis for the inducible CTCF knockdown line at multiple time points during differentiation.

      Reviewer 1 – Point 9

      A thought about this statement: "Altogether, these results suggest that GATA4 and CTCF function as positive and negative regulators of B-to-A compartment switching, likely acting through global and local chromatin remodeling, respectively." GATA4 induces TTN expression and its knockdown prevents TTN expression-the evidence that GATA4 affects compartmentalization is unclear. By activating the gene, GATA4 may shift TTN to B classification.

      Our current data do not allow us to disentangle whether GATA4-driven transcriptional activation precedes or follows the B-to-A compartment shift. We have therefore removed the mechanistic speculation from this sentence to avoid overinterpretation. Nevertheless, the analyses in updated Figure 2F, discussed in the response to Reviewer 1 - Point 1, show that GATA4 knockdown preferentially reduces expression of CM-upregulated B-to-A genes, while CTCF knockdown has the opposite effect, supporting the conclusion that both factors influence the transcriptional programs associated with B-to-A transitions.

      Reviewer 1 – Point 10

      I'm not sure what I am looking at in Figure 3C. Are those traces integration of interactions over a defined window? "Each [mutant is] clearly different from WT" is not obvious from the presentation. The histograms are plotting AUC of what? Interactions of those peaks with the mutated region? I genuinely appreciate how laborious this experiment must have been and encourage you to explain better what you are showing.

      We revised the main text to avoid overstating the differences (“clearly” “in a similar manner”, line 192) and expanded the legends of updated Figures 3C–D to clarify what is being shown: “(C) 4C-seq in hiPSCs using the promoter-proximal region of TTN as viewpoint. The top panel shows raw interaction profiles. The lower panels plot pairwise differences between conditions to reveal subtle changes. A schematic indicating the 4C viewpoint is included for clarity. Right inset: zoom of the CBS4–5 region. Mean of n = 3 cultures. (D) AUC of the differential 4C-seq signal for defined intervals (panel C). p-values by one-sample t-test against μ = 0.” We also added a visual cue in updated Figure 3C indicating the 4C viewpoint to facilitate interpretation.
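
      The quantification described in the legend of panel D can be sketched as follows: subtract the wild-type profile from each mutant replicate, integrate the difference over an interval with the trapezoidal rule, and compare the per-replicate AUCs against zero. The numbers are toy values and the function names are hypothetical. The sketch stops at the t statistic, since converting it to a p-value needs a t distribution (for instance `scipy.stats.ttest_1samp`), and the authors' actual pipeline may differ in normalization and binning.

```python
from math import sqrt
from statistics import mean, stdev

def trapezoid_auc(positions, values):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(positions, positions[1:], values, values[1:])
    )

def differential_auc(positions, mutant, wild_type):
    """AUC of the mutant-minus-wild-type signal over the given positions."""
    diff = [m - w for m, w in zip(mutant, wild_type)]
    return trapezoid_auc(positions, diff)

def one_sample_t(samples, mu=0.0):
    """t statistic for H0: mean(samples) == mu, with n - 1 degrees of freedom."""
    n = len(samples)
    return (mean(samples) - mu) / (stdev(samples) / sqrt(n))

# Toy 4C-like profiles over one interval (arbitrary units, not real data)
positions = [0, 10, 20, 30]
wild_type = [1.0, 2.0, 2.0, 1.0]
mutant_replicates = [
    [1.0, 1.5, 1.5, 1.0],
    [1.0, 1.0, 1.5, 1.0],
    [1.0, 2.0, 1.0, 1.0],
]

aucs = [differential_auc(positions, m, wild_type) for m in mutant_replicates]
t_stat = one_sample_t(aucs)
print(aucs, t_stat)
```

      With n = 3 the test has only 2 degrees of freedom, which is one reason modest but consistent trends can fail to reach significance.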

      Reviewer 1 – Point 11

      Again acknowledging how challenging these experiments are: when you mutant a locus, you change CTCF binding but you also change the DNA. Thus, attributing the changes in interactions to presence/absence of CTCF binding is difficult, because the DNA substrate itself has changed. Perhaps you are presenting all of this as a negative result, given the modest effect on transcription, which is as important as a positive result, given the assumptions usually made about such things. But the results are not clearly described and your interpretation seems to go between implying the structural change causative and being agnostic.

      We recognize that deleting a genomic region can affect both CTCF binding and the DNA substrate itself. For this reason, we implemented two parallel genome-editing strategies:

      (1) a straightforward Cas9-mediated deletion of ~100 bp centered on each CBS, and

      (2) a more precise HDR approach replacing only the 20 bp core CTCF motif.

      Because the HDR strategy succeeded, all downstream analyses were carried out on these minimal edits, which substantially limit disruption of other transcription factor motifs and reduce the likelihood of sequence-dependent polymer effects unrelated to CTCF.

      Nevertheless, to avoid implying unwarranted causality in the absence of more conclusive evidence, we added a paragraph to the Discussion outlining these limitations, including the sentence: “Our study also reflects general challenges in separating chromatin-architectural and transcriptional mechanisms. Although the CBS edits were restricted to the core CTCF motifs, additional sequence-dependent effects cannot be fully excluded, and we therefore interpret the resulting changes as consistent with—but not exclusively due to—loss of CTCF binding.” (lines 365-368)

      Reviewer 1 – Point 12

      Figure 4C: since you have RNA-seq data, a much more objective way to present these data would be to show all data (again, A-B, up; A-B, down; B-A, up; B-A, down) and the effects of CTCF or GATA4. Regardless, you can still focus on the cardiac specific genes. But my guess is if you examine all genes, the pattern you show in panel C will not be present in the majority of cases. Furthermore, if this hypothesis is wrong, such an analysis will allow you to identify other genes affected by the mechanisms you describe and your analysis will test whether these mechanisms are in fact conserved at different loci.

      As outlined in our response to Point 1, we extended the analysis to all genes undergoing compartment changes and incorporated this into the cardioid RNA-seq dataset. This revealed a clear and consistent relationship between GATA4 or CTCF knockdown and the expression of B-to-A and A-to-B gene classes (updated Figure 4E).

      Reviewer 2 - Point 1.1

      1. CTCF regulation at TTN locus:

      (1) Figure 1A: The claim of the authors about convergent CTCF sites and transcriptional activation of TTN is quite simplistic. This claim is only valid when we know where cohesin is loaded. If cohesin is loaded at then intragenic GATA4 binding site, then the only important CTCF sites is at the promoter of TTN. I suggest that the authors read few more publications which may help the authors to better understand how cohesin and CTCF team up to regulate transcription, such as Hsieh et al., Nature Genetics, 2022; Liu et al., Nature Genetics, 2021; Rinzema et al., Nature Structural and Molecular Biology, 2022.

      Suggestion: The authors should add cohesin (RAD21/SMC1A) and NIPBL ChIP-seq for better interpretation.

      In line with the reviewer’s insightful suggestion, we integrated cohesin ChIP-seq data into updated Figure 1A. Specifically, we added a RAD21 ChIP-seq track from hESCs, which provides direct evidence of cohesin occupancy across the TTN locus. RAD21 binding closely parallels CTCF binding at five sites within the gene body, supporting a model in which promoter-proximal CTCF anchors cohesin to stabilize repressive loops at this locus. This analysis substantially strengthens the mechanistic framework and is consistent with the studies recommended by the reviewer, which we have now cited (lines 68 and 104).

      Reviewer 2 - Point 1.2

      (2) Figure 3B: If delta2CBS only has heterozygenous deletion of CBS6, why we would expect the binding will be weaken to 50%. However, the CTCF binding is reduced to around 1/10 in the ChIP-qPCR. How do the authors explain this?

      Sequencing of the Δ2CBS line shows that one CBS6 allele carries the intended EcoRI replacement, while the second allele contains a 2-bp deletion within the core CTCF motif (Figure S3C). Remarkably, this small deletion is sufficient to abolish CTCF binding, resulting in complete loss of occupancy at CBS6 despite heterozygosity. We clarified this in the text as follows: “CTCF ChIP-qPCR in hiPSCs confirmed complete loss of CTCF binding at the targeted sites, including CBS6 in the Δ2CBS line, indicating that the 2-bp deletion sufficed to disrupt CTCF binding while occupancy at other CBSs remained unaffected.” (lines 187–189).

      Reviewer 2 - Point 1.3a

      (3) Figure 3C: There are two problems with the 4C experiments: (a) The changes are really mild. In fact, none of the p-values in Figure 3D are significant.

      The effect of deleting CBS1 is indeed modest, consistent with reports that individual CTCF binding sites often show functional redundancy (e.g., Rodríguez-Carballo et al. 2017; Barutcu et al. 2018; Kang et al. 2021). Nevertheless, our 4C-seq experiments have reproducibly shown the same directional trend across biological replicates. To increase statistical power and more rigorously assess the robustness of this effect, we are generating additional 4C replicates as part of the full revision.

      Reviewer 2 - Point 1.3b

      [In the 4C experiments] (b) The authors should also consider a model that CTCF directly serves as a repressor. In this way, 3D genome may not be involved. B-A switch is simply caused by the activation of the locus.

      We now explicitly acknowledge this possibility in the Discussion. The revised text states: “Moreover, our data cannot unambiguously separate CTCF’s architectural role from potential direct repressive activity. Both mechanisms could contribute to the observed effects, and our findings likely reflect the combined influence of CTCF on chromatin topology and gene regulation.” (lines 368–371).

      Reviewer 2 - Point 2.1a

      2. (CTCF) detachment: The authors mentioned few times "detachment". In the context of this manuscript, the authors indicate detachment from nuclear lamina. However, the authors haven't provide convincing evidence about this.

      In the two instances where we used the term “detachment,” we intended it to refer exclusively to reduced CTCF binding to DNA, not to lamina repositioning. To avoid ambiguity, we have replaced “detachment” with “reduced binding” in both locations (lines 123 and 329). We do not use this term to describe TTN–lamina positioning.

      Reviewer 2 - Point 2.1b

      (1) Figure 1D: I doubt whether such changes of CTCF protein abundance will lead to LAD detachment. Suggest the authors read van Schaik et al., Genome Biology, 2022. With the full depletion of CTCF, the effects on LADs are still very restricted.

      We agree that the observed correlation between reduced CTCF levels and the relocation of TTN away from a LAD does not establish causality. As outlined in our response to Reviewer 1 – Point 8c, we are performing additional FISH experiments at earlier differentiation stages in the CTCF inducible knockdown line to directly assess whether partial CTCF depletion is sufficient to alter the timing of TTN–lamina separation.

      Reviewer 2 - Point 2.2

      (2) Figure 2D: Lamin B1 should be mostly at nuclear periphery. I have few questions: (1) is the antibody specific? (2) do these cells carry mutation in LMNB1 gene? (3) is the staining actually LMNA?

      As also clarified in response to Reviewer 1 – Point 8b, the original images displayed maximum-intensity projections of Z-stacks, which obscured the peripheral distribution of LMNB1. We have updated Figure 2D and Figure S2E to show representative individual optical sections, which more clearly display the expected peripheral LMNB1 signal. We also confirm that the antibody used is specific for LMNB1 and previously validated (Bertero et al. 2019b), and that the WTC11-derived lines used in this study carry no mutation in LMNB1.

      Reviewer 2 - Point 3

      3. Opposite functions of GATA4 and CTCF: These data in Figure 5E-H argues the opposite role of GATA4 and CTCF in transcriptional regulation. Would it be that CTCF KD just affected cell proliferation, which is actually known for many cell types, rather than affect CM differentiation process? If this is the reason, inversed correlation between CTCF KD and GATA4 KD in Figure 4D could also be explained by opposite effects on cell cycle.

      We directly evaluated this possibility. In FHF–LV cardioids, cell cycle profiling in Figure 6C and Figure S6C (now S7C) showed that CTCF knockdown does not alter the distribution of CMs across G1/S/G2–M phases, in contrast to the marked increase in proliferation observed with GATA4 knockdown.

      Because this comment referred specifically to the SHF data, we also analyzed mitotic gene expression in the SHF–RV bulk RNA-seq dataset using GSEA. CTCF knockdown did not significantly enrich any cell cycle–related gene sets, whereas GATA4 knockdown produced a strong enrichment for mitotic cell cycle terms, in line with FHF-LV data (Reviewer Figure 2).

      These results are summarized in updated Figure S5C, reporting also the results of the broader GSEA analysis, and together indicate that the transcriptional divergence between CTCF and GATA4 knockdown is not simply explained by opposing effects on proliferation.

      Reviewer Figure 2. GSEA for mitotic cell cycle in SHF-RV after inducible knockdown of CTCF (left) or GATA4 (right). p-values by Adaptive Monte-Carlo Permutation test.

      Reviewer 2 - Point 4

      4. In discussion, the authors suggested that CTCF is a local chromatin remodeller. In my view, association with local chromatin compaction doesn't qualify CTCF as a chromatin remodeler. To my knowledge, CTCF does not have an enzymatic domain, then how does it remodel chromatin?

      Our intended meaning was that CTCF shapes 3D chromatin architecture through its role in organizing intergenic looping, not that it remodels chromatin enzymatically. To avoid confusion, we have removed the original sentence from the Discussion.

      Reviewer 2 - Point 5

      5. Some conclusions are drawn based on insignificant p-values, e.g. Figure 2F, Figure 3D, etc. The authors should be careful about their conclusion, and tone down their statement for the observations have borderline significance.

      The conclusions based on bulk RNA-seq have been revised in response to Reviewer 1 – Point 1 (updated Figure 2F). By subsetting B-to-A and A-to-B genes according to their expression dynamics, this analysis now yields clearer and statistically significant differences between conditions.

      Regarding the 4C-seq data, as acknowledged in our response to Reviewer 2 – Point 1.3a, the observed effects are modest. We are generating additional biological replicates to increase statistical power. In the meantime, we have adjusted the text to avoid overstating these findings. The revised manuscript now states: “While the difference did not reach significance, these trends suggest …” (lines 199–200).

      Reviewer 2 - Minor comment 1

      Minor comments: 1. Figure 1A: (1) I suggest to label two promoters in the gene model. It's unclear in the figure in the current version; (2) I was a bit confused with the way how the authors labeled CTCF directionality. I thought there are a lot of promoters. Why didn't they use triangles?

      We updated Figure 1A to label both TTN promoters and indicate their orientation. For CTCF sites, we now clearly display the motif direction and core binding region as determined by FIMO analysis of the CTCF ChIP-seq peaks, improving consistency and interpretability.

      Reviewer 2 - Minor comment 2

      2. Figure 2C: I think the drastical reduction of titin-mEGFP levels is only due to the way how the authors analyze their FACS data. Can the author quantify on median fluorescence intensity?

      The gating strategy for titin-mEGFP⁺ cells was defined using a reporter-negative control, and cells lacking TNNT2 expression showed no detectable titin-mEGFP signal, confirming the specificity of the gate. To complement this analysis, we also quantified the median fluorescence intensity (MFI) of titin-mEGFP⁺ cells. The MFI analysis corroborates the original findings, showing a significant decrease in GATA4 knockdown and an increase in CTCF knockdown (updated Figure S2D).
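
      To make the distinction between the two readouts concrete, the sketch below computes both the fraction of reporter-positive cells and the MFI of those cells. The intensities are toy values, the function names are hypothetical, and setting the gate at a fixed percentile of a negative control is an assumption made for illustration. The actual gating was done on a reporter-negative control as described above, not necessarily by this rule.

```python
from statistics import median

def gate_threshold(negative_control, percentile=0.99):
    """Intensity cutoff taken from a negative control (assumed percentile rule)."""
    ordered = sorted(negative_control)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[idx]

def summarize(sample, threshold):
    """Return (fraction of positive cells, median intensity of positive cells)."""
    positive = [x for x in sample if x > threshold]
    fraction = len(positive) / len(sample)
    mfi = median(positive) if positive else float("nan")
    return fraction, mfi

# Toy fluorescence intensities (arbitrary units, not real data)
negative_control = [10, 12, 11, 9, 13, 10, 12, 11, 14, 10]
sample = [8, 12, 200, 220, 240, 260, 300, 15]

threshold = gate_threshold(negative_control)
fraction, mfi = summarize(sample, threshold)
print(threshold, fraction, mfi)
```

      The percentage of positive cells depends on where the gate is drawn, whereas the MFI summarizes how bright the gated cells are, so reporting both addresses the concern that the effect is an artifact of the gate.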

      Reviewer 2 - Minor comment 3

      3. Figure S2G: P value should be -log10, I assume. Please label it accurately.

      We appreciate the reviewer pointing out this labeling error. In the revised manuscript, this panel has been removed to accommodate the updated compartment–expression analysis now presented in updated Figure 2H (see response to Reviewer 1 – Point 1), and the issue is no longer applicable.

      References

      Barutcu AR, Maass PG, Lewandowski JP, Weiner CL, Rinn JL. 2018. A TAD boundary is preserved upon deletion of the CTCF-rich Firre locus. Nat Commun 9: 1444.

      Bertero A, Fields PA, Ramani V, Bonora G, Yardımcı GG, Reinecke H, Pabon L, Noble WS, Shendure J, Murry CE. 2019a. Dynamics of genome reorganization during human cardiogenesis reveal an RBM20-dependent splicing factory. Nat Commun 10: 1538.

      Bertero A, Fields PA, Smith AS, Leonard A, Beussman K, Sniadecki NJ, Kim D-H, Tse H-F, Pabon L, Shendure J, et al. 2019b. Chromatin compartment dynamics in a haploinsufficient model of cardiac laminopathy. J Cell Biol 218: 2919–2944.

      Kang J, Kim YW, Park S, Kang Y, Kim A. 2021. Multiple CTCF sites cooperate with each other to maintain a TAD for enhancer–promoter interaction in the β-globin locus. FASEB J 35: e21768.

      Poleshko A, Shah PP, Gupta M, Babu A, Morley MP, Manderfield LJ, Ifkovits JL, Calderon D, Aghajanian H, Sierra-Pagán JE, et al. 2017. Genome-Nuclear Lamina Interactions Regulate Cardiac Stem Cell Lineage Restriction. Cell 171: 573–587.

      Rodríguez-Carballo E, Lopez-Delisle L, Zhan Y, Fabre PJ, Beccari L, El-Idrissi I, Huynh THN, Ozadam H, Dekker J, Duboule D. 2017. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev 31: 2264–2281.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Becca et al. characterized the functions of GATA4 and CTCF in the context of cardiomyogenesis. The authors aim to establish a link between 3D genome changes (A/B compartment and long-range chromatin interactions) and activation of cardiac specific genes such as TTN. They showed opposite effects of GATA4 and CTCF in regulating these genes as well as phenotypical traits. I have the following suggestions and questions:

      Major comments:

      1. CTCF regulation at TTN locus:

      (1) Figure 1A: The claim of the authors about convergent CTCF sites and transcriptional activation of TTN is quite simplistic. This claim is only valid when we know where cohesin is loaded. If cohesin is loaded at then intragenic GATA4 binding site, then the only important CTCF sites is at the promoter of TTN. I suggest that the authors read few more publications which may help the authors to better understand how cohesin and CTCF team up to regulate transcription, such as Hsieh et al., Nature Genetics, 2022; Liu et al., Nature Genetics, 2021; Rinzema et al., Nature Structural and Molecular Biology, 2022.

      Suggestion: The authors should add cohesin (RAD21/SMC1A) and NIPBL ChIP-seq for better interpretation.

      (2) Figure 3B: If delta2CBS only has heterozygenous deletion of CBS6, why we would expect the binding will be weaken to 50%. However, the CTCF binding is reduced to around 1/10 in the ChIP-qPCR. How do the authors explain this?

      (3) Figure 3C: There are two problems with the 4C experiments: (a) The changes are really mild. In fact, none of the p-values in Figure 3D are significant; (b) The authors should also consider a model that CTCF directly serves as a repressor. In this way, 3D genome may not be involved. B-A switch is simply caused by the activation of the locus.

      2. (CTCF) detachment: The authors mentioned few times "detachment". In the context of this manuscript, the authors indicate detachment from nuclear lamina. However, the authors haven't provide convincing evidence about this.

      (1) Figure 1D: I doubt whether such changes of CTCF protein abundance will lead to LAD detachment. Suggest the authors read van Schaik et al., Genome Biology, 2022. With the full depletion of CTCF, the effects on LADs are still very restricted.

      (2) Figure 2D: Lamin B1 should be mostly at nuclear periphery. I have few questions: (1) is the antibody specific? (2) do these cells carry mutation in LMNB1 gene? (3) is the staining actually LMNA?

      3. Opposite functions of GATA4 and CTCF: These data in Figure 5E-H argues the opposite role of GATA4 and CTCF in transcriptional regulation. Would it be that CTCF KD just affected cell proliferation, which is actually known for many cell types, rather than affect CM differentiation process? If this is the reason, inversed correlation between CTCF KD and GATA4 KD in Figure 4D could also be explained by opposite effects on cell cycle.

      4. In discussion, the authors suggested that CTCF is a local chromatin remodeller. In my view, association with local chromatin compaction doesn't qualify CTCF as a chromatin remodeler. To my knowledge, CTCF does not have an enzymatic domain, then how does it remodel chromatin?

      5. Some conclusions are drawn based on insignificant p-values, e.g. Figure 2F, Figure 3D, etc. The authors should be careful about their conclusion, and tone down their statement for the observations have borderline significance.

      Minor comments:

      1. Figure 1A: (1) I suggest to label two promoters in the gene model. It's unclear in the figure in the current version; (2) I was a bit confused with the way how the authors labeled CTCF directionality. I thought there are a lot of promoters. Why didn't they use triangles?
      2. Figure 2C: I think the drastical reduction of titin-mEGFP levels is only due to the way how the authors analyze their FACS data. Can the author quantify on median fluorescence intensity?
      3. Figure S2G: P value should be -log10, I assume. Please label it accurately.

      Significance

      Strengths and limitations:

      I feel that single-cell analysis and functional analysis of GATA4 and CTCF using cardiac organoid model are elegant. However, the weak part of the manuscript is the link between 3D genome and activation of TTN. I also think the authors should include more possible explanations for the interpretation of some genome organization data (CTCF site deletion, 4C, etc).

      Advance: The study does provide useful information to understand transcriptional regulation during cardiac lineage specification. The link between 3D genome and cardiac lineage specification is conceptually nice but needs more data to support.

      Audience: developmental biologists who are interested in heart development and molecular biologists with specific interests in gene regulation.

    1. Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      (7) Coming back to the point about the statistical analyses not being described in enough detail: One important example is that it is unclear whether random slopes were included in the mixed-effects models. This is highly relevant, as omitting random slopes has repeatedly been shown to lead to extremely inflated Type 1 error rates (e.g., inflation by a factor of ten, such that a significant p value of .05 might be obtained when the true p value is .5). Thus, if random slopes have indeed been omitted, it is possible that significant effects are significant only due to inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.
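
      The sample-size concern in point (4) can be checked with a back-of-envelope calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison, n per group ≈ 2(z_{1-α/2} + z_{1-β})² / d². The inputs (d = 0.5, α = .05, power = .80) are conventional values assumed here, not taken from the manuscript, and the function name is hypothetical. The authors' power analysis may have targeted a different test, such as a repeated-measures interaction, which can require fewer participants.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided two-sample comparison.

    Normal approximation; an exact t-test calculation gives a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

medium = n_per_group(0.5)  # Cohen's "medium" effect, assumed
large = n_per_group(0.8)   # Cohen's "large" effect, for comparison
print(medium, large)
```

      Under these assumptions a medium effect calls for roughly 63 participants per group, which is the basis for regarding 18 participants as low for a simple between-subjects contrast.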

    2. Reviewer #2 (Public review):

      Summary:

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC on procrastination, concerns about the validity of the procrastination measures moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

    3. Author response:

      Reviewer #1:

      (1) Thank you for pointing out the risk of sensationalizing the ramifications of procrastination for psychopathology. We will rewrite the Introduction to add balanced evidence and tone down these claims.

      (2) Thank you for raising this crucial question. We apologize for the problem with the preregistration, which stems from a serious technical hurdle. OSF has banned my account, claiming to have detected “suspicious user’s activities” in it. As a result, we cannot access any of the materials deposited in this account, including the preregistration. We have contacted the OSF team but have received no workable solution. We suspect that the ban was triggered by mistake when my affiliation changed to the Third Military Medical University of the People’s Liberation Army (PLA). To resolve this issue, we shall upload the preregistration to a new repository soon.

      (3) This is a back-to-back study that conceptually probes whether strengthening the left DLPFC can mitigate procrastination by reducing task aversiveness or by increasing the weight of outcome value. We therefore selected a medium effect size a priori, following the previous study (Xu et al., 2023). The effect size was calculated with the “Power Contours” tool (Baker et al., 2021), which accounts for the gain in statistical power from increasing the number of within-subject repeated measures. As you kindly pointed out, we shall clarify the effect size calculation in the revised manuscript.

      (4) Yes, both groups came to the lab the same number of times for tDCS; they differed only in stimulation type (active vs. sham).

      (5) We shall add full details to clarify the TDM and the hyperbolic discounting modeling.

      (6) Thank you for raising this crucial statistical question. We shall double-check whether the multiple sessions were modeled as random slopes, and will rerun the analysis if the random slopes were omitted.

      (7) Thank you. We did not intend to confuse readers by adding these complicated statistics; rather, they were meant to enrich the understanding of how these findings can be interpreted.

      (8) Yes, as mentioned above, we shall add balanced evidence to the Introduction to clarify that both the left and the right DLPFC may contribute to self-control.

      (9) Yes, this is a conceptual hypothesis: actively stimulating the left DLPFC could improve self-control functions. Thank you for this nuanced but crucial insight; we will explicitly clarify the nature of our conclusions.

      (10) Yes, we confirm that all the participants completed their tasks before the deadline at sessions 6 and 7, so the procrastination rates all decreased to 0. This was somewhat surprising to us as well, but we have verified it. We also received written letters of thanks from some participants in the active group. Thus, this is a surprising but exciting finding. Thank you for the helpful suggestion; we will run this robustness check by iteratively removing each session, to guard against statistical bias from such an extreme pattern.

      (11) Yes, we fully agree that the full details belong in the main text rather than in the Supplemental Materials, and will add them there in the first round of revision.
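
The robustness check mentioned in response (10), iteratively removing each session, could be sketched roughly as follows. This is a minimal illustration and not the authors' analysis: the data layout (a dict mapping session number to per-participant procrastination rates), the function name `leave_one_session_out`, and the use of a plain difference in group means as the effect estimate are all assumptions, and the numbers are toy placeholders rather than study data.

```python
from statistics import mean

def leave_one_session_out(active, sham):
    """Recompute the active-minus-sham difference in mean procrastination
    rate with each session dropped in turn.

    active, sham: dict mapping session number -> list of rates (0-100).
    Returns a dict mapping the dropped session -> effect estimate.
    """
    sessions = sorted(active)
    estimates = {}
    for dropped in sessions:
        kept = [s for s in sessions if s != dropped]
        active_mean = mean(r for s in kept for r in active[s])
        sham_mean = mean(r for s in kept for r in sham[s])
        estimates[dropped] = active_mean - sham_mean
    return estimates

# Hypothetical toy data (NOT from the study): 3 sessions, 2 participants per group.
active = {1: [40.0, 60.0], 2: [20.0, 30.0], 3: [0.0, 0.0]}
sham = {1: [50.0, 50.0], 2: [40.0, 60.0], 3: [40.0, 40.0]}

for dropped, estimate in leave_one_session_out(active, sham).items():
    print(f"without session {dropped}: active - sham = {estimate:.1f}")
```

If the estimate keeps the same sign and a similar size no matter which session is dropped, the effect is unlikely to be driven by a single extreme session. A full analysis would presumably refit the mixed model each time rather than compare raw means.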

      Reviewer #2:

      (1) Thank you for this crucial suggestion. We apologize that many details were omitted to comply with the editorial requirements of Nature Human Behaviour (our previous submission), which left these descriptions ambiguous. We will clarify how we measured participants’ procrastination on real-world tasks. In brief, we asked each participant to report a real task that they had to carry out the next day, with a deadline no later than that day. On that day, we used ESM to have the participant report the task completion rate (0-100%) at five time points before the deadline. The five time points were determined by a hyperbolic discounting model (we explain how and why these time points were set in the full author response letter). Each time a participant reported a completion rate, she/he was required to provide a photo as proof. The dependent variable, the real-world procrastination rate, is calculated as 100% minus the task completion rate (0-100%) at the deadline. That is, if a participant reports that the task was fully completed at or before the deadline, his/her real-world procrastination rate is 100% - 100% = 0%; if the task was 60% complete at the deadline, the rate is 100% - 60% = 40%. To guard against spurious reporting, all participants had to provide photos verifying the reported completion rate. This is only a brief example; we shall provide full details in the formal author response letter.

      (2) This is a very meaningful point. We agree that participants might learn to complete the experimental task quickly rather than benefit from neuromodulation. This possibility is plausible, but it is countered by our experimental controls and empirical observations. First, we never told participants “You must complete this task” or “The task completion is associated with bonus/rewards you may get”, so they had no incentive to do so. Second, the task completion rates were not based solely on self-report: participants were required to provide photos for verification, which limits the risk of spurious reporting. Lastly, participants in the active and sham groups received identical treatment except for the “real stimulation” versus “sham stimulation” protocol. The results showed significant improvement in the active group but not in the sham group, indicating no significant “placebo” or “task learning” effect.

      (3) Thank you. As you kindly suggested, we will add extensive details on these measures in the revised manuscript. Although it is a great idea, we did not collect scale-based procrastination scores after neuromodulation, and we will note this in the Limitations section.

      (4) Yes, this is a conceptual hypothesis: actively stimulating the left DLPFC could improve self-control functions. We cannot rule out the possibility that this neuromodulation protocol enhanced working memory, attention, or other cognitive components. We fully agree with this helpful recommendation: we will tone down the claims regarding the role of the DLPFC in self-control, and explicitly caution about whether this mechanism is specific to procrastination.
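
The dependent variable defined in response (1) above, the real-world procrastination rate, amounts to a one-line calculation. The sketch below only restates that definition; the function name and the input validation are assumptions added for illustration.

```python
def procrastination_rate(completion_at_deadline):
    """Real-world procrastination rate (%) = 100% minus the task
    completion rate (%) reported at the deadline."""
    if not 0 <= completion_at_deadline <= 100:
        raise ValueError("completion rate must be between 0 and 100")
    return 100 - completion_at_deadline

print(procrastination_rate(100))  # task fully completed by the deadline
print(procrastination_rate(60))   # task 60% complete at the deadline
```

The two calls reproduce the worked examples in the response: full completion gives a rate of 0%, and 60% completion gives 40%.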

      Reviewer #3:

      (1) Thank you for taking the time to review our manuscript. Yes, the limited sample size warrants caution in drawing firm conclusions, and we will state this in the Limitations section. We have also streamlined and tightened the statistics section by removing complicated and redundant statistical models.

      (2) As mentioned above, we apologize for the problem with the preregistration, which stems from a serious technical hurdle. OSF has banned my account, claiming to have detected “suspicious user’s activities” in it. As a result, we cannot access any of the materials deposited in this account, including the preregistration. We have contacted the OSF team but have received no workable solution. We suspect that the ban was triggered by mistake when my affiliation changed to the Third Military Medical University of the People’s Liberation Army (PLA). To resolve this issue, we shall upload the preregistration to a new repository soon.

      (3) Yes, thank you for this very helpful suggestion. As you kindly indicated, we will clarify the measures, analyses, methods, and protocols, and tighten the whole manuscript.

      References

      Baker, D. H., Vilidaite, G., Lygo, F. A., Smith, A. K., Flack, T. R., Gouws, A. D., & Andrews, T. J. (2021). Power contours: Optimising sample size and precision in experimental psychology and human neuroscience. Psychological Methods, 26(3), 295–314. https://doi.org/10.1037/met0000337

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Again, we wholeheartedly appreciate all of these helpful and insightful comments, each of which contributes substantially to the quality of this manuscript. Note that the responses presented above are provisional and initial. We shall revise our manuscript following these suggestions one by one, and provide a full-length response letter.

    1. Reviewer #2 (Public review):

      Summary:

      In this review article, the authors discuss the whole-brain activity changes induced by brain stimulation. They review the literature on how these activity changes depend on the cognitive state of the brain and divide the results by the scale of the change being induced, from microscale changes across small groups of neurons, up to macroscale changes across the entire brain. Finally, they describe attempts to model these changes using computational models.

      Strengths:

      The review provides an overview of the results within this subfield of neuroscience, and the authors are able to discuss a lot of prior results. The framing of the changes in neuronal activity in terms of computational changes is also a helpful approach.

      Weaknesses:

      However, the authors are not able to contextualize these results within a single framework, i.e. explaining from first principles how different aspects of stimulus-induced changes interact to generate functional changes in the brain, and how different changes - at distinct spatiotemporal scales - combine to form larger effects. This is a significant weakness in generating a review of the literature, since the authors do not provide a cohesive conceptual framework on which to frame the results. Similarly, the authors do not explain how their different computational models fit together, and how one can get a singular computational understanding of the distinct mechanisms of brain activity changes due to stimulation under different brain states, by combining the results derived from each separate model.

      Major Comments:

      (1) The authors have written this review as if it were intended for an audience who is already familiar with the topics. For example, they introduce concepts like complexity, spiral vs planar waves, without much explanation.

      (2) Regarding complexity, the authors present a quantification termed PCI. However, in the associated box, they state that PCI could be implemented in a number of different ways, using analogous metrics (which are, nonetheless, not identical). Yet the authors simply claim that all these metrics are sufficiently similar to be grouped together as "PCI". The authors do not provide much intuition about this, and they also don't present any other potential quantifications. This makes any interpretation of their results strongly dependent on your understanding of the concept of PCI. It would be helpful to present some other, analogous metric to demonstrate that the results that the authors are focusing on are not somehow tied to the specific computational structure of the PCI metric.

      (3) The authors divide the review into sections organized by the spatial extent of the effects that they are exploring (e.g. from microscale to macroscale). However, they don't bring together these insights into a cohesive structure - for example, by providing potential explanations of the macroscale effects by using the microscale changes.

      (4) The authors completely ignore any aspect of cell-type specificity in their review, despite the known importance of specific cell types at the microcircuit scale. This makes it difficult to map their results onto the true biological system.

      (5) The authors introduce several different computational models, such as the Hopf model, the AdEx model, and the MPR model. However, they do not provide the reader with a conceptual understanding of the structure of each of these models (except through potentially more complex terminology, e.g. the Hopf model is a "phenomenological Stuart-Landau nonlinear oscillator"). Additionally, though they present the results of each simulation, they don't provide the reader with intuition about how these models compare against each other, and how best to interpret results derived from each model.

      (6) In several cases, the authors make statements that they appear to believe to be completely straightforward (and require no justification), but that do not appear so to the reader. For example, they mention: "In wakefulness and REM sleep, ..., the membrane potential is depolarized and close to the spike threshold, which explains why neurons respond more reliably and with less response variability compared with slow-wave sleep". However, this statement is not obvious to the reader and requires explanation (for example, in a system that is close to balance, bringing cells closer to the firing threshold can result in increased response jitter).

    1. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors review literature on synchronous activity, its relationship to brain state, and the multi-scale mechanisms underlying it.

      Strengths:

      The overall strength of the paper is the wide range of information reviewed, and the diversity of perspectives/approaches it brings together.

      Weaknesses:

      However, this strength is also the source of its major weaknesses - namely, that the overall structure lacks clarity, and there are inconsistencies throughout. Overall, in the opinion of this reviewer, the manuscript reads as disorganized and incomplete. Major and minor points are delineated below.

      Major points:

      (1) Most of the text in many figures was too small to read.

      (2) Terminology is inconsistent throughout the manuscript. What is the difference between slow oscillations and delta waves? Sometimes the term slow waves is used instead. For sleep state, sometimes the term SWS is used, sometimes non-REM. Similarly, "spindle activity" is not defined, but simply stated as if the reader knows. This brings up two issues: (a) the manuscript should be clearer and more consistent about its terminology, and (b) it's unclear who is the intended readership of the review - is it a pedagogical review for people outside the field of sleep and slow oscillations, or is it meant to be a consensus statement for readers who are already in the field in which a pressing concern has been addressed? It seems part way between these two, and as a result, is ineffective at either goal.

      (3) I suggest the authors look again at the overall structure and flow of the review... many sections feel redundant, and it's unclear how they fit together into a single review.

      (4) There are many speculative statements in the review that are not justified or explained sufficiently for the reader. For example: "While highly regular slow waves in vivo suggest a single mechanism of generation, namely local cortical circuits, irregular cycles are compatible with a larger role of subcortical nuclei, ..."; "The involvement of different cortical areas and subcortical nuclei can form the basis of these different roles in memory.". For these statements, I assume the relationship between slow wave statistics, subcortical nuclei, and memory either has been written about before, and then should be cited and summarized, or is a novel claim of the authors, which then should be explained and defended rather than stated. There are other similar examples, and I suggest the authors go through the manuscript and make sure that it's clear what is a novel claim of the authors vs a cited claim, and make sure that both are sufficiently justified for the reader.

      (5) An especially notable example can be found in the section on the role of the thalamus, where the authors state that they "hold that slow oscillations are fundamentally cortical". However, this section is far too short, and very little evidence is provided to back up this claim. Please review the ways in which the thalamus modulates, and, e.g., ways in which up-down is similar/different without the thalamus.

    2. Reviewer #2 (Public review):

      Summary:

      In this review article, the authors discuss the correlated dynamical states associated with distinct cognitive states, including those associated with anesthesia and sleep. They present evidence that these states are primarily cortically generated, and demonstrate the properties of these dynamical states at different levels, from the microscale dynamics in individual neurons to the macroscale dynamics across the brain.

      Strengths:

      Multiple groups have been adding to this field over the past decades, and therefore, a review of this literature is very helpful. This review collates a large amount of the literature within this field into a single document, which should make it a valuable resource within this area of neuroscience.

      Weaknesses:

      Unfortunately, this review does not seem to be a balanced viewpoint of the field in question. Although there are a lot of authors in the review, it feels as if they are from a common school of thought. The authors provide only a single perspective on these dynamical states, focusing on the perspective of wave-like electrical dynamics across the cortex. Their perspective is embedded in methods such as EEG and LFP recordings. This makes the work hard to interpret outside of the field in which the authors reside. Indeed, the review seems intended for a more specialized audience.

      In addition, the article reads more like a catalog of prior studies as opposed to a true synthesis across the large volume of data in this field that highlights links across multiple sources. Hence, it does not seem to provide a novel way of understanding the dynamics involved in cognitive state transitions.

      We have included more details on these general comments below:

      Major Comments:

      (1) The authors have written this review as if it were intended for an audience who is already familiar with these topics. They do not define many of the terms that they introduce within the review, including concepts like complexity, metastability, and oscillations that are fundamental to the concepts that the authors are introducing. Though these may seem like first principles concepts to the authors, they often introduce assumptions that may be unfamiliar to the general reader. For example, are slow wave oscillations periodic? A naïve reader may assume that oscillations - characterized by their frequency - should be somewhat periodic, but that is often not the case. For a journal with a general biological science readership, it would be particularly helpful for each of these terms to be formally defined and characterized.

      (2) It would be helpful for the authors to reframe their work in different perspectives and to incorporate all the literature on the dynamics of cortical brain states, and not simply the work that is most familiar to them. As one example, the authors do not discuss cell-type-specific changes in brain state during anesthesia and in altered states of consciousness (including dissociative states and hallucinatory states). There is recent work in this vein (Suzuki and Larkum, 2020; Vesuna et al, 2020; Bharioke, Munz et al, 2023), and yet the authors do not discuss these papers.

      (3) Given the authors' clear, extensive knowledge of their field, it would also be extremely helpful for the authors to reframe fundamental concepts in terms of neuronal population activity, trajectory analyses, etc. This would enable a more general audience to better understand their work.

      (4) The authors have one section focused on thalamic contributions to cortical wave-like activity. This is a cursory treatment of a subject that is quite controversial in the field. It would be helpful if the authors could provide a more balanced consideration of all the evidence regarding potential thalamocortical interactions and their role in wave-like activity.

      (5) The authors present many computational models and describe the results of simulations with these different models. However, this doesn't provide the reader with intuition about what each model adds or removes from the true biological picture. It would be helpful for the authors to provide some intuition about the assumptions and constraints that underlie each model.

      (6) The authors state that "The main mechanism [of slow oscillatory dynamics] consists of a combination of two ingredients: the recurrent connectivity, which maintains the excitability in the network, and adaptation, an activity-dependent fatigue variable that provides inhibitory feedback". They make this statement as a fact, yet they don't provide much justification for it. Additionally, it's not clear that any other possible combination of ingredients would be able to produce slow oscillatory dynamics.

      (7) The authors often define one concept in terms of other equally complex concepts. For example: "EIA (excitatory-inhibitory with adaptation) cortical circuits then display the typical slow-fast dynamics of relaxation oscillators". The reader would need an explanation of slow-fast dynamics and relaxation oscillators to understand this line, neither of which is provided in the text.

      (8) When discussing sleep, the authors do not discuss REM sleep, focusing on slow-wave non-REM sleep. It would be helpful if the authors could at least frame the full sleep cycle and discuss why they are focusing on one part of it.

      (9) The authors introduce the concept of sleep spindles without any explanation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This report demonstrates that the gene expression output of the Wnt pathway, when controlled precisely by a synthetic light-based input, depends substantially on the frequency of stimulation. The particular frequency-dependent trend that is observed - anti-resonance, a suppression of target gene expression at intermediate frequencies given a constant duty cycle - is a novel aspect that has not been clearly shown before for this or other signaling pathways. The paper provides both clear experimental evidence of the phenomenon with engineered cellular systems and a model-based analysis of how the pairing of rate constants in pathway activation/deactivation could result in such a trend.

      Strengths:

      This report couples in vitro experimental data with an abstracted mathematical model. Both of these approaches appear to be technically sound and to provide consistent and strong support for the main conclusion. The experimental data are particularly clear, and the demonstration that Brachyury expression is subject to anti-resonance in ESCs is particularly compelling. The modeling approach is reasonably scaled for the system at the level of detail that is needed in this case, and the hidden variable analysis provides some insight into how the anti-resonance works.

      Weaknesses:

      (1) The anti-resonance phenomenon has not been demonstrated using physiological Wnt ligands; however, I view this as only a minor weakness for an initial report of the phenomenon. The potential significance of the phenomenon for Wnt outweighs the amount of effort it would take to carry the demonstration further - testing different frequencies/duty cycles at the level of ligand stimulus using microfluidics could get quite involved, and would likely take quite some time. Adding some more discussion about how the time scales of ligand-receptor binding could play into the reduced model would further ameliorate this issue.

      We thank the reviewer for this comment and the interesting suggestion to test the anti-resonance phenomenon with microfluidics. We agree that combining physiological Wnt ligands with microfluidic stimulation would go beyond the scope of this current study, though it is an interesting extension. One advantage of the optogenetic setup, as mentioned in the discussion, is that the Wnt stimulus can be turned off sharply. This allows us to test the output from perfectly square wave input profiles; in microfluidics, washing the sticky ligand off the cells might “smear” the effective input profile cells respond to.

      We show in Supplement Fig. 6 that our reduced model matches the experimental data and that we would expect the anti-resonance phenomenon as long as (see Fig. 4). Practically, a smeared input profile implies an effective reduction of 𝑘<sub>off</sub>, which means that the phenomenon would be visible with microfluidics (provided the minimum is deep enough, see Fig. 4). However, this should still be considered with caution, as the anti-resonance would then appear because the cells essentially receive a smeared-out or continuous pulse in the high-frequency limit, rather than because cells respond to a square wave in a specific way.

      (2) While the model is fully consistent with the data, it has not been validated using experimental manipulations to establish that the mechanisms of the cell system and the model are the same. There may be some ways to make such modifications, for example, using a proteasome inhibitor. An alternative would be to more explicitly mention the need to validate the model's mechanism with experiments.

      We thank the reviewer for this valuable and constructive comment. We agree that future experimental perturbations that directly modulate pathway activation and reset kinetics—such as proteasome inhibition, targeted degradation of pathway components, or engineered changes in receptor turnover—would provide an important validation of the model’s mechanistic interpretation. In the present study, our primary goal was to establish the existence and quantitative features of anti-resonance in the Wnt pathway and to identify the minimal set of timescale relationships that can explain it. We view the proposed experimental validations as exciting next steps that extend beyond the scope of the current work, and we are grateful to the reviewer for emphasizing their importance. We now mention this explicitly in the discussion of our manuscript.

      (3) I think the manuscript misses an opportunity to discuss the potential of the phenomenon in other pathways. The hedgehog pathway, for example, involves GSK3-mediated partial proteolysis of a transcription factor, which could conceivably be subject to similar behaviors, and there are certainly other examples as well.

      We thank the reviewer for pointing out an opportunity to emphasize the possibility of this phenomenon in other pathways. The minimal model indicates that anti-resonance emerges whenever a rapid activating process is paired with a slower deactivating/reset process. Beyond Hedgehog/Gli processing, candidate circuits include: NF-κB (rapid IκBα phosphorylation/degradation vs slower IκBα resynthesis), ERK (fast phosphorylation bursts vs slower transcriptional negative feedback such as DUSPs), Notch (fast γ-secretase NICD release vs slower NICD turnover and feedback), BMP/TGF-β–SMAD (fast R-SMAD phosphorylation vs slower receptor trafficking/SMAD7 feedback), and Hippo/YAP (rapid cytoplasmic sequestration vs slower transcriptional feedback). Each contains the same timescale separation that should create a frequency ‘stop-band,’ predicting suppressed gene expression or fate transitions at intermediate stimulation frequencies. We have updated the manuscript’s discussion to mention the Hedgehog connection with the following added sentence in the discussion: Analogous band-stop filtering should arise in other developmental circuits that couple a fast ‘ON’ step to slower deactivation or negative feedback. In Hedgehog, for example, PKA/CK1/GSK3-mediated partial proteolysis of Gli with slower recovery of full-length Gli creates the same fast-activation/slow-reset motif our hidden-variable model predicts will yield anti-resonance, and Wnt–Hedgehog crosstalk through the shared kinase GSK3 suggests such frequency selectivity could occur in other developmental signaling pathways.

      We also added an additional sentence regarding different activation and deactivation timescales in other pathways.

      (4) Some aspects of the modeling and hidden variable analysis are not optimally presented in the main text, although when considered together with the Supplemental Data, there are no significant deficiencies.

      We now address the model choices and analysis more clearly in the main manuscript and refer to the Supplemental Data more directly.
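
The stimulation protocol discussed in this exchange, a square-wave light input whose frequency varies while the duty cycle stays constant, can be illustrated with a short sketch. The function names, the discretization into integer time steps, and the parameter values are assumptions for illustration and are not taken from the manuscript. The sketch does not model the Wnt pathway; it only shows that the time-averaged input is the same at every frequency.

```python
def square_wave(step, period_steps, duty):
    """Light input at an integer time step: 1 while on, 0 while off."""
    on_steps = round(duty * period_steps)
    return 1 if step % period_steps < on_steps else 0

def mean_input(period_steps, duty, total_steps):
    """Time-averaged input over total_steps samples."""
    total = sum(square_wave(s, period_steps, duty) for s in range(total_steps))
    return total / total_steps

# Hypothetical settings: 50% duty cycle, three periods that divide the window evenly.
total_steps = 24000
for period_steps in (100, 1000, 6000):
    print(period_steps, mean_input(period_steps, 0.5, total_steps))
```

Because the average input does not change with the period, any dependence of the output on stimulation frequency has to come from the pathway's own activation and deactivation dynamics rather than from the total light dose.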

      Reviewer #2 (Public review):

      Summary:

      By combining optogenetics with theoretical modelling, the authors identify an anti-resonance behavior in the Wnt signaling pathway. This behavior is manifested as a minimal response at a certain stimulation frequency. Using an abstracted hidden variable model, the authors explain their findings by a competition of timescales. Furthermore, they experimentally show that this anti-resonance influences the cell fate decision involved in human gastrulation.

      Strengths:

      (1) This interdisciplinary study combines precise optogenetic manipulation with advanced modelling.

      (2) The results are directly tested in two different systems: HEK293T cells and H9 human embryonic stem cells.

      (3) The model is implemented based on previous literature and has two levels of detail: i) a detailed biochemical model and ii) an abstract model with a hidden parameter.

      Weaknesses:

      (1) While the experiments provide both single-cell data and population data, the model only considers population data.

      We thank the reviewer for correctly pointing out that the single-cell measurements would in principle allow us to incorporate the cell-to-cell heterogeneity into the model. In this study, we sought to identify a minimal quantitative model of the Wnt pathway that could explain anti-resonance through competing time scales. We believe that, for our purposes, focusing on population data allowed us to keep the complexity of the model to a minimum to increase its explanatory value. We agree with the reviewer that considering single-cell trajectories is an interesting direction for further work.

      (2) Although the model captures the experimental data for TopFlash very well, the beta-Cat curves (Figure 2B) are only described qualitatively. This discrepancy is not discussed.

      Indeed, our model fits to mean β-catenin expression are more qualitative than those for TopFlash. Fitting β-catenin was difficult because its expression is typically low and closer to the detection limit than that of TopFlash. As a result, the variation between individual signal trajectories, relative to the light-off condition, is higher for β-catenin than for TopFlash. We therefore aimed for a qualitative rather than a quantitative fit to the mean β-catenin expression profile. The current model fit is well within the standard deviation of the data. Given the observed heterogeneity and the fact that we take the parameters from the literature (which ensures that their order of magnitude is in a sensible range), we believe that the model fits are reasonable. We now mention this explicitly in the text.

      Overall Assessment:

      The authors convincingly identified an anti-resonance behavior in a signaling pathway that is involved in cell fate decisions. The focus on a dynamic signal and the identification of such a behavior is important. I believe that the model approach of abstracting a complicated pathway with a hidden variable is an important tool to obtain an intuitive understanding of complicated dependencies in biology. Such a combination of precise optogenetic manipulation with effective models will provide a new perspective on causal dependencies in signaling pathways and should not be limited only to the system that the authors study.

      We thank both reviewers for the positive assessment of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are several points that deserve more discussion, as noted above in the review.

      (1) It would be worthwhile to consider whether a relatively simple experiment with a proteasome inhibitor or similar pharmacological manipulation could provide useful validation data for the model.

      We address this point above in the weaknesses section from reviewer 1.

      (2) The figure legend for S5C should clarify whether the values plotted are at a particular fixed time point, or (more likely) at a certain time following the second pulse, which would be variable.

      We have modified the figure caption to clarify that the values plotted are at a fixed time point in the simulation (t = 48 hrs). We chose this timepoint sufficiently long after the second pulse to ensure that there are no residual dynamical effects. We thank the reviewer for noting this.

      (3) As noted in the Sci Score document, various aspects of the resource reporting should be improved, such as including RRIDs, etc.

      We are sending our plasmids to Addgene; the versions of Python and MATLAB are listed in our Methods section.

      Reviewer #2 (Recommendations for the authors):

      I mostly have suggestions to improve the clarity of the presentation.

      (1) Not all symbols in the equations given in the main text are explained. This is rather annoying, because either you present them and explain what they are or you don't show them and refer to the supplements. For example, d_0 or c_o or \bar{b} or n or K are not explained.

      We have now more clearly presented the parameters in the main text and added signposts to the Methods section.

      (2) Overall, it is often not clear what data in the figures are redundant, although the authors referred to them in the text. For example, in Figure 2c, a curve for 24 hours is shown and referred back to Figure 1D. However, in Figure 1D there is no curve for 24 hours. Is the data from Supplementary Figure 1 H and K also in the main text?

      We thank the referee for pointing out these redundancies. We have now included the 24hr line in Figure 1D and are now only showing the unsmoothed data, also in the main text of the manuscript. To clarify supplemental figures, we have now removed S1H and S1K since all they showed was the unsmoothed version of the data. The remaining plots in Supplementary Figure 1 are normalized differently from what we show in Figure 1 to demonstrate our choice of normalization is not the reason for the observed optogenetic response.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) The authors state that more is known about glial reactivation than cell-cycle re-entry. They are confusing many points here. More gene networks that require cell-cycle re-entry are known. Some of the genes listed for "reactivation" are, in fact, required for cell cycle re-entry/proliferation. And the authors confuse gliosis vs glial reactivation.

      We thank the reviewer for this important and constructive comment. We fully agree that clearly distinguishing between the concepts of glial reactivation, glial proliferation, gliosis, and neurogenesis is essential to avoid conceptual confusion in our study.

      Injury-induced retinal regeneration in zebrafish:

      Glial reactivation refers to the initial response of quiescent Müller glia (MG) to injury, characterized by morphological changes and upregulation of reactive markers (e.g., gfap, ascl1a, lin28a) and activation of signaling pathways such as Notch, Jak/Stat, and Wnt (Lahne et al., 2020; Pollak et al., 2013; Sifuentes et al., 2016; Yao et al., 2016).

      Glial proliferation refers to the clonal expansion of these MG-derived progenitor cells, which undergo rapid cell-cycle re-entry and amplify to generate sufficient progenitors for regeneration (Iribarne and Hyde, 2022; Lee et al., 2024; Wan and Goldman, 2016).

      Gliosis vs neurogenesis represents a divergent fate decision following proliferation. In zebrafish, MG-derived progenitor cells differentiate into retinal neurons that can replace those damaged or lost due to retinal injury. In contrast, mammalian MG tend to undergo an initial gliotic surge and rapidly revert to a quiescent state, exhibiting gliosis and glial scarring (Thomas et al., 2016; Yin et al., 2024). Thus, we fully agree that gliosis should not be confused with glial reactivation: glial reactivation is the very first step of the glial injury response, whereas gliosis is the very last glial response to the injury.

      We agree with the reviewer that many genes typically described as “reactivation markers” (e.g., ascl1a, lin28a, sox2, mycb, mych) are also essential regulators of cell-cycle re-entry (Gorsuch et al., 2017; Hamon et al., 2019; Lee et al., 2024; Lourenço et al., 2021; Pollak et al., 2013; Thomas et al., 2016). Because glial reactivation precedes glial proliferation, regulators of glial reactivation are expected to contribute to glial proliferation as well.

      In our study, we focused on the states preceding glial proliferation to understand the mechanism underlying injury-induced glial cell-cycle re-entry. We defined these transitional states and the subsequent proliferative MG states based on single-cell RNA-seq trajectory analysis. (revised lines: 41-58)

      (2) A major weakness of the approach is testing cone ablation and regeneration in early larval animals. For example, cones are ablated starting the day that they are born. MG that are responding are also very young, less than 48 hrs old. It is also unclear whether the immune response of microglia is a mature response. All of these assays would be of higher significance if they were performed in the context of a mature, fully differentiated, adult retina. All analysis in the paper is negatively affected by this biological variable.

      We thank the reviewer for raising this important point regarding the developmental stage of the retina in our model system. We have carefully considered this concern and now provide additional clarification and justification, as follows:

      (1) The glial responses in larval and adult retina:

      Previous studies have demonstrated that injury-induced glial responses are largely conserved in larval and adult zebrafish retina, including reactive gliosis marked by gfap upregulation and proliferation (Meyers et al., 2012; Sarich et al., 2025). In our study, G/R cones were ablated beginning at 5 dpf using metronidazole (MTZ), and we observed robust induction of PCNA⁺ MG in the inner nuclear layer, consistent with injury-induced proliferation (Figure 1E). These findings align with previous studies showing that key features of MG regenerative responses are conserved across larval and adult stages.

      (2) The microglial responses in larval and adult retina:

      Retinal microglia functionally mature at 5 dpf in the zebrafish retina (Mazzolini et al., 2020; Svahn et al., 2013), and prior studies have demonstrated that microglia in larval and adult zebrafish exhibit similar responses to injury, including migration, morphological activation, and phagocytosis (Nagashima and Hitchcock, 2021; White et al., 2017). In our experiments using Tg(mpeg1: GFP) larvae, we observed clear microglial recruitment to the outer nuclear layer (ONL) following cone ablation (Figure 1E and Figure 1-figure supplement 1A), supporting the functional competence of larval microglia in injury-induced immune responses.

      (3) The contribution using larval animals to study the regeneration program:

      We agree that regeneration studies in the adult retina can provide important biological insights, particularly in a fully differentiated tissue environment. Accordingly, we have acknowledged this limitation in our revised manuscript “limitations of this study” section (revised lines 534-540: “1. Our study focuses on larval zebrafish, in which the core features of MG and immune responses are conserved compared to the adult. However, we acknowledge that the adult retina—with its fully matured differentiated retina and immune response—provides irreplaceable biological insight. Nevertheless, larval models offer a powerful platform to uncover conserved regenerative mechanisms and serve as a valuable complement for identifying age-dependent differences in MG-mediated regeneration.”) and have stated our intention to extend future analyses to adult zebrafish, especially to explore age-dependent differences in redox signaling and MG proliferation. At the same time, we believe that the larval model offers unique advantages for uncovering fundamental, conserved mechanisms of regeneration and enables characterization of age-dependent regulatory differences. Thus, our study in larval animals serves as a complementary and informative platform for understanding both the conserved and developmental stage-specific features of injury-induced regeneration.

      (3) Related to the above point, the clonal analysis of cxcl18b+ MG is complicated by the fact that new MG are still being born in the CMZ (as are new cones for that matter).

      We thank the reviewer for raising this important point regarding potential contributions from CMZ-derived progenitors to the lineage-traced cxcl18b⁺ MG clones. To address this concern, we provide the following evidence arguing against a CMZ origin for the clones analyzed:

      Spatial restriction of clones: All clones included in our analysis were located exclusively within the central and dorsal retina, as shown in Figure 2H. From the spatial distribution of reactive MG populations across the retina, we observed a patterned organization in which the vast majority of proliferating MG arose from local mature MG–derived progenitors, rather than from peripheral CMZ-derived progenitors. However, we acknowledge that we cannot entirely exclude the possibility that CMZ-derived progenitors contribute to injury-induced MG proliferation, particularly in the peripheral retina.

      We have clarified this point in the revised Methods section (revised lines 756–762: “Clone analysis of cxcl18b⁺ lineage-traced MG was restricted to cells located in the central and dorsal region of the zebrafish retina after G/R cone ablation in Figure 2, Figure 6, and their figure supplement. This spatial restriction strongly suggests that the proliferative MG originate from local mature MG, although we cannot completely rule out the possibility that CMZ-derived progenitors contribute to the generation of proliferative MG in the peripheral retina.”) and updated the corresponding figure legends.

      (4) A near identical study was already done by Hoang et al., 2020, in adult zebrafish, a more relevant biological timepoint. Did the authors check this published RNA-seq database for their gene(s) of interest?

      We thank the reviewer for pointing out the relevance of the study by Hoang et al., 2020, which characterized the transcriptional dynamics of MG reactivation in the adult zebrafish retina. We agree that comparisons with their single-cell RNA-seq dataset are important to confirm the conservation of our findings in larval vs adult zebrafish.

      To this end, we examined the adult zebrafish MG dataset reported by Hoang et al., and confirmed that cxcl18b is also present and enriched in their analysis, particularly in activated MG populations under various injury paradigms:

      (1) cxcl18b is listed as a differentially expressed gene (DEG) in Supplementary Table ST2, enriched in GFP⁺ MG following injury. It is also significantly upregulated in both NMDA-induced and light damage conditions, as shown in Supplementary Table ST3.

      (2) In Supplementary Table ST5, cxcl18b is identified as a classifier of activated MG, with classification power scores of 0.552 (NMDA), 0.632 (light damage), and 0.574 (TNFα + γ-secretase inhibitor treatment), indicating its consistent expression across multiple injury models.

      (3) In their pseudotime analysis (Figure 4C and Supplementary Table ST8), cxcl18b is specifically expressed in Module 5, which is expressed earlier along the trajectory than ascl1a. This temporal pattern of cxcl18b preceding ascl1a expression is consistent with our trajectory analysis in larval MG (Figure 1H), further supporting its role as an early marker of the transitional state before proliferation.

      These findings underscore the robustness and biological relevance of cxcl18b as a conserved marker of injury-responsive MG in both larval and adult zebrafish. Our data expand upon the prior work by specifically characterizing a cxcl18b-defined transitional MG state preceding cell-cycle re-entry, thereby offering additional insights into the temporal staging of MG activation during regeneration.

      (5) KD of cxcl18b did not affect MG proliferation or any other defined outcome. And yet the authors continually state such phrases as "microglia-mediated inflammation is critical for activating the cxcl18b-defined transitional states that drive MG proliferation." This is false. Cxcl18b does not drive MG proliferation at all.

      We thank the reviewer for raising this concern. We agree with the reviewer and have revised this statement as "These findings suggest that microglia-mediated inflammation may contribute to the activation of cxcl18b-defined transitional states that precede MG proliferation, although a causal relationship remains to be established." (revised lines 251-253).

      (6) A technical concern is that intravitreal injections are not routinely performed in larval fish.

      We appreciate the reviewer’s technical concern regarding the use of intravitreal injections in larval zebrafish. In our study, we performed intraocular injection according to previously established methods (Alvarez et al., 2009; Giannaccini et al., 2018; Rosa et al., 2023). This approach involves carefully delivering a small volume of viral suspension into the intraocular space with a glass micropipette. To address this concern, we will revise the Materials and Methods section to clearly describe the injection procedure and will cite the relevant references accordingly.

      Reviewer #2:

      (1) The authors note a peak of PCNA+ Muller glia at 72 hours post injury. This is somewhat surprising as the MG would be expected to generate progenitor cells that would continue proliferating and stain with PCNA. Indeed, only a handful of PCNA+ cells are seen in the INL/ONL layer in Figure 1E2 with few clusters of progenitors present. It would be helpful to stain with a Muller glia marker to confirm these PCNA+ cells are Muller glia. It's also curious that almost all the PCNA+ cells are in the dorsal retina, even though G/R cone loss extends across both dorsal and ventral retina. Is this typical for cone ablation models in larval zebrafish?

      We thank the reviewer for their insightful comment regarding the spatial distribution and identity of PCNA⁺ cells following injury.

      In our study, we observed that the injury-induced proliferating cells (PCNA⁺) were predominantly located in the central and dorsal regions of the retina at 72 hours post-injury (hpi) (Figure 1E). To verify the identity of these proliferating cells, we performed additional immunostaining using BLBP and confirmed that the majority of PCNA⁺ cells also express BLBP (Figure 1–figure supplement 1B in our revised data), supporting their MG origin.

      The regional bias of MG proliferation towards the central and dorsal retina is consistent with previous findings. Notably, Krylov et al. (2023) demonstrated that MG exhibit region-specific heterogeneity in their regenerative responses to photoreceptor ablation. Their study identified proliferative MG subpopulations predominantly in the central (fgf24-expressing) and dorsal (efnb2a-expressing) domains, whereas ventral MG showed limited proliferative capacity (Krylov et al., 2023). These observations provide a plausible explanation for the spatially restricted PCNA⁺ MG population observed in our model following cone ablation.

      (2) In Line 148: What is meant by "most original MG states" in this context? Original meaning novel? Or original meaning the earliest state MG adopted following injury? The language here is confusing.

      We thank the reviewer for pointing out the ambiguous phrasing in our original manuscript. The term “most original MG states” was imprecise and misleading, as it could be interpreted as referring to the quiescent state of MG. In our context, we intended to describe the earliest transitional states of MG responding to injury, as they begin to exit quiescence and acquire reactive characteristics. These early transitional MG populations co-express quiescent markers such as cx43 and early reactive markers such as gfap, as shown in Figure 1H.

      To avoid confusion and improve conceptual clarity, we have revised the manuscript by replacing “most original MG states” with “early transitional MG state” (revised line 154) and have added a clearer explanation in the corresponding Results section to define this population more accurately.

      (3) Perhaps provide a better image in Figure 2A of the cxcl18b at 48 hpi and 72 hpi. The current images appear virtually identical, with very little cxcl18b expression observed, especially compared to the 24 hpi. This is in contrast to the Tg(cxcl18b:GFP) transgenic line shown in Figure 2D, which indicates either much higher expression in proliferating cells at 48 hpi or the stability of GFP protein. Can the authors provide guidance on the accurate temporal expression of cxcl18b? Does expression peak rapidly at 24 hpi and then rapidly decline or is there persistence of expression to 48-72 hpi?

      We appreciate the reviewer’s careful observation regarding the apparent similarity of cxcl18b expression at 48 hpi and 72 hpi in the in situ hybridization (ISH) images (Figure 2A), and the differences compared to the Tg(cxcl18b: GFP) reporter line shown in Figure 2D.

      (1) The similarity of ISH images at the 48 hpi and 72 hpi (Figure 2A):

      The cxcl18b mRNA signal peaked at 24 hpi, suggesting a rapid transcriptional response after retina injury. By 48 hpi, cxcl18b expression had already declined substantially, and by 72 hpi, the signal was further reduced to near-background levels. This temporal expression pattern explains why the ISH images at 48 hpi and 72 hpi appear nearly identical and much weaker compared to 24 hpi.

      (2) The discrepancy between ISH and GFP reporter signal (Figure 2D):

      The Tg(cxcl18b: GFP) reporter line shows persistent GFP expression beyond the transcriptional window of cxcl18b mRNA. This may be due to the prolonged stability of GFP protein, which remains detectable even after the endogenous transcription of cxcl18b has diminished. This explanation is also noted in the manuscript (revised lines 198–200). As a result, GFP⁺ MG cells are still visible at 48–72 hpi, and some of them co-label with PCNA.

      These findings are consistent with our pseudotime analysis based on scRNA-seq data (Figure 1H), which shows that cxcl18b expression precedes the induction of proliferative markers such as pcna and ascl1a.
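
      The reasoning above (a transient mRNA signal alongside a longer-lived GFP reporter) can be illustrated with a minimal first-order decay sketch. Both half-lives below are hypothetical placeholders chosen only to show the qualitative effect; they are not values measured or reported in this study, and the sketch assumes that production of both species stops at the 24 hpi peak.

```python
# Toy first-order decay comparison. Both half-lives are illustrative
# assumptions, not values measured in the study.
MRNA_HALF_LIFE_H = 6.0   # hypothetical cxcl18b mRNA half-life (hours)
GFP_HALF_LIFE_H = 26.0   # hypothetical GFP protein half-life (hours)


def fraction_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of a pool left after elapsed_h hours of first-order decay."""
    return 0.5 ** (elapsed_h / half_life_h)


def remaining_after_peak(hpi: float, peak_hpi: float = 24.0) -> dict:
    """Fractions of peak mRNA and GFP left at `hpi`.

    Assumes (for illustration) that synthesis of both stops at `peak_hpi`.
    """
    elapsed = max(hpi - peak_hpi, 0.0)
    return {
        "mrna": fraction_remaining(elapsed, MRNA_HALF_LIFE_H),
        "gfp": fraction_remaining(elapsed, GFP_HALF_LIFE_H),
    }


if __name__ == "__main__":
    for hpi in (24, 48, 72):
        r = remaining_after_peak(hpi)
        print(f"{hpi} hpi: mRNA {r['mrna']:.3f}, GFP {r['gfp']:.3f}")
```

      Under these assumed half-lives, the mRNA pool falls to well under 1% of its peak by 72 hpi while more than a quarter of the GFP pool remains, which is the qualitative pattern described for the ISH and reporter images.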

      (4) Line 198: The establishment of the Tg(cxcl18b:Cre-vhmc:mcherry::ef1a:loxP-dsRed-loxP-EGFP::lws2:nfsb-mCherry) is considerable but the nomenclature doesn't properly fit. Is the mCherry fused with Cre and driven by the cxcl18b promoter? What is the vhmc component? Finally, while this may provide the ability to clonally track cxcl18b-expressing MG, it does not address the prior question of what is the actual temporal expression of cxcl18b? If anything, this only addresses whether proliferating MG expressed cxcl18b at some point in their history, but does not indicate that cxcl18b expression co-exists in proliferating cells. The most convincing evidence is in Supplemental Figure 2B.

      The "vmhc" component refers to the ventricular myosin heavy chain promoter, commonly used to label ventricular cardiomyocytes (Jin et al., 2009). We cloned the vmhc upstream region containing its promoter and fused it with mCherry as a selection marker during construction of the transgenic fish line.

      Clone analysis using the Tg(cxcl18b: Cre-vmhc: mCherry::ef1a: loxP-DsRed-loxP-EGFP::lws2: nfsb-mCherry) further indicates that the cxcl18b-defined transitional state is the essential route to MG proliferation. We have clarified in the revised text that this lineage tracing indicates a “history of injury-induced cxcl18b expression” rather than its ongoing expression during proliferation (revised line 205).

      (5) Line 203: The data shown in Figure 2F do not indicate that these MG are cxcl18b+. Rather, the data are consistent with the interpretation that these MG expressed Cre at some prior stage and now express GFP from the ef1a promoter rather than DsRed. Whether these MG continue to express cxcl18b at the time these fish were collected is not addressed by these data. It is not accurate to conclude that these cells are cxcl18b+.

      We thank the reviewer for pointing out this important issue. We agree that the GFP⁺ MG shown in Figure 2F represent cells that have previously expressed cxcl18b and thus belong to the cxcl18b-expressing cell lineage, but this does not indicate that they continue to express cxcl18b at the time of sample collection. In clonal analysis using the Cre-loxP system, the GFP signal reflects historical cxcl18b promoter activity rather than ongoing transcription. We have revised the relevant sentence in our manuscript to clarify this point and now refer to these GFP⁺ cells as "cxcl18b lineage-traced MG" rather than "cxcl18b⁺ MG" to avoid any misinterpretation (revised line 207).
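
      The distinction between historical and ongoing promoter activity can be made concrete with a small toy model of a loxP-DsRed-loxP-EGFP reporter. This is only a sketch under simplifying assumptions: the class and function names are hypothetical, and recombination is treated as instantaneous and complete whenever the promoter is active, which real Cre-loxP systems only approximate.

```python
from dataclasses import dataclass


@dataclass
class TracedCell:
    """Toy cell carrying a loxP-DsRed-loxP-EGFP reporter (illustrative only)."""
    recombined: bool = False  # True once Cre has excised the DsRed cassette

    def step(self, cxcl18b_promoter_active: bool) -> None:
        # Cre is produced only while the cxcl18b promoter is active.
        # Excision is irreversible, so the flag is never reset.
        if cxcl18b_promoter_active:
            self.recombined = True

    @property
    def reporter(self) -> str:
        return "EGFP" if self.recombined else "DsRed"


def trace(promoter_history):
    """Return the reporter colour after each time step of promoter activity."""
    cell = TracedCell()
    colours = []
    for active in promoter_history:
        cell.step(active)
        colours.append(cell.reporter)
    return colours


if __name__ == "__main__":
    # Promoter is active only at the second time step, then switches off.
    print(trace([False, True, False, False]))
```

      In this sketch the cell stays EGFP-positive after the promoter switches off, so the reporter colour alone cannot distinguish a cell that currently expresses cxcl18b from one that expressed it earlier.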

      (6) Line 213: The statement that proliferative MG mostly originated from cxcl18b+ MG transitional states is a conclusion that does not appear fully supported by the data. Whether those MG continue to express cxcl18b remains unanswered by the data in Figure 2 and would likely be inconsistent with the single-cell data in Figure 1.

      We thank the reviewer for this valuable comment. We agree that the original statement on Line 213 regarding the lineage relationship between cxcl18b⁺ transitional MG and proliferative MG required clarification.

      (1) The cxcl18b expression dynamics:

      Our single-cell RNA-seq and ISH analyses consistently show that cxcl18b expression peaks as early as 24 hpi and declines rapidly, with significantly reduced expression by 48 and 72 hpi. These findings suggest that cxcl18b marks an early transitional MG state, rather than being maintained in proliferative MG. Indeed, in our scRNA-seq pseudotime trajectory analysis (Figure 1H), cxcl18b expression is highest in early transitional MG clusters (Cluster 1) and downregulated as cells progress toward proliferative states (Clusters 3/6), supporting a model in which cxcl18b is downregulated before cell-cycle re-entry.

      (2) Prolonged stability of GFP protein:

      The GFP signal observed in Tg(cxcl18b: GFP) retinas at 72 hpi may be because of the prolonged stability of GFP protein, rather than sustained cxcl18b transcription. The actual expression dynamics of cxcl18b are more directly reflected by our in situ hybridization and single-cell RNA-seq data, both showing a rapid decline after its early peak at 24 hpi. This explanation is also noted in the manuscript (revised lines 196–197).

      (7) Line 246: The use of Dexamethasone to block inflammation is a widely used approach. However, dexamethasone is a broad-spectrum anti-inflammatory molecule that works through glucocorticoid signaling that may involve more than microglia. The observation that microglia recruitment and cxcl18b expression are both reduced is correlative but does not prove causation. Thus, the data are not sufficient to conclude that microglia-mediated inflammation is critical for activating cxcl18b expression. Indeed, data in Figure 1 indicate that cxcl18b expression occurs prior to microglia migration to the ONL.

      We thank the reviewer for this thoughtful and important comment. We fully acknowledge that dexamethasone is a broad-spectrum anti-inflammatory agent that acts via glucocorticoid receptor signaling and may influence multiple immune and non-immune pathways beyond microglia.

      In our study, dexamethasone treatment led to a reduction in both microglial recruitment and the number of cxcl18b⁺ MG at 72 hpi, suggesting a potential association between inflammation and cxcl18b activation. However, we agree that this observation remains correlative and is not sufficient to establish a direct link between microglia activity and cxcl18b induction. Our time-course analysis indicates that cxcl18b expression peaks at 24 hpi, preceding robust microglial accumulation in the ONL, further highlighting the need to clarify the temporal dynamics and cellular sources of inflammatory cues.

      To address this question more conclusively, selective ablation of microglia during cone injury would be necessary. However, implementing such an approach would require a complex intersection of three transgenic lines—Tg(mpeg1: nfsB-mCherry) for microglia ablation, Tg(lws2: nfsB-mCherry) for cone ablation, and Tg(cxcl18b: GFP) for reporting—posing substantial genetic and experimental challenges.

      We have revised the Results section accordingly to state: “These findings suggest that microglia-mediated inflammation may contribute to the activation of cxcl18b-defined transitional states that precede MG proliferation, although a causal relationship remains to be established.” (revised lines 251–253). We also added a new paragraph in the “Result: Clonal analysis reveals injury-induced MG proliferation via cxcl18b-defined transitional states associated with inflammation” as “While dexamethasone suppressed both microglial recruitment and cxcl18b⁺ MG generation, its broad anti-inflammatory action precludes definitive conclusions about microglial causality. Dissecting this relationship would require concurrent ablation of microglia and cone photoreceptors using a triple-transgenic strategy, which is beyond the scope of the current study. Targeted approaches will be necessary to resolve the specific role of microglia in initiating cxcl18b expression.” (revised lines 251–258) to explicitly acknowledge this limitation and the need for future studies using microglia-specific ablation models to resolve the mechanism.

      (8) Could the authors clarify the basis of investigating NO signaling, given the relative expression of the genes by either cxcl18b+ MG or uninjured MG? Based on the expression illustrated in Supplemental Figure 4A, there is almost no expression of nos1 or nos2b in any MG. The authors are encouraged to revisit the earlier single-cell data sets to identify those cells that express components of NO signaling to determine the source(s) of NO that could be impacting the Muller glia.

      We thank the reviewer for raising these important points.

      Nitric oxide (NO) signaling has been implicated in the regeneration of multiple zebrafish tissues, including the heart (Rochon et al., 2020; Yu et al., 2024), spinal cord (Bradley et al., 2010), and fin (Matrone et al., 2021). Based on these findings, we hypothesized that NO signaling might also contribute to retinal regeneration.

      As described in the manuscript, we compiled a redox-related gene list and systematically screened their roles in injury-induced MG proliferation using CRISPR-Cas9-mediated gene disruption. Among the candidates, disruption of nos genes significantly reduced the number of PCNA⁺ MG cells following G/R cone ablation (Figure 4), prompting us to further investigate the role of NO signaling.

      (9) Line 319-320: this sentence appears to be missing text as "while no influenced across the nos mutants and gsnor mutants" does not make sense.

      We appreciate the reviewer’s observation and agree that the original sentence was unclear. We have revised the sentence in the manuscript as follows:

      “In contrast, no significant change in MG proliferation was observed in nos1, nos2a, or gsnor mutants compared to wild type (Figures 4F–4I)” (revised lines 326-328).

      (10) Line 326-328: The text should be rewritten as the current meaning would suggest there was no significant loss of photoreceptors in the nos2b mutants. That is incorrect. Rather, there was no significant difference between WT and the nos2b mutants in the number of photoreceptors lost at 72 hpi following MTZ treatment. Both groups lost photoreceptors, but the number lost in nos2b hets and homozygotes was the same as WT.

      We agree with the suggestion and have revised the sentence in the manuscript as follows:

      “We observed no significant difference in the loss of cone photoreceptor at 72 hpi between nos2b mutants and WT, indicating that the reduced MG proliferation observed in nos2b mutants is independent of the injury (WT: 45 ± 8 remaining cones, n = 24; nos2b⁺/⁻: 49 ± 12, n = 20; nos2b⁻/⁻: 46 ± 9, n = 20; mean ± SEM) (Figure 4K).” (revised lines 331-335).
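
      For readers less familiar with the summary statistic quoted above, the following minimal sketch shows how a mean ± SEM value is computed, with SEM taken as the sample standard deviation divided by the square root of n. The counts in the example are hypothetical and are not data from the study; the function name is likewise only illustrative.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical remaining-cone counts, for illustration only.
    example_counts = [40, 45, 50]
    m, sem = mean_sem(example_counts)
    print(f"{m:.1f} ± {sem:.1f} (n = {len(example_counts)})")
```

      Because SEM shrinks with the square root of n, groups with similar spread but different sample sizes will report different SEM values, which is worth keeping in mind when comparing the three genotypes.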

      (11) There is concern over the inconsistencies with some of the data. In Figure 4, Supplement 1A, the single-cell data found virtually no expression of nos2b in either uninjured MG or cxcl18b+ MG. In contrast, the authors find nos2b expression by RT-PCR in the cxcl18b:GFP+ MG. The in situ expression of nos2b in Figure 5 - Supplement 1 is not persuasive. The red puncta are seen in a single cxcl18b:GFP+ cell but also in the plexiform layer and in other non cxcl18b:GFP+ cells.

      We appreciate the concern regarding the apparent inconsistencies in nos2b expression across different datasets. We provide the following explanations:

      (1) Low expression of nos2b in scRNA-seq data:

      One potential explanation is that nitric oxide (NO) signaling exerts its biological functions in a dose-dependent manner and is tightly regulated post-transcriptionally, especially in the case of inducible nitric oxide synthase (iNOS) (Bogdan, 2001; Nathan and Xie, 1994; Thomas et al., 2008). Thus, even modest changes in nos2b expression may exert meaningful biological effects while transcript levels remain below the detection threshold of scRNA-seq methods. Supporting this idea, our functional assay (Figure 4J) reveals a clear concentration-dependent effect of NO on MG proliferation, consistent with the biological relevance of Nos2b activity despite its low transcript abundance.

      (2) Regarding the in situ hybridization data:

      We used two commercially available in situ hybridization probe systems, HCR<sup>TM</sup> and RNAscope<sup>TM</sup> (data not shown), to detect nos2b transcripts. While the nos2b signal was also observed in other retinal cell types, including cells in the plexiform layer, our study focused primarily on its expression within the cxcl18b<sup>+</sup> MG lineage.

      (3) Regarding RT-PCR detection of nos2b in cxcl18b: GFP<sup>+</sup> MG:

      To enhance detection sensitivity, we enriched cxcl18b: GFP<sup>+</sup> MG by FACS at 72 hpi and performed cDNA amplification before RT-PCR. This approach allowed the detection of low-abundance transcripts such as nos2b. It is also important to note that RT-PCR reflects fold changes in expression in MG relative to other retinal cell types. The subtle but biologically meaningful upregulation of nos2b expression may not be readily captured by in situ hybridization or scRNA-seq.

      (12) Line 356 - there is a disagreement over the interpretation of the current data. The statement that nos2b was specifically expressed in cxcl18b+ transitional MG states is not entirely accurate. This conclusion is based on expression of GFP from a cxcl18b promoter, which may reflect persistence of the GFP protein and not evidence of cxcl18b expression. Even assuming that the nos2b in situ hybridization and RT-PCR data are correct, the data would indicate that nos2b is expressed in proliferating MG that are derived from the cxcl18b+ transitional states. The single-cell trajectory analysis in Figure 2 indicates that cxcl18b is not co-expressed with PCNA. Furthermore, the single-cell data in Figure 4, Supplement 1, indicates no expression of nos2b in cxcl18b+ MG. The authors need to reconcile these seemingly contradictory pieces of data.

      We thank the reviewer for this thoughtful and important comment. We agree that clarification is needed to accurately interpret the relationship between cxcl18b, nos2b, and MG proliferation, particularly considering the different temporal and technical contexts of our datasets.

      (1) Lineage labeling and interpretation of GFP expression:

      We acknowledge that in the Tg(cxcl18b: Cre-vhmc: mcherry::ef1a: loxP-dsRed-loxP-EGFP::lws2: nfsb-mCherry) line, GFP expression reflects historical activity of the cxcl18b promoter, rather than ongoing transcription. This GFP signal is long-lasting and may persist beyond the time window of endogenous cxcl18b expression. Accordingly, we have revised the manuscript to replace “cxcl18b⁺ MG” with “cxcl18b⁺ lineage-traced MG” throughout the relevant sections to prevent potential misinterpretation.

      (2) Functional experiments support a lineage relationship between cxcl18b⁺ states and nos2b activity:

      To further investigate the regulatory relationship between cxcl18b and nos2b, we conducted NO scavenging experiments using C-PTIO in the Tg(cxcl18b: GFP) background. We observed that the generation of cxcl18b: GFP⁺ MG after injury was not affected by NO depletion, indicating that cxcl18b activation precedes NO signaling (data not shown). However, the number of PCNA⁺ MG was significantly reduced under the same treatment, suggesting that NO signaling is not required for cxcl18b⁺ transitional state formation, but is necessary for proliferation. Together with our MG-specific nos2b knockout data, these results support a model in which nos2b-derived NO acts downstream of the cxcl18b⁺ transitional state to promote MG cell-cycle re-entry.

      (3) The scRNA-seq data with nos2b expression:

      We agree with the reviewer that our scRNA-seq dataset shows minimal overlap between cxcl18b and pcna expression, which is consistent with our interpretation that cxcl18b expression marks a transitional phase before cell-cycle entry. Furthermore, nos2b transcripts were not robustly detected in cxcl18b⁺ MG clusters in our scRNA-seq dataset. This discrepancy may be caused by technical limitations of scRNA-seq in capturing low-abundance or transient transcripts such as nos2b, as discussed in response to comment #11.

      (13) The data in Figure 7 are interesting and suggest a link between NO signaling and notch activity. The use of the C-PTIO NO scavenger is not specific to MG, which limits the conclusions related to autocrine NO signaling in cxcl18b+ MG.

      We acknowledge that the use of C-PTIO cannot distinguish between NO signaling within MG and paracrine effects from other retinal cells. Currently, technical limitations prevent MG-specific NO depletion. We have discussed this limitation accordingly in our revised “Limitations of this study” section (revised lines 540-545: “2. While our data suggest that injury-induced NO suppresses Notch signaling activation and promotes MG proliferation, the use of a general NO scavenger (C-PTIO) does not allow us to determine whether this regulation occurs in an autocrine or paracrine manner. The specific role of NO signaling within cxcl18b⁺ MG requires further validation using MG-specific NO depletion.”)

      (14) Line 446-448. As mentioned before, the data do not support a causative link between microglia recruitment and cxcl18b induction. More specifically, dexamethasone is a broad-spectrum anti-inflammatory drug that blocks microglia activation and recruitment. Critically, the authors demonstrate that expression of cxcl18b occurs prior to microglia recruitment (see Figure 1, Supplement 1). Thus, the statement that cxcl18b induction depends on microglia recruitment is not accurate.

      We thank the reviewer for reiterating this important point. We fully agree that the current data do not support a direct causal relationship between microglia recruitment and cxcl18b induction. As also addressed in our response to Comment 7, dexamethasone, as a broad-spectrum anti-inflammatory agent, cannot distinguish microglia-specific effects from those of other immune components. We have revised the text in revised lines 251–258 to clarify that microglia-mediated inflammation is associated with—but not required for—activation of cxcl18b-defined transitional MG states.

      References:

      Bogdan, C. (2001). Nitric oxide and the immune response. Nature immunology 2, 907-916.

      Bradley, S., Tossell, K., Lockley, R., and McDearmid, J.R. (2010). Nitric oxide synthase regulates morphogenesis of zebrafish spinal cord motoneurons. The Journal of neuroscience : the official journal of the Society for Neuroscience 30, 16818-16831.

      Gorsuch, R.A., Lahne, M., Yarka, C.E., Petravick, M.E., Li, J., and Hyde, D.R. (2017). Sox2 regulates Müller glia reprogramming and proliferation in the regenerating zebrafish retina via Lin28 and Ascl1a. Experimental eye research 161, 174-192.

      Hamon, A., García-García, D., Ail, D., Bitard, J., Chesneau, A., Dalkara, D., Locker, M., Roger, J.E., and Perron, M. (2019). Linking YAP to Müller Glia Quiescence Exit in the Degenerative Retina. Cell reports 27, 1712-1725.e1716.

      Iribarne, M., and Hyde, D.R. (2022). Different inflammation responses modulate Müller glia proliferation in the acute or chronically damaged zebrafish retina. Frontiers in cell and developmental biology 10, 892271.

      Jin, D., Ni, T.T., Hou, J., Rellinger, E., and Zhong, T.P. (2009). Promoter analysis of ventricular myosin heavy chain (vmhc) in zebrafish embryos. Developmental dynamics : an official publication of the American Association of Anatomists 238, 1760-1767.

      Krylov, A., Yu, S., Veen, K., Newton, A., Ye, A., Qin, H., He, J., and Jusuf, P.R. (2023). Heterogeneity in quiescent Müller glia in the uninjured zebrafish retina drive differential responses following photoreceptor ablation. Frontiers in molecular neuroscience 16, 1087136.

      Lahne, M., Nagashima, M., Hyde, D.R., and Hitchcock, P.F. (2020). Reprogramming Müller Glia to Regenerate Retinal Neurons. Annual review of vision science 6, 171-193.

      Lee, M.S., Jui, J., Sahu, A., and Goldman, D. (2024). Mycb and Mych stimulate Müller glial cell reprogramming and proliferation in the uninjured and injured zebrafish retina. Development (Cambridge, England) 151.

      Lourenço, R., Brandão, A.S., Borbinha, J., Gorgulho, R., and Jacinto, A. (2021). Yap Regulates Müller Glia Reprogramming in Damaged Zebrafish Retinas. Frontiers in cell and developmental biology 9, 667796.

      Matrone, G., Jung, S.Y., Choi, J.M., Jain, A., Leung, H.E., Rajapakshe, K., Coarfa, C., Rodor, J., Denvir, M.A., Baker, A.H., et al. (2021). Nuclear S-nitrosylation impacts tissue regeneration in zebrafish. Nat Commun 12, 6282.

      Mazzolini, J., Le Clerc, S., Morisse, G., Coulonges, C., Kuil, L.E., van Ham, T.J., Zagury, J.F., and Sieger, D. (2020). Gene expression profiling reveals a conserved microglia signature in larval zebrafish. Glia 68, 298-315.

      Meyers, J.R., Hu, L., Moses, A., Kaboli, K., Papandrea, A., and Raymond, P.A. (2012). β-catenin/Wnt signaling controls progenitor fate in the developing and regenerating zebrafish retina. Neural development 7, 30.

      Nagashima, M., and Hitchcock, P.F. (2021). Inflammation Regulates the Multi-Step Process of Retinal Regeneration in Zebrafish. Cells 10.

      Nathan, C., and Xie, Q.W. (1994). Nitric oxide synthases: roles, tolls, and controls. Cell 78, 915-918.

      Pollak, J., Wilken, M.S., Ueki, Y., Cox, K.E., Sullivan, J.M., Taylor, R.J., Levine, E.M., and Reh, T.A. (2013). ASCL1 reprograms mouse Muller glia into neurogenic retinal progenitors. Development (Cambridge, England) 140, 2619-2631.

      Rochon, E.R., Missinato, M.A., Xue, J., Tejero, J., Tsang, M., Gladwin, M.T., and Corti, P. (2020). Nitrite Improves Heart Regeneration in Zebrafish. Antioxidants & redox signaling 32, 363-377.

      Sarich, S.C., Sreevidya, V.S., Udvadia, A.J., Svoboda, K.R., and Gutzman, J.H. (2025). The transcription factor Jun is necessary for optic nerve regeneration in larval zebrafish. PloS one 20, e0313534.

      Sifuentes, C.J., Kim, J.W., Swaroop, A., and Raymond, P.A. (2016). Rapid, Dynamic Activation of Müller Glial Stem Cell Responses in Zebrafish. Investigative ophthalmology & visual science 57, 5148-5160.

      Svahn, A.J., Graeber, M.B., Ellett, F., Lieschke, G.J., Rinkwitz, S., Bennett, M.R., and Becker, T.S. (2013). Development of ramified microglia from early macrophages in the zebrafish optic tectum. Developmental neurobiology 73, 60-71.

      Thomas, D.D., Ridnour, L.A., Isenberg, J.S., Flores-Santana, W., Switzer, C.H., Donzelli, S., Hussain, P., Vecoli, C., Paolocci, N., Ambs, S., et al. (2008). The chemical biology of nitric oxide: implications in cellular signaling. Free radical biology & medicine 45, 18-31.

      Thomas, J.L., Ranski, A.H., Morgan, G.W., and Thummel, R. (2016). Reactive gliosis in the adult zebrafish retina. Experimental eye research 143, 98-109.

      Wan, J., and Goldman, D. (2016). Retina regeneration in zebrafish. Current opinion in genetics & development 40, 41-47.

      White, D.T., Sengupta, S., Saxena, M.T., Xu, Q., Hanes, J., Ding, D., Ji, H., and Mumm, J.S. (2017). Immunomodulation-accelerated neuronal regeneration following selective rod photoreceptor cell ablation in the zebrafish retina. Proceedings of the National Academy of Sciences of the United States of America 114, E3719-E3728.

      Yao, K., Qiu, S., Tian, L., Snider, W.D., Flannery, J.G., Schaffer, D.V., and Chen, B. (2016). Wnt Regulates Proliferation and Neurogenic Potential of Müller Glial Cells via a Lin28/let-7 miRNA-Dependent Pathway in Adult Mammalian Retinas. Cell reports 17, 165-178.

      Yin, Z., Kang, J., Xu, H., Huo, S., and Xu, H. (2024). Recent progress of principal techniques used in the study of Müller glia reprogramming in mice. Cell regeneration (London, England) 13, 30.

      Yu, C., Li, X., Ma, J., Liang, S., Zhao, Y., Li, Q., and Zhang, R. (2024). Spatiotemporal modulation of nitric oxide and Notch signaling by hemodynamic-responsive Trpv4 is essential for ventricle regeneration. Cellular and molecular life sciences : CMLS 81, 60.

    1. Reviewer #1 (Public review):

      Summary:

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons, but also sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Strengths:

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Weaknesses:

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth the authors discussing whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

    2. Author response:

      Reviewer #1 (Public Review):

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons, but also sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Thanks for the accurate summary and positive comments!

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Thanks for the positive comments!

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      We quantified the percentage of Hb+ and Runt+ cells among Eve+ cells with sca-gal4, and the results are shown in Figure 4-figure supplement 1. We found that the proportion of early-born cells is slightly reduced, but the proportion of later-born cells remains similar. Interestingly, we also found a subset of Eve+ cells with a mixed fate (Hb+Runt+), but the reason remains unclear.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      Because every hemisegment in an fd4 single mutant is normal, we simply added the following text: “In fd4 mutants, we observe no change in the number of Eve+ neurons or Dbx+ neurons (n=40 hemisegments).”

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      We agree, thanks for raising this point. We have added the following text to the Discussion: “Interestingly, the fd4 fd5 mutant maintains expression of fd4:gal4, suggesting that the fd4/fd5 locus may have established a chromatin state that allows “permanent” expression in the absence of Vnd, En, and Fd4/Fd5 proteins.”

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      We agree it is interesting that the NB7-3 and NB5-6 drivers remain on following Fd4 misexpression. To explore this, we used sca-gal4 to overexpress Fd4 and observed that Lbe expression persisted while Eg was largely repressed (see Author response image 1 below). The results show that Lbe and Eg respond differently to Fd4. A non-mutually exclusive possibility is that the continued expression of lbe-Gal4 UAS-GFP or eg-Gal4 UAS-GFP may be due to the lengthy perdurance of both Gal4 and GFP.

      Author response image 1.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth the authors discussing whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      In the NB7-3 lineages misexpressing Fd4, only 5 lineages generated Dbx+ cells (0.1±0.4, n=64 hemisegments), suggesting that the low penetrance of Dbx+ induction is an intrinsic feature of Fd4 rather than lineage context. We have added this information in the results section. 

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      When we used the en-gal4 driver to express UAS-vnd in the fd4/fd5 mutant background, we found an average of 7.4±2.2 Eve+ cells per hemisegment (n=36), significantly higher than in the fd4/fd5 mutant alone (3.9±0.8 cells, n=52, p=2.6x10<sup>-11</sup>) (Figure 3J). In addition, 0.2±0.5 Eve+ cells were ectopic Hb+ (excluding U1/U2), indicating that Vnd-En integration is sufficient to generate both early-born and late-born Eve+ cells in the fd4/fd5 mutants. We have added the results to the text.
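
      For readers who want to sanity-check a two-group comparison like the one above from the reported summary values alone, the following is a minimal sketch. It makes two assumptions that the response does not state: that the ± values are standard deviations, and that an unequal-variance (Welch) t-test is an acceptable stand-in for whatever test the authors actually used. The function name `welch_from_summary` is illustrative only.

```python
import math

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are group means, standard deviations (assumed, not confirmed
    by the source), and sample sizes.
    """
    var1 = sd1 ** 2 / n1          # squared standard error of group 1 mean
    var2 = sd2 ** 2 / n2          # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return t, df

# Summary values quoted in the response: en-gal4 > UAS-vnd rescue vs. fd4/fd5 mutant alone
t_stat, dof = welch_from_summary(7.4, 2.2, 36, 3.9, 0.8, 52)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      Converting the statistic to a p-value requires a t-distribution CDF, for example via `scipy.stats.ttest_ind_from_stats` with `equal_var=False`. The result need not match the reported p-value if the authors used a different test or if the ± values are SEM rather than SD.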

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Thanks for the suggestion. Because the results are exactly the same as the wild type, we don’t think it is necessary to provide additional images or analysis as supplemental information.

      Reviewer #2 (Public review):

      Via a detailed expression analysis, they find that Fd4 is selectively expressed in embryonic NB7-1 and newly born neurons within this lineage. They also undertake a comprehensive genetic analysis to provide evidence that fd4 is necessary and sufficient for the identity of NB7-1 progeny. 

      Thanks for the accurate summary!

      The analysis is both careful and rigorous, and the findings are of interest to developmental neurobiologists interested in molecular mechanisms underlying the generation of neuronal diversity. Great care was taken to make the figures clear and accessible. This work takes great advantage of years of painstaking descriptive work that has mapped embryonic neuroblast lineages in Drosophila. 

      Thanks for the positive comments!

      The argument that Fd4 is necessary for NB7-1 lineage identity is based on a Fd4/Fd5 double mutant. Loss of fd4 alone did not alter the number of NB7-1-derived Eve+ or Dbx+ neurons. The authors clearly demonstrate redundancy between fd4 and fd5, and the fact that the LOF analysis is based on a double mutant should be better woven through the text.

      The authors generated an Fd5 mutant. I assume that Fd5 single mutants do not display NB7-1 lineage defects, but this is not stated. The focus on Fd4 over Fd5 is based on its highly specific expression profile and the dramatic misexpression phenotypes. But the LOF analysis demonstrates redundancy, and the conclusions in the abstract and through the results should reflect the existence of Fd5 in the conclusions of this manuscript.

      We agree, and have added new text to clarify the single mutant phenotypes (there are none) and the double mutant phenotype (loss of NB7-1 molecular and morphological features). The following text is added to the manuscript: “Not surprisingly, we found that fd4 single mutants or fd5 single mutants had no phenotype (Eve+ neurons were all normal). Thus, to assess their roles, we generated a fd4 and fd5 double mutant. Because many Eve+ and Dbx+ cells are generated outside of NB7-1 lineage, it was also essential to identify the Eve+ or Dbx+ cells within NB7-1 lineage in wild type and fd4 mutant embryos. To achieve this, we replaced the open reading frame of fd4 with gal4 (called fd4-gal4) (see Methods); this stock simultaneously knocked out both fd4 and fd5 (called fd4/fd5 mutant hereafter) while specifically labeling the NB7-1 lineage. For the remainder of this paper we use the fd4/fd5 double mutant to assay for loss of function phenotypes.”

      It is notable that Fd4 overexpression can rewire motor circuits. This analysis adds another dimension to the changes in transcription factor expression and, importantly, demonstrates functional consequences. Could the authors test whether U4 and U5 motor axon targeting changes in the fd4/fd5 double mutant? To strengthen claims regarding the importance of fd4/fd5 for lineage identity, it would help to address terminal features of U motorneuron identity in the LOF condition.

      Thanks for raising this important point. We examined the axon targeting on body wall muscles in both wild type and in fd4/fd5 mutant background and added the results in Figure 3-figure supplement 2. We found that the axon targeting in the late-born neuron region (LL1) is significantly reduced, suggesting that the loss of late-born neurons in fd4/fd5 mutant leads to the absence of innervation of corresponding muscle targets.

      Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the newborn postmitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny. 

      Thanks for the positive comments!

      The study is systematic, concise, and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes. 

      Thanks for the accurate summary!

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of Fd4, and the partially redundant gene Fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and Dbx throughout diverse VNC lineages.

      Thanks for the accurate summary!

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well-characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic Fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Thanks for the positive comments!

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages, but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to Fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      We appreciate the positive comments. We agree double misexpression of Fd4 and Fd5 might give a stronger phenotype (as the reviewer says) but the lack of this experiment does not change the conclusions that Fd4 can promote NB7-1 molecular and morphological aspects at the expense of NB5-6 molecular markers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study introduces an open-source, cost-effective method for automating the quantification of male social behaviors in Drosophila melanogaster. It combines machine-learning-based behavioral classifiers developed using JAABA (Janelia Automatic Animal Behavior Annotator) with inexpensive hardware constructed from off-the-shelf components. This approach addresses the limitations of existing methods, which often require expensive hardware and specialized setups. The authors demonstrate that their new "DANCE" classifiers accurately identify aggression (lunges) and courtship behaviors (wing extension, following, circling, attempted copulation, and copulation), closely matching manually annotated ground-truth data. Furthermore, DANCE classifiers outperform existing rule-based methods in accuracy. Finally, the study shows that DANCE classifiers perform as well when used with low-cost experimental hardware as with standard experimental setups across multiple paradigms, including RNAi knockdown of the neuropeptide Dsk and optogenetic silencing of dopaminergic neurons.

      The authors make creative use of existing resources and technology to develop an inexpensive, flexible, and robust experimental tool for the quantitative analysis of Drosophila behavior. A key strength of this work is the thorough benchmarking of both the behavioral classifiers and the experimental hardware against existing methods. In particular, the direct comparison of their low-cost experimental system with established systems across different experimental paradigms is compelling.

      While JAABA-based classifiers have been previously used to analyze aggression and courtship (Tao et al., J. Neurosci., 2024; Sten et al., Cell, 2023; Chiu et al., Cell, 2021; Ishii et al., eLife, 2020; Duistermars et al., Neuron, 2018), the demonstration that they work as well without expensive experimental hardware opens the door to more low-cost systems for quantitative behavior analysis.

      We thank the reviewer for their positive assessment and constructive suggestions. We have cited these additional JAABA studies in the Introduction. We clarified that several prior JAABA-based classifiers were developed using specialized machine-vision cameras or custom setups, and that in some cases the original code and classifiers were not made publicly available, which limits reproducibility and wider adoption. To address this, we explicitly note in the revised manuscript that DANCE was developed with accessibility in mind.

      Although the study provides a detailed evaluation of DANCE classifier performance, its conclusions would be strengthened by a more comprehensive analysis. The authors assess classifier accuracy using a bout-level comparison rather than a frame-level analysis, as employed in previous studies (Kabra et al., Nat Methods, 2013). They define a true positive as any instance where a DANCE-detected bout overlaps with a manually annotated ground-truth bout by at least one frame. This criterion may inflate true positive rates and underestimate false positives, particularly for longer-duration courtship behaviors. For example, a 15-second DANCE-classified wing extension bout that overlaps with ground truth for only one frame would still be considered a true positive. A frame-level performance analysis would help address this possibility.

      We thank the reviewer for raising this important point. Our original use of bout-level analysis followed existing literature (Duistermars et al., 2018; Ishii et al., 2020; Chiu et al., 2021; Tao et al., 2024; Hindmarsh Sten et al., 2025). While our lunge classifier already operates at the frame level, we have now performed additional frame-level evaluations for the duration-based courtship classifiers. These analyses revealed only minor differences in precision, recall, and F1 scores compared with the original bout-level approach (see new Figure 5—Figure Supplement 3). Details of this analysis are now included in the Materials and Methods.
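
      The difference between the two scoring schemes can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the 30 fps frame rate, and the toy masks are assumptions chosen to reproduce the reviewer's example of a 15-second predicted bout that overlaps ground truth for a single frame.

```python
def find_bouts(mask):
    """Return (start, stop) pairs, stop exclusive, for each run of True frames."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def frame_level(pred, truth):
    """Score every frame independently."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return prf(tp, fp, fn)


def bout_level(pred, truth):
    """Score bouts; a predicted bout is a true positive if it overlaps
    ground truth by at least one frame."""
    pred_bouts, truth_bouts = find_bouts(pred), find_bouts(truth)
    tp = sum(any(truth[s:e]) for s, e in pred_bouts)
    fp = len(pred_bouts) - tp
    fn = sum(not any(pred[s:e]) for s, e in truth_bouts)
    return prf(tp, fp, fn)


# Toy example (assumed 30 fps): a 15 s predicted bout (frames 0-449) and a
# 1 s ground-truth bout (frames 449-478) that share exactly one frame.
n_frames = 600
pred = [i < 450 for i in range(n_frames)]
truth = [449 <= i < 479 for i in range(n_frames)]

print("bout level :", bout_level(pred, truth))
print("frame level:", frame_level(pred, truth))
```

      In this constructed worst case the bout-level scores are perfect while frame-level precision and recall are close to zero, which is the scenario the reviewer describes. The authors report that on their real data the two schemes differed only slightly.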

      In summary, this work provides a practical and accessible approach to quantifying Drosophila behavior, reducing the economic barriers to the study of the neural and molecular mechanisms underlying social behavior.

      We thank the reviewer for their encouraging comments and for recognizing the accessibility and practical value of our approach. We appreciate the constructive suggestions, which have helped strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript addresses the development of a low-cost behavioural setup and standardised open-source high-performing classifiers for aggression and courtship behaviour. It does so by using readily available laboratory equipment and previously developed software packages. By comparing the performance of the setup and the classifiers to previously developed ones, this study shows the classifiers' superior performance and the reliability of the low-cost setup in recapitulating previously described effects of different manipulations on aggression and courtship.

      Strengths:

      The newly developed classifiers for lunges, wing extension, attempted copulation, copulation, following, and circling perform better than previously developed ones. The behavioural setup developed is low cost and reliably allows analysis of both aggression and courtship behaviour, validated through social experience manipulation (social isolation), gene knockdown (Dsk in Dilp2 neurons) and neuronal inactivation (dopaminergic neurons) known to affect courtship and aggression.

      We thank the reviewer for the clear summary of our work and for highlighting its strengths. We appreciate these positive comments and suggestions, which have helped improve the clarity of the manuscript.

      Weaknesses:

      Aggression encompasses multiple defined behaviours, yet only lunges were analysed. Moreover, the CADABRA software to which DANCE was compared analyses further aggression behaviours, making their comparisons incomplete. In addition, though DANCE performs better than CADABRA and Divider in classifying lunges in the behavioural setup tested, it did not yield very high recall and F1 scores.

      We thank the reviewer for raising this important point. We focused on lunges because they are widely used as a standard proxy for male aggression across multiple laboratories (Agrawal et al., 2020; Asahina et al., 2014; Chiu et al., 2021; Chowdhury et al., 2021; Dierick et al., 2007; Hoyer et al., 2008; Jung et al., 2020; Nilsen et al., 2004; Watanabe et al., 2017). As noted in the Discussion, our study also provides a template for the future development of additional aggression classifiers (fencing, wing flick, tussle, chase, female headbutt) and courtship classifiers (tapping, licking, rejection), which can be trained and shared through the same DANCE framework. Developing and validating these was beyond the scope of the present work.

      To address the concern regarding precision, recall, and F1 scores, we performed additional analyses across all training videos and compiled these results in the new Figure 2—Figure Supplement 2. Our earlier lunge classifier had performance metrics obtained after training on a total of 11 videos. Our analysis shows performance metrics for classifiers trained on four independent datasets (Videos 8–11). We found that the classifier trained on nine videos provided the best balance of precision, recall, and F1 (78.73%, 73.07%, and 75.79%, respectively), which was slightly better than the earlier classifier. We therefore updated the main figure, text, and Materials and Methods to use this version and uploaded the corresponding classifier and training details to the GitHub repository.
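
      As a consistency check on the three reported values, and assuming F1 is computed in the standard way as the harmonic mean of precision P and recall R, the arithmetic works out as follows:

```latex
F_1 \;=\; \frac{2\,P\,R}{P + R}
    \;=\; \frac{2 \times 78.73 \times 73.07}{78.73 + 73.07}
    \;=\; \frac{11505.60}{151.80}
    \;\approx\; 75.79\%
```

      The result agrees with the F1 score quoted in the response.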

      DANCE is of limited use for neuronal circuit-level enquiries, since mechanisms for intensity and temporally controlled optogenetic manipulations, which are nowadays possible with open-source software and low-cost hardware, were not embedded in its development.

      We thank the reviewer for this valuable point. The primary aim of DANCE is to provide an accessible, modular, and low-cost behavioural recording and analysis platform. It was designed so that users can readily integrate additional components such as optogenetic control when needed. As a proof of concept, we implemented optogenetic silencing of dopaminergic neurons using the DANCE hardware and confirmed that this manipulation increased aggression (Figure 7R). 

      To facilitate adoption, we now provide schematic diagrams, LED control code, and instructions on our GitHub page and setup photographs in the manuscript (see new Figure 7—Figure Supplement 1). The released code allows programmable timing and intensity control, enabling users to reproduce temporally precise optogenetic protocols or extend the system for other stimulation paradigms.

      Reviewer #3 (Public review):

      The preprint by Yadav et al. describes a new setup to quantify a number of aggression and mating behaviors in Drosophila melanogaster. The investigation of these behaviors requires the analysis of a large number of videos to identify each kind of behavior displayed by a fly. Several approaches to automate this process have been published before, but each of them has its limitations. The authors set out to develop a new setup that includes very low-cost, easy-to-acquire hardware and open-source machine-learning classifiers to identify and quantify the behavior.

      Strengths:

      (1) The study demonstrates that their cheap, simple, and easy-to-obtain hardware works just as well as custom-made, specialized hardware for analyzing aggression and mating behavior. This enables the setup to be used in a wide range of settings, from research with limited resources to classroom teaching.

      (2) The authors used previously published software to train new classifiers for detecting a range of behaviors related to aggression and mating and to make them freely available. The classifiers are very positively benchmarked against a manually acquired ground truth as well as existing algorithms.

      (3) The study demonstrates the applicability of the setup (hardware and classifiers) to common methods in the field by confirming a number of expected phenotypes with their setup.

      We thank the reviewer for the positive assessment of our work and for highlighting its strengths. We appreciate these encouraging comments and suggestions, which have helped improve the clarity and presentation of the manuscript.

      Weaknesses:

      (1) When measuring the performance of the duration-based classifiers, the authors count any bout of behavior as a true positive if it overlaps with a ground-truth positive for only 1 frame, even though the minimal duration of a bout is 10 frames and most bouts are much longer. That way, true positives could contain cases that are almost totally wrong as long as there was an overlap of a single frame. For the mating behaviors that are classified in ongoing bouts, I think performance should be evaluated based on the % of correctly classified frames, not bouts.

      We thank the reviewer for raising this concern. In response to this point, and to Reviewer #1’s similar comment, we performed a frame-level evaluation of all duration-based courtship classifiers. The analysis revealed only minor differences compared with the original bout-level metrics (see new Figure 5—Figure Supplement 3), confirming the robustness of our classifiers. We have also added a description of this analysis in the Materials and Methods section.

      (2) In the methods part, only one of the pre-existing algorithms (MateBook) is described. Given that the comparison with those algorithms is such a central part of the manuscript, each of them should be briefly explained and the settings used in this study should be described.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we expanded the Materials and Methods to include concise descriptions and parameter settings for all pre-existing algorithms used for comparison. This includes dedicated subsections for CADABRA and the Divider assay, with explicit reference to their rule-based or geometric features. For MateBook, we specified the persistence filters used and the adjustments made for fair benchmarking. These changes ensure transparency and reproducibility.

      Taken together, this work can greatly facilitate research on aggression and mating in Drosophila. The combination of low-cost, off-the-shelf hardware and open-source, robust software enables researchers with very little funding or technical expertise to contribute to the scientific process and also allows large-scale experiments, for example in classroom teaching with many students, or for systematic screenings.

      We thank the reviewer for the encouraging comments and for recognizing the accessibility and broad applicability of DANCE. We believe these revisions have further strengthened the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The following comments highlight areas where additional context, clarification, or further analysis could strengthen the manuscript. I hope these suggestions will be useful in refining your work.

      (1) Lines 71-73: The authors state that Ctrax "leads to frequent identity switches among tracked flies, which is not the case while using FlyTracker." However, Ctrax was specifically designed to minimize identity errors, and Kabra et al. (2013) reported a low frequency of such errors, approximately one per five fly-hours in 10-fly videos. In contrast, Caltech FlyTracker does not correct identity errors automatically, requiring manual corrections, as noted in the Methods section of this study. If this is not an oversight, please provide further context to clarify this distinction.

      We thank the reviewer for raising this clarification. As reported by Bentzur et al. (2021), when groups of flies were tracked simultaneously, Ctrax often generated multiple identities for the same individual, sometimes producing more trajectories than the actual number of flies. To prevent ambiguity, we revised the text to read: “While both Ctrax and FlyTracker (Eyjolfsdottir et al., 2014) may produce identity switches, when groups of flies were tracked simultaneously, Ctrax led to inaccuracies that required manual correction using specialized algorithms such as FixTrax (Bentzur et al., 2021).” We also quantified FlyTracker identity-switch rates in our datasets and report them in new Supplementary File 5, confirming that such events were rare (< 2% of tracked intervals). We believe this updated version provides the necessary context and ensures accuracy in describing each tracker’s limitations.

      (2) Line 85: Providing additional context on how this study builds on previous work using JAABA-based classifiers for fly social behavior and comparing these classifiers to rule-based methods would more accurately situate it within the field. The authors state that "recently, a few JAABA-based classifiers have been developed for measuring aggression and courtship" and cite four related studies. However, this statement seems to underrepresent the use of JAABA-based classifiers for quantifying fly social behavior, which has become common in the field. Several additional studies (as noted in the public review) have developed JAABA-based classifiers for scoring aggression or courtship. Furthermore, other studies have compared the performance of JAABA-based classifiers with rule-based classifiers like CADABRA (e.g., Chowdhury et al., Comm Biology 2021; Leng et al., PlosOne 2020; Kabra et al., Nat Methods 2013). Mentioning the similar findings in those studies and your own helps strengthen the conclusion that machine-learning-based classifiers outperform rule-based classifiers in several experimental contexts.

      We thank the reviewer for this helpful suggestion. We have revised the Introduction to include additional references to studies that applied JAABA-based classifiers for aggression and courtship and made textual edits to reflect this. We further noted that, unlike several previous studies, all DANCE classifiers and analysis code are publicly available.

      Reviewer #2 (Recommendations for the authors):

      (1) Suggestions for improved or additional experiments, data or analyses: As mentioned in the description of the effect of optogenetic inactivation of dopaminergic neurons, in the conclusion and also reported in the literature, there are other important identified aggression behaviours, such as fencing, wing flick, tussle, and chase. Similarly, for courtship, tapping and licking have also been defined. This study, as opposed to proposed future studies, would benefit from creating open-source classifiers for these established behaviours, which are important for the analysis of aggression and courtship.

      We thank the reviewer for this valuable suggestion. As clarified in the Discussion, this manuscript intentionally focuses on six core, well-validated aggression and courtship behaviors to demonstrate DANCE’s modularity and reproducibility. Developing additional classifiers such as fencing, wing flick, tussle, chase, tapping, and licking would require extensive annotation and validation beyond the present scope. To address this point, we explicitly note in the revised text that the DANCE pipeline is readily extendable, allowing the community to build new classifiers within the same framework.

      In terms of observer bias assessment for ground-truthing in courtship, this was only presented for circling and it would be beneficial to have encompassed all behaviours analysed.

      We thank the reviewer for this suggestion. Observer-bias comparisons for all six classifiers are presented in Figure 2—Figure Supplement 1 (panels A–F). We clarified in the Results that annotations from two independent evaluators were compared for all classifiers, with no significant differences observed, confirming their robustness.

      Finally, intensity and temporal optogenetic control are important for neuronal circuit analysis of underlying behaviour. The authors could embed this aspect in DANCE by integrating control of the green light LED strip used in this study using, for example, the open-source visual reactive programming software Bonsai (Lopes et al., 2015) and open-source electronics platform Arduino. This is an important and valuable addition in line with maintaining low cost.

      We thank the reviewer for this valuable suggestion. DANCE was designed to be modular, allowing integration of temporal optogenetic control. To support immediate adoption, we now provide Arduino LED control code, setup schematics, and photographs (new Figure 7—Figure Supplement 1) along with step-by-step instructions on our GitHub page. We also note that Bonsai and Arduino frameworks are compatible with DANCE, enabling future extensions for closed-loop or behavior-triggered stimulation.

      (2) Minor corrections to the text and figures:

      Figure Supplement 1 refers only to Figure 2, yet panels D-F refer to the behaviour circling in courtship and therefore should be assigned to the respective figure.

      Thanks, we have corrected this.

      In lines 315-316, the cumbersome task of fluon coating for aggression assays seems to be ubiquitous across assays which is not the case, and therefore the sentence should include the word 'some'.

      Thanks, we have edited this.

      The cost of the phone and/or tablet should be included in the DANCE setup costs, as presumably these devices will be dedicated to the behavioural studies, for consistency purposes.

      We thank the reviewer for this comment. We intentionally did not include smartphones or tablets in the setup cost because, in our experiments, these devices were not dedicated exclusively to DANCE but were repurposed from routine personal use. Our aim was to leverage readily available consumer electronics so that their cost does not become a barrier to adoption. We confirmed that commonly available Android phones capable of 30 fps at 1080p in H.264 format, as well as tablets or phones running a simple white-screen light app, are sufficient for reliable behavior classification and illumination. Since these devices can be returned to regular use after recordings, including their cost in the setup would not accurately reflect the intended accessibility of DANCE. For consistency, we now clarify in the Materials and Methods that such devices should be placed in airplane mode during recordings.

      Reviewer #3 (Recommendations for the authors):

      (1) For my taste, the authors put too much emphasis on the point that their method outperforms existing methods. I understand the value in comparing to published methods and it is of course fully justified to state the advantages of the new method. But the whole preprint is set up as a competition with the old algorithms, and the conclusion that the new classifier is better is repeated in each figure caption and after each paragraph of the results. This competitive mindset also extends to the selection of which results are presented as main figures and which as supplements - all cases in which the previous methods actually perform well are only presented in the supplement. I think this is simply unnecessary as the authors' results speak for themselves, and do not need the continuous competitive comparison.

      We thank the reviewer for this thoughtful suggestion. Our intention was to benchmark DANCE rigorously against existing methods, not to frame the study competitively. We agree that repeated emphasis on relative performance was unnecessary. In the revised version, we streamlined figure captions and text throughout the manuscript to balance comparisons and removed redundant phrasing. Instances where other methods performed well are now presented with equal clarity to maintain a neutral and informative tone.

      (2) When describing the DANCE hardware, as a reader I would find it interesting to also read about potential issues that the authors encountered. For example, how difficult is it to handle the materials without breaking or deforming them, which could affect the behavioral assays? How critical is it to use specific blister packs - the availability of which will likely vary strongly between countries? Did the authors try different sizes, and products? Such information, even as a supplement, could be very helpful for the widespread use of the hardware.

      We thank the reviewer for this important point. To address this, we conducted additional tests comparing DANCE arenas of different diameters (new Figure 7—Figure Supplement 3A–C and new Figure 7—Figure Supplement 4A–L). We also consulted colleagues in multiple countries and verified that the blister packs used in our assays are readily available. The Materials and Methods now include practical handling notes: blister foils can be reused ~30–40 times for aggression assays and ~10–15 times for courtship assays before deformation. We also describe how to prevent agar surface damage during assembly and how to wash and dry the arenas for optimal reusability.

      (3) I find the arrows pointing to several videos in a number of figures rather distracting and redundant, and suggest omitting them.

      Thanks, we have omitted these arrows from all relevant figures and clarified the figure legends to enhance readability.

      (4) P8, line 169 ff: this is a very long sentence that should be separated into several sentences.

      We have rewritten this as follows: “DANCE scores remained comparable to ground-truth scores across all categories, whereas CADABRA and Divider underestimated the lunge counts (Figure 2B–E). Correlation analysis revealed a strong relationship between DANCE and ground-truth scores (Figure 2F, Supplementary File 2). In comparison, CADABRA and the Divider assay classifier showed a weaker correlation (Figure 2G-H, Supplementary File 2).”

      (5) P10, line 216: please explain, here and in the methods, how these behavioral indices are calculated. I did not find this information anywhere in the paper.

      We thank the reviewer for pointing this out. We now define the behavioral index explicitly in Materials and Methods: “For each assay, a behavioral index was calculated as the proportion of frames in which the male engaged in the specified behavior. This was obtained by dividing the total number of frames annotated for that behavior by the total number of frames in the recording.”
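
      The definition quoted above amounts to a single division. A minimal sketch follows; the function name and the example frame counts are hypothetical and are not taken from the manuscript.

```python
def behavioral_index(behavior_frames, total_frames):
    """Proportion of frames in which the behavior was annotated.

    behavior_frames: number of frames annotated for the behavior
    total_frames:    number of frames in the whole recording
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= behavior_frames <= total_frames:
        raise ValueError("behavior_frames must lie between 0 and total_frames")
    return behavior_frames / total_frames


# Hypothetical example: a 20-minute recording at 30 fps has 36,000 frames;
# if 9,000 of them are annotated as wing extension, the index is 0.25.
print(behavioral_index(9_000, 36_000))
```

      Because the index is a proportion of frames, it always lies between 0 and 1 and does not depend on the frame rate, provided the whole recording uses one frame rate.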

      (6) P11, line 253: I don't understand the modifications to MateBook regarding attempted copulations, neither in the results nor the methods section. I would ask the authors to explain more explicitly what was done.

      We thank the reviewer for this helpful suggestion. We have re-written several parts of the Materials and Methods to clarify these details and streamline the text. To train the attempted copulation classifier, we combined datasets from assays with mated and decapitated virgin females, using manual annotations as ground truth. We also adapted MateBook’s persistence filters (Ribeiro et al., 2018) and defined thresholds explicitly: mounting lasting >45 s (>1350 frames at 30 fps) was defined as copulation, whereas abdominal curling without mounting, or mounting lasting 0.33–45 s, was defined as attempted copulation.
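
      The duration thresholds stated above can be written as a small rule. This is a sketch under assumptions, not the authors' or MateBook's code: the function and constant names are hypothetical, mounting bouts shorter than 0.33 s are labeled "unclassified" because the response does not say how they are handled, and abdominal curling without mounting (also scored as attempted copulation) is not modeled.

```python
FPS = 30               # frame rate stated in the response
MIN_ATTEMPT_S = 0.33   # lower bound for attempted copulation (about 10 frames)
COPULATION_S = 45.0    # mounting longer than this counts as copulation


def classify_mounting(n_frames, fps=FPS):
    """Label a mounting bout from its length in frames."""
    duration_s = n_frames / fps
    if duration_s > COPULATION_S:
        return "copulation"
    if duration_s >= MIN_ATTEMPT_S:
        return "attempted copulation"
    # Assumption: bouts below the lower bound are left unlabeled.
    return "unclassified"


for frames in (9, 10, 1350, 1351):
    print(frames, "frames ->", classify_mounting(frames))
```

      At 30 fps the 45 s boundary falls at exactly 1350 frames, so a 1350-frame bout is still an attempted copulation and 1351 frames is the shortest bout scored as copulation, matching the ">1350 frames" wording.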

      (7) Figure 7F: this is the only case with a significant difference between the two setups. What explanations do the authors have for the discrepancy?

      We thank the reviewer for raising this point. After repeating the experiments, we no longer found a significant difference between the setups. Figure 7 and its legend have been updated to reflect these results.

      (8) Figure 2 - Supplement 1: I do not understand why the boxes for Observer 1 have different colors in different figures. Does this have a meaning?

      Thanks for pointing this out. The color differences had no intended meaning, and we have corrected the figure for consistency across panels.

      (9) P22, line 517ff: It would be interesting to know how frequently identity switches occurred. For large-scale, automatic behavioral screenings that step could be a crucial bottleneck.

      We thank the reviewer for this valuable suggestion. We analyzed identity switches using the FlyTracker “Visualizer” package, which flags frames with possible overlaps or jumps. Flagged intervals were manually verified, and we report these data in new Supplementary File 5. Identity switch rates were very low: 0.66% for high-resolution recordings and 1.9% for smartphone DANCE videos in the most challenging decapitated-virgin dataset. These findings demonstrate robust tracking performance under both setups.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Biomolecular condensates are an essential part of cellular homeostatic regulation. In this manuscript, the authors develop a theoretical framework for the phase separation of membrane-bound proteins. They show the effect of non-dilute surface binding and phase separation on tight junction protein organization. 

      Strengths: 

      It is an important study, considering that the phase separation of membrane-bound molecules is taking the center stage of signaling, spanning from immune signaling to cell-cell adhesion. A theoretical framework will help biologists to quantitatively interpret their findings. 

      Weaknesses: 

      Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient. 

      We acknowledge this limitation. While we agree that additional systems would strengthen the generality of our theory, we note that the focus of this work is to introduce and validate a theoretical framework. As the reviewer notes, this is sufficient for establishing the framework. Nonetheless, we are open to further collaborations or future studies to test the model with other systems.

      Reviewer #2 (Public review): 

      Summary: 

      The authors present a clear expansion of biophysical (thermodynamic) theory regarding the binding of proteins to membrane-bound receptors, accounting for higher local concentration effects of the protein. To partially test the expanded theory, the authors perform in vitro experiments on the binding of ZO1 proteins to Claudin2 C-terminal receptors anchored to a supported lipid bilayer, and capture the effects that surface phase separation of ZO1 has on its adsorption to the membrane. 

      Strengths: 

      (1) The derived theoretical framework is consistent and largely well-explained. 

      (2) The experimental and numerical methodologies are transparent. 

      (3) The comparison between the best parameterized non-dilute theory is in reasonable agreement with experiments. 

      Weaknesses: 

      (1) In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear. 

      We have revised the theory section to clearly distinguish previously established formulations from novel contributions following equation (4), which is .

      (2) Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes. 

      For our problem, binding is relevant together with diffusive transport in each phase. Each process is accompanied by kinetic coefficients that we estimate for the experimental system. For the considered biological systems (and related ones), it is difficult to determine whether other fluxes (see, e.g., Eq. 8(e)) have relaxed or not. We note that their effects are, of course, included in the kinetic model applied to the coarsening of ZO1 surface condensates as boundary conditions. But we cannot exclude that the corresponding kinetic coefficient in the actual biological system is large enough such that, e.g., Eq. (9e) does not vanish to zero “quasi-statically”. We have now added a sentence to the outlook highlighting the relevance of testing those flux-force relationships in biological systems. 

      (3) I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.  

      We have discussed the mechanistic explanation related to bulk protein interaction strength in the manuscript in the section: “Effects of binding affinity and interactions on surface phase separation”. We explained how the bulk interaction parameter affects the binding equilibrium. 

      (4) The major advantage of the non-dilute theory as compared with a best parameterized dilute (or homogenous) theory requires further clarification/evidence with respect to capturing the experimental data. 

      We thank the reviewer for this helpful question. To address this point, we have added new paragraphs in the conclusion section, which explicitly discuss the necessity of employing the non-dilute theory for interpreting the experimental data.

      (5) Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.  

      We appreciate the suggestion and agree that such modeling would be valuable. However, this is beyond the scope of the current study. 

      (6) Discussion of the caveats and limitations of the theory and modelling is missing from the text. 

      We sincerely appreciate the reviewer’s helpful comment. We have added a discussion in the conclusion section outlining the caveats and limitations of our modeling approach.

      Reviewing Editor Comments: 

      After discussion with the reviewers, we feel that this manuscript could be significantly improved by testing the model with a different model system (beyond ZO1/tight junctions), in which case we foresee that the strength of evidence could be raised from "compelling" to "exceptional". Of course, it is up to the authors whether to pursue this; the paper is already very good. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Lines 132-134: Re-word, the use of "complex" is confusing.

      We have rephrased the sentence for clarity. The revised version reads: “ṽ<sub>𝑃𝑅</sub> are the molecular volume and area of the protein-receptor complex ѵ<sub>𝑃𝑅</sub>, respectively”, and the changes have been made in the revised manuscript.

      (2) Line 154 use of "\nu" for volume and area could be avoided for better clarity. 

      We thank the reviewer for this helpful suggestion. We have removed the statement involving "\nu" as these quantities have already been defined in the preceding context.

      (3) Line 158 the total "Helmholtz" free energy F... 

      We have added the word "Helmholtz" to the sentence.

      (4) Line 160 typo "In specific,..." 

      We carefully checked this sentence but could not identify a typo.  

      (5) For equation 5 explain the physical origins of each term, or provide a reference if this equation is explained elsewhere. 

      Thank you very much for your valuable suggestions. We have carefully rephrased Equation (5) and added a paragraph immediately afterward to provide a detailed explanation of its physical meaning.

      (6) Derivation on lines 163-174 is poorly written. Make the logical flow between the equations clearer. 

      We greatly appreciate your insightful suggestions. Equation (6) has been carefully revised for clarity, and the explanation has been rewritten to ensure better readability. All modifications have been made.

      (7) Define bold "t" in Equation 6. 

      The variable “t” has been explicitly defined in the context for clarity.

      (8) In equations. 7b-7c the nablas (gradients) should be the 2D versions.  

      We have updated the gradient operators in Equations (7b) and (7c) [Eq. (9) in revised manuscript] to their 2D forms for consistency. 

      (9) Line 190, avoid referring to the future Equation 14, and state in words what is meant by "thermodynamic equilibrium". 

      We have added an explanation of “thermodynamic equilibrium” and removed the reference to the equation accordingly.

      (10) In Equation 11 you don't explain what you are doing ( which is a perturbation around the minimum of the free energy). 

      We have revised the paragraph before equation (11) [Eq. (13) in revised manuscript] to clarify that the expression represents a perturbation around the minimum of the free energy.

      (11) In Equation 12, doesn't this also depend on how you have written Equation 6 (not just Equation 5)? 

      Eq. (12) [Eq. (14) in revised manuscript] is derived directly from the variation of the total free energy F. In contrast, Eq. (6) contains the time derivative of free energies that were not written in their final form. In the revised version, we have now given the conjugate forces and fluxes in Eqs. (7) and (8) for clarity.

      (12) Line 206 specify the threshold of local concentration (or provide a reference). 

      We have specified the threshold of local concentration in the revised text, and the corresponding statement has been highlighted.

      (13) Line 223 is the deviation from ideality captured in a pair-wise fashion? I presume it does not account for N many-body interactions?  

      Yes, our model is formulated within a mean-field framework that incorporates pairwise (second order) interaction coefficients. For example, 𝜒<sub>𝑃𝑅-𝑅</sub> characterizes the interaction between the complex 𝑃𝑅 and the free receptor 𝑅, 𝜒<sub>𝑅-𝐿</sub> the interaction between free receptor 𝑅 and free lipid 𝐿, and 𝜒<sub>𝑃𝑅-𝐿</sub> the interaction between complex 𝑃𝑅 and free lipid 𝐿. We have stressed this choice of free energy in the revised manuscript.
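
      As a rough illustration of what "pairwise (second order)" means here, the sketch below sums one 𝜒-weighted product of two area fractions per pair of species. It is a generic Flory-Huggins-style interaction term and not the manuscript's full free energy; the function name, the use of area fractions, and all numerical values are assumptions made purely for illustration.

```python
from itertools import combinations


def pairwise_interaction_energy(phi, chi):
    """Mean-field interaction energy: sum of chi_ij * phi_i * phi_j over unordered pairs.

    phi maps a species name to its (hypothetical) area fraction.
    chi maps a frozenset of two species names to an interaction coefficient;
    pairs missing from chi are treated as non-interacting.
    """
    total = 0.0
    for i, j in combinations(phi, 2):
        total += chi.get(frozenset((i, j)), 0.0) * phi[i] * phi[j]
    return total


# Hypothetical values, chosen only to exercise the function.
phi = {"PR": 0.2, "R": 0.3, "L": 0.5}
chi = {
    frozenset(("PR", "R")): 1.0,  # complex PR with free receptor R
    frozenset(("R", "L")): 0.5,   # free receptor R with free lipid L
    frozenset(("PR", "L")): 2.0,  # complex PR with free lipid L
}

energy = pairwise_interaction_energy(phi, chi)
```

      Every term is a product of exactly two fractions, so no three-body or higher contributions appear; this is the sense in which many-body interactions are not accounted for.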

      (14) Line 274, how do the authors know the secondary effects (of which they should mention a few) do not significantly impact the observed behaviour?  

      We sincerely thank the reviewer for the helpful comment. The parameters 𝜒<sub>𝑅-𝐿</sub> and 𝜒<sub>𝑃𝑅-𝑅</sub> are not essential, based on the experimental observations. For more information, please see our revised paragraph on the choice of the specific parameter values, which now follows Eq. (21).

      (15) It's not clear how Figures 3 b and c are generated with reference to which parameters are changed to investigate with/without bulk phase separation. 

      To improve clarity, we have revised Figure 3 to display the corresponding parameter values directly in each panel. Figures 3b and 3c were generated by computing the surface binding curves (as shown in Fig. 2) for each binding affinity 𝜔<sub>𝑃𝑅</sub> and membrane-complex interaction strength 𝜒<sub>𝑃𝑅-𝐿</sub>, under different bulk interaction strengths 𝜒, to compare the cases with and without bulk phase separation. 

      (16) The jump between theory and the "Mechanism in ..." section is too much. The authors should include the biological context of tight junctions and ZO1 in the main introduction. 

      We appreciate the reviewer’s suggestion. Following this comment, we have added an extended discussion in the main introduction to provide the necessary biological context of tight junctions and ZO1. In addition, we inserted new bridging paragraphs between the theoretical section and the section “Mechanism in tight junction formation” to create a smoother transition from theory to experiments. These revisions help to better connect the theoretical framework with the biological phenomena discussed in the later section.

    1. Reviewer #1 (Public review):

      Disclaimer: While I am familiar with the CFS method and the CFS literature, I am not familiar with primate research or two-photon calcium imaging. Additionally, I may be biased regarding unconscious processing under CFS, as I have extensively investigated this area but have found no compelling evidence in favor of unconscious processing under CFS.

      This manuscript reports the results of a nonhuman-primate study (N=2 behaving macaque monkeys) investigating V1 responses under continuous flash suppression (CFS). The results show that CFS substantially suppressed V1 orientation responses, albeit slightly differently in the two monkeys. The authors conclude that CFS-suppressed orientation information "may not suffice for high-level visual and cognitive processing" (abstract).

      The manuscript is clearly written and well-organized. The conclusions are supported by the data and analyses presented (but see disclaimer). However, I believe that the manuscript would benefit from a more detailed discussion of the different results observed for monkeys A and B (i.e., inter-individual differences), and how exactly the observed results are related to findings of higher-order cognitive processing under CFS, on the one hand, and the "dorsal-ventral CFS hypothesis", on the other hand.

      Major Comments:

      (1) Some references are imprecise. For example, l.53: "Nevertheless, two fMRI studies reported that V1 activity is either unaffected or only weakly affected (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013)". To the best of my understanding, the second study reaches a conclusion that is entirely opposite to that of the first, specifically that for low-contrast, invisible stimuli, stimulus-evoked fMRI BOLD activity in the early visual cortex (V1-V3) is statistically indistinguishable from activity observed during stimulus-absent (mask-only) trials. Therefore, high-level unconscious processing under CFS should not be possible if Yuval-Greenberg & Heeger are correct. The two studies contradict each other; they do not imply the same thing.

      (2) Line 354: "The flashing masker was a circular white noise pattern with a diameter of 1.89°, a contrast of 0.5, and a flickering rate of 10 Hz. The white noise consisted of randomly generated black and white blocks (0.07 × 0.07 each)." Why did the authors choose a white noise stimulus as the CFS mask? It has previously been shown that the depth of suppression engendered by CFS depends jointly on the spatiotemporal composition of the CFS and the stimulus it is competing with (Yang & Blake, 2012). For example, Hesselmann et al. (2016) compared Mondrian versus random dot masks using the probe detection technique (see Supplementary Figure S4 in the reference below) and found only a poor masking performance of the random dot masks.

      Yang, E., & Blake, R. (2012). Deconstructing continuous flash suppression. Journal of Vision, 12(3), 8. https://doi.org/10.1167/12.3.8

      Hesselmann, G., Darcy, N., Ludwig, K., & Sterzer, P. (2016). Priming in a shape task but not in a category task under continuous flash suppression. Journal of Vision, 16, 1-17.

      (3) Related to my previous point: I guess we do not know whether the monkeys saw the CFS-suppressed grating stimuli or not? Therefore, could it be that the differences between monkey A and B are due to a different individual visibility of the suppressed stimuli? Interocular suppression has been shown to be extremely variable between participants (see reference below). This inter-individual variability may, in fact, be one of the reasons why the CFS literature is so heterogeneous in terms of unconscious cognitive processing: due to the variability in interocular suppression, a significant amount of data is often excluded prior to analysis, leading to statistical inconsistencies. Moreover, the authors' main conclusion (lines 305-307) builds on the assumption that the stimuli were rendered invisible, but isn't this speculation without a measure of awareness?

      Yamashiro, H., Yamamoto, H., Mano, H., Umeda, M., Higuchi, T., & Saiki, J. (2014). Activity in early visual areas predicts interindividual differences in binocular rivalry dynamics. Journal of Neurophysiology, 111(6), 1190-1202. https://doi.org/10.1152/jn.00509.2013

      (4) The authors refer to the "tool priming" CFS studies by Almeida et al. (l.33, l.280, and elsewhere) and Sakuraba et al. (l.284). A thorough critique of this line of research can be found here:

      Hesselmann, G., Darcy, N., Rothkirch, M., & Sterzer, P. (2018). Investigating Masked Priming Along the "Vision-for-Perception" and "Vision-for-Action" Dimensions of Unconscious Processing. Journal of Experimental Psychology. General. https://doi.org/10.1037/xge0000420

      This line of research ("dorsal-ventral CFS hypothesis") has inspired a significant body of behavioral and fMRI/EEG studies (see reference for a review below). The manuscript would benefit from a brief paragraph in the discussion section that addresses how the observed results contribute to this area of research.

      Ludwig, K., & Hesselmann, G. (2015). Weighing the evidence for a dorsal processing bias under continuous flash suppression. Consciousness and Cognition, 35, 251-259. https://doi.org/10.1016/j.concog.2014.12.010

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, participants completed two different tasks: a perceptual choice task in which they compared the sizes of pairs of items, and a value-different task in which they identified the higher value option among pairs of items, with the two tasks involving the same stimuli. Based on previous fMRI research, the authors sought to determine whether the superior frontal sulcus (SFS) is involved in both perceptual and value-based decisions or just one or the other. Initial fMRI analyses were devised to isolate brain regions that were activated for both types of choices and also regions that were unique to each. Transcranial magnetic stimulation was applied to the SFS in between fMRI sessions and it was found to lead to a significant decrease in accuracy and RT on the perceptual choice task but only a decrease in RT on the value-different task. Hierarchical drift diffusion modelling of the data indicated that the TMS had led to a lowering of decision boundaries in the perceptual task and a lowering of nondecision times on the value-based task. Additional analyses show that SFS covaries with model derived estimates of cumulative evidence, and that this relationship is weakened by TMS.

      Strengths:

      The paper has many strengths, including the rigorous multi-pronged approach of causal manipulation, fMRI and computational modelling, which offers a fresh perspective on the neural drivers of decision making. Some additional strengths include the careful paradigm design, which ensured that the two types of tasks were matched for their perceptual content while orthogonalizing trial-to-trial variations in choice difficulty. The paper also lays out a number of specific hypotheses at the outset regarding the behavioural outcomes that are tied to decision model parameters and well justified.

      We thank the reviewer for their thoughtful summary of the study and for highlighting these strengths. We are pleased that the multi-pronged approach combining causal manipulation, fMRI, and hierarchical drift–diffusion modelling, as well as the careful matching of perceptual content across the two tasks, came across clearly. We also appreciate the reviewer’s positive remarks on the specificity of our a priori hypotheses and their links to decision-model parameters. In revising the manuscript, we have aimed to further streamline the presentation of these hypotheses and to more explicitly connect the behavioural predictions, model parameters, and neural readouts throughout the Results and Discussion sections.

      Weaknesses:

      In my previous comments (1.3.1 and 1.3.2) I noted that key results could be potentially explained by cTBS leading to faster perceptual decision making in both the perceptual and value-based tasks. The authors responded that if this were the case then we would expect either a reduction in NDT in both tasks or a reduction in decision boundaries in both tasks (whereas they observed a lowering of boundaries in the perceptual task and a shortening of NDT in the value task). I disagree with this statement. First, it is important to note that the perceptual decision that must be completed before the value-based choice process can even be initiated (i.e. the identification of the two stimuli) is no less trivial than that involved in the perceptual choice task (comparison of stimulus size). Given that the perceptual choice must be completed before the value comparison can begin, it would be expected that the model would capture any variations in RT due to the perceptual choice in the NDT parameter and not as the authors suggest in the bound or drift rate parameters since they are designed to account for the strength and final quantity of value evidence specifically. If, in fact, cTBS causes a general lowering of decision boundaries for perceptual decisions (and hence speeding of RTs) then it would be predicted that this would manifest as a short NDT in the value task model, which is what the authors see.

      We thank the reviewer for raising these points and for the helpful clarification. We agree that, in principle, the architecture of the value-based task can be conceived as involving an upstream perceptual process that must be completed, to some degree, before value comparison can proceed. Under such a multistage framework, it is indeed possible that cTBS-induced changes in a perceptual decision stage could manifest as a reduction in boundary separation in the pure perceptual task, while the same perturbation appears as a shortening of non-decision time (NDT) when fitting a single-stage DDM to the value task. In this sense, our earlier statement that a “general speeding effect” would necessarily produce identical parameter changes (either NDT or boundaries) in both tasks was too strong, and we are grateful to the reviewer for pointing this out.
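
      The multistage argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the value task consists of a perceptual DDM stage followed by the value comparison, and that a single-stage DDM fitted to the value task absorbs the mean duration of the perceptual stage into its non-decision time. The function names and all parameter values are hypothetical; only the closed-form mean decision time of an unbiased DDM is standard.

```python
import math


def mean_decision_time(boundary, drift, noise=1.0):
    """Mean decision time of an unbiased DDM (start point midway between the bounds).

    Standard closed form: (a / (2 v)) * tanh(a v / (2 s^2));
    for v = 0 the limit is a^2 / (4 s^2).
    """
    if drift == 0:
        return boundary ** 2 / (4 * noise ** 2)
    return boundary / (2 * drift) * math.tanh(boundary * drift / (2 * noise ** 2))


def apparent_ndt(motor_ndt, perceptual_boundary, perceptual_drift):
    """NDT seen by a single-stage fit if a perceptual stage must finish first (assumption)."""
    return motor_ndt + mean_decision_time(perceptual_boundary, perceptual_drift)


# Hypothetical parameter values (seconds and arbitrary evidence units).
ndt_before = apparent_ndt(motor_ndt=0.30, perceptual_boundary=2.0, perceptual_drift=1.0)
ndt_after = apparent_ndt(motor_ndt=0.30, perceptual_boundary=1.6, perceptual_drift=1.0)
```

      Under these assumptions, lowering only the perceptual boundary shortens the apparent NDT of the value task while leaving the value-stage drift and boundary untouched. The sketch tracks the mean of the perceptual stage only; a variable first stage would also alter the shape of the RT distribution, which this simplification ignores.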

      At the same time, this alternative explanation remains fully compatible with our central claim that the left SFS plays a perceptual rather than value-based role. We agree with the reviewer that there must be a stimulus-related circuit (in visual and parietal regions) that encodes the physical attributes of the options, and that this upstream processing can influence both tasks. However, a large body of work suggests that left SFS is not part of this primary identification circuitry, but rather contributes specifically to the accumulation and comparison of sensory evidence (e.g., Heekeren et al., 2004, 2006), downstream from areas such as FFA, PPA, or MT/V5 that encode stimulus identity. In other words, stimulus identification (forming a representation of “what is where”) is anatomically and functionally distinct from the accumulation of evidence toward a perceptual decision. Within this framework, the reviewer’s proposal that cTBS speeds “perceptual decisions” across tasks can be understood as targeting precisely the evidence-accumulation stage we ascribe to SFS, with the value-comparison stage proper likely implemented in other regions (e.g., vmPFC and connected valuation circuitry).

      We therefore do not rely solely on the dissociation between boundary changes in the perceptual task and NDT changes in the value task as decisive evidence against a “general speeding” account. Instead, our interpretation is based on the convergence of behavioural, model-based, and neural results. First, in the perceptual task, cTBS to left SFS leads to a selective reduction in decision boundary and a concomitant change in trialwise BOLD activity within the stimulated region that covaries with perceptual choice behaviour and with the latent decision variable inferred from the HDDM. Second, in the value task, cTBS does not affect value sensitivity or accuracy, nor does it alter value-related drift or boundary parameters; the only robust HDDM effect is a modest shortening of NDT. Third, critically, left SFS BOLD activity is modulated by perceptual evidence and by cTBS in the perceptual task, but we observe no evidence that SFS activity encodes value evidence or shows value-related cTBS neuronal effects in the value task.

      Taken together, these findings indicate that the left SFS serves a causal role in the accumulation of perceptual evidence and in the setting of the choice criterion for perceptual decisions. The reviewer’s suggestion that cTBS may induce a general speeding of perceptual processes that also influences the value task is compatible with this conclusion, in the sense that any contribution of SFS to the value task is best understood as acting via a perceptual component that is upstream of value comparison, rather than via the value accumulation process itself. We have clarified this point in the Discussion of the revised manuscript and now explicitly acknowledge that our DDM dissociation alone does not exclude a general perceptual speeding account, but that the combination of task-specific neural effects in SFS, preserved value-based choice behaviour, and the absence of value-related BOLD changes in SFS strongly support a primarily perceptual role for this region.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a TMS-induced reduction in excitability of the left Superior Frontal Sulcus influenced evidence integration in perceptual and value-based decisions. They directly compared behaviour (including fits to a computational decision process model) and fMRI pre and post TMS in one of each type of decision-making task. Their goal was to test domain-specific theories of the prefrontal cortex by examining whether the proposed role of the SFS in evidence integration was selective for perceptual but not value-based evidence.

      Strengths:

      The paper presents multiple credible sources of evidence for the role of the left SFS in perceptual decision making, finding similar mechanisms to prior literature and a nuanced discussion of where they diverge from prior findings. The value-based and perceptual decision-making tasks were carefully matched in terms of stimulus display and motor response, making their comparison credible.

      We thank the reviewer for their clear summary of our aims and approach, and for highlighting these strengths. We are pleased that the convergence between causal TMS, fMRI, and hierarchical modelling comes across as providing credible evidence for the role of left SFS in perceptual decision-making, and that our attempt to link these results to the existing literature is seen as appropriately nuanced. We also appreciate the reviewer’s positive assessment of the task design, in particular the close matching of perceptual content and motor output across perceptual and value-based decisions, which was central to our goal of testing domain-specific theories of prefrontal function. In revising the manuscript, we have further clarified these design choices and their rationale, and we have streamlined the exposition of how the hypotheses, model parameters, and neural readouts are connected across the two decision domains.

      Weaknesses:

      I was confused about the model specification in terms of the relationship between evidence level and drift rate. The methods (and e.g. supplementary figure 3) specify a linear relationship between evidence level and drift rate, suggesting, unless I misunderstood, that only a single drift rate parameter (kappa) is fit. However, the drift rate parameter estimates in the supplementary tables (and response to reviewers) do not scale linearly with evidence level.

      We thank the reviewer for raising this point and appreciate the opportunity to clarify the model specification. In our hierarchical DDM, we did not fit separate, free drift parameters for each evidence level. As shown in Supplementary Fig. 3, the drift on each trial is specified as

      𝛿<sub>𝑐,𝑠,𝑖</sub> = κ<sub>𝑐,𝑠</sub> × 𝐸<sub>𝑐,𝑠,𝑖</sub>,

      where 𝐸<sub>𝑐,𝑠,𝑖</sub> is the trial-wise evidence (difference in size or value) and κ<sub>𝑐,𝑠</sub> is a single drift-scaling parameter per condition and session. Thus, the linear dependence of drift on evidence is implemented at the trial level via 𝜅; we do not estimate independent 𝛿 parameters for each evidence level.

      In Supplementary Tables 8 and 9 we report, for descriptive purposes, the posterior means of 𝛿 conditional on each evidence bin (levels 1–4), alongside the corresponding decision boundary and nondecision time summaries. These values are therefore derived quantities that reflect the combination of (i) the single κ<sub>𝑐,𝑠</sub> parameter, (ii) the empirical distribution of continuous evidence values 𝐸 within each bin, and (iii) hierarchical pooling across subjects and sessions. Consequently, they are expected to increase monotonically with evidence level—as they do in our data—but not to lie exactly on a straight line in the discrete level index, because the underlying evidence bins are not equally spaced in physical units and because of between-subject variability and posterior uncertainty.
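
      A minimal sketch of point (ii) alone: with a single 𝜅 and drift equal to 𝜅 × 𝐸 on every trial, the per-bin mean drifts increase with evidence level but are not equally spaced when the bins themselves are not. The evidence values, the bin boundaries, and the value of 𝜅 below are hypothetical, and hierarchical pooling and posterior uncertainty are deliberately left out.

```python
from statistics import mean


def bin_mean_drifts(kappa, evidence_by_bin):
    """Mean of the trial-wise drift kappa * E within each evidence bin."""
    return [mean(kappa * e for e in values) for values in evidence_by_bin]


# Hypothetical continuous evidence values, grouped into four unequally spaced bins.
evidence_by_bin = [[0.1, 0.2], [0.3, 0.5], [0.8, 1.2], [2.0, 3.0]]
kappa = 1.5  # single drift-scaling parameter (hypothetical value)

drifts = bin_mean_drifts(kappa, evidence_by_bin)
# Differences between consecutive bin means; equal steps would indicate linearity in the level index.
steps = [later - earlier for earlier, later in zip(drifts, drifts[1:])]
```

      Here every entry of `steps` is positive, so the summaries are monotone, yet the steps differ from one another, so the bin means do not fall on a straight line in the level index even though only one drift parameter exists.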

      We will revise the text and table captions to make clear that the evidence-level entries are descriptive summaries of 𝛿 implied by the 𝜅×𝐸 formulation, rather than independently estimated drift parameters, in order to avoid this confusion.

      -The fit quality for the value-based decision task is not as good as that for the PDM, and this would be worth commenting on in the paper.

      We agree that the HDDM fit for the value-based task is somewhat weaker than for the perceptual task. This is reflected in the somewhat higher DIC values for VDM compared with PDM and in slightly broader posterior-predictive distributions (Supplementary Tables 8–11 and Supplementary Figs. 11–16). We believe this difference primarily reflects the greater intrinsic variability of subjective value-based choices (e.g. trial-to-trial fluctuations in preferences, satiety, or attention), coupled with our decision to use the same relatively simple DDM architecture for both tasks to allow a principled cross-task comparison. Importantly, posterior-predictive checks show that, for VDM as well, the model adequately reproduces both accuracy and full RT distributions at the group and subject level (Supplementary Figs. 11–16), indicating that the fit quality is sufficient for our purposes. In the revised manuscript we now explicitly note that the model captures PDM behaviour more tightly than VDM and that this may reduce sensitivity to very small cTBS effects on value-based decision parameters, even though no systematic effects are evident in our data. Crucially, our central conclusion—that left SFS plays a domain-specific role in setting the decision boundary for perceptual evidence—relies on the robust behavioural, computational, and neural effects observed in PDM and does not depend on assuming a perfect model fit for VDM.

      - Supplementary Figure 3 specifies the distribution for kappa hyper-parameter twice.

      We thank the reviewer for spotting this typo. We have revised the legend of Supplementary Figure 3.

    1. Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies, and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the central adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes, rather than DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

    2. Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful.

    1. for - James Hansen - youtube - The truth about global warming

      Transcript

      2:47 We do not have to wait 10 years to conclude that we have reached 1.5 degrees of warming. Satellite data shows that Earth is strongly out of energy balance.

      3:09 An important factor is that IPCC's best estimate of climate sensitivity is a substantial underestimate. I will show that tomorrow in several independent ways.

      3:28 Climate sensitivity is probably between 4 and 5 degrees Celsius for doubled CO2 rather than 3 degrees.

      4:28 What we witness now is scientific reticence on steroids, perhaps because IPCC was granted the position of supreme authority

      4:43 But in science, supreme authority is not granted to anyone. Galileo proved that.

      4:55 An example of expert herd mentality is the response to our global warming acceleration paper which Annie was coauthor on. The next day, these experts condemned our paper in the media.

      5:26 Not one of them discussed the physics in our paper or explained what was wrong. Instead there were ad hominem remarks.

      5:51 What could the media do? They dropped the paper.

  2. bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      1. First, the authors have not convincingly shown that skin cells, or more specifically skin ECs, are a major source of circulating G-CSF in the psoriasis model as stated in the title and abstract. The data in Figure 4 show selective upregulation of Csf3 gene in skin ECs and their ability to secrete G-CSF upon IMQ treatment in vitro. However, the provided data do not address to what degree the skin EC-derived G-CSF contributes to the elevated level of circulating G-CSF. Additional experiments to selectively deplete G-CSF in skin ECs, or at least in skin cells of the affected site, are warranted to support the authors' claim. Does intradermal injection of G-CSF neutralizing antibody into the psoriatic skin reduce circulating levels of G-CSF?

      Author's response:

      Thank you for the reviewer's comment. We agree with Reviewer#1 that it is important to directly block G-CSF in the skin via intradermal injection and then measure the serum G-CSF level. Therefore, we will inject an IgG isotype control or anti-G-CSF antibody intradermally into IMQ-induced psoriatic mice.

      Another concern is insufficient demonstration of G-CSF-mediated emergency granulopoiesis in the psoriasis model. All data in Figure 5 were obtained from experiments with only n=3, and adding more replicates, in particular to those in Figure 5B, which show quite some variation in MPP numbers, is recommended. The relatively small reduction of BM granulocyte numbers (Figure 5C) compared to greater depletion of circulating granulocytes (Figure S5A) raises the possibility that it is the mobilization effect rather than granulopoiesis-stimulating effect that skin-derived G-CSF exerts to promote supply of circulating neutrophils that eventually infiltrate into the affected skin. This could also explain the negligible effect of IL-1 blockade (Figure S4), which selectively shuts off the myelopoiesis-stimulating effect of IL-1 (Pietras et al. Nat Cell Biol 2016, PMID: 27111842). Are the HSPCs in the psoriasis model more cycling? Do they show myeloid-skewed differentiation when cultured ex vivo or upon transplantation?

      Author's response: Thank you for these critical comments. We agree to do the following experiments to address them:

      1) We will add more replicates to the HSPC quantification in Figure 5, especially for the MPPs.

      2) We will assess the cycling status of HSPCs by flow cytometric analysis of Ki67 and Propidium Iodide to characterize the G0, G1 and G2/M cell cycle phases.

      3) To test myeloid-skewed differentiation, Lin- c-Kit+ Sca-1+ cells containing HSPCs will be isolated from bone marrow of Vas/IMQ-treated mice and transplanted into lethally irradiated syngeneic mice.

      The authors' claim that skin-derived G-CSF "induces" neutrophil infiltration warrants further clarification. An alternative explanation is that the upregulated neutrophil-attracting chemokines (Figure S1D) could induce infiltration, whereas G-CSF increases the number of neutrophils circulating in the vessels near the psoriatic skin. This notion seems supported elsewhere (Moos et al. J Invest Dermatol. 2019, PMID: 30684554). Can the infiltration be inhibited by systemically injecting neutralizing antibody of their receptor, CXCR2?

      Author's response: The manuscript focuses on the function of skin-derived G-CSF as a long-distance signal for emergency granulopoiesis in the bone marrow during psoriasis, not on its chemoattractant properties. The sentence in question, in lines 28-30, is "We found that upon psoriasis induction, skin-resident endothelial cells are activated to produce G-CSF which activates emergency granulopoiesis in bone marrow and induces cutaneous infiltration and accumulation of neutrophil that are functionally inflammatory." In agreement with point #2 from Reviewer#2, the upregulation of neutrophil recruitment factors (CXCL1, CXCL2, and CXCL5) in psoriatic skin (Figure S1D) suggests CXCL-mediated neutrophil recruitment. We will therefore change the sentence to "We found that upon psoriasis induction, skin-resident endothelial cells are activated to produce G-CSF which activates emergency granulopoiesis in bone marrow, leading to cutaneous accumulation of neutrophil that are functionally inflammatory." The revised sentence omits the proposal that G-CSF directly dictates neutrophil mobilization to the skin, which is not the key message of the study. We therefore consider the CXCR2 (CXCL receptor) blockade experiment more suitable for future studies.

      It remains unclear how skin-derived G-CSF accumulates pathogenic neutrophils. The authors state "pathogenic granulopoiesis," but are the circulating neutrophils in the psoriatic mice already "pathogenic" or do they acquire pathogenic phenotype after cutaneous infiltration? Additional RNA-seq to compare circulating and infiltrated neutrophils would answer this question.

      Author's response: We appreciate this valuable comment. We will perform RNA-seq on peripheral blood-circulating neutrophils (CD45+ CD11b+ Ly6G+ Ly6Cmid) versus skin-infiltrating neutrophils from both Vas- and IMQ-treated mice.

      In addition, how the accumulated pathogenic neutrophils exacerbate the psoriatic changes remains obscure. Although the authors have attempted to correlate Il17a gene expression in infiltrated neutrophils with psoriatic skin changes, the data do not address to what degree it contributes to cutaneous IL-17A protein levels. The data that cutaneous neutrophil depletion leads to subtle decrease in skin IL-17A expression (Figure 2H) rather supports alternative possibilities. For instance, as indicated elsewhere, IL-17A cutaneous tone could be enhanced by neutrophil-mediated augmentation of Th17 or gamma/delta T cell function (Lambert et al. J Invest Dermatol. 2019, PMID: 30528823). Does neutrophil depletion or G-CSF neutralization alter cell numbers or function of cutaneous Th17 and gamma/delta T cells?

      Author's response: Thank you for this insightful comment. To better understand the relative contribution of neutrophils to the cutaneous IL-17A tone in psoriatic skin, we will perform flow cytometric analysis of Th17 and gamma/delta T cells, which are widely known as the major sources of IL-17 in the psoriatic skin of IMQ-induced mice, following injection of isotype-matched or anti-Ly6G antibody.

      Finally, as the above conclusions rely solely on the IMQ-induced acute psoriasis model, it would be informative if they could be derived from another psoriasis model. IMQ is known to induce unintended systemic inflammation due to grooming-associated ingestion (Gangwar et al. J Invest Dermatol. 2022, PMID: 34953514), and "pathological crosstalk between skin and BM in psoriatic inflammation" could be strengthened by an intradermal injection model.

      Author's response: We thank the reviewer for raising this important point. Regarding systemic inflammation in psoriasis, a study cited in the above reference reported increased IFN-β expression in the intestines of animals that ingested IMQ (Grine L et al. Sci Rep. 2016, PMID: 26818707, cited in Gangwar et al. J Invest Dermatol. 2022, PMID: 34953514). We examined several pro-inflammatory cytokines, including IFN-β, IFN-γ, and IL-6, and in contrast found no systemic increase in any of them; the only change was a downregulation of IFN-γ (Explanation Figure 1). These data provide no evidence of grooming-associated ingestion.

      We also examined Csf3 expression across several anatomically distinct tissues, which showed selective upregulation in the skin (Figure 4C), suggesting a skin-restricted perturbation. In addition, one study showed that IMQ ingestion neither altered the number of gut injury-associated CXCR3+ macrophages nor aggravated skin inflammation (Pinget et al. Cell Reports. 2022, PMID: 35977500). Together, these findings support the view that the topical cutaneous IMQ application used in our study elicits local rather than systemic inflammation.

      The authors nevertheless recognize that testing an alternative psoriasis model, such as intradermal injection of IL-23 (Chan et al. J Exp Med. 2006, PMID: 17074928), would strengthen the case for a skin-local insult, and this should be tested in the future.

      Minor comments

      Figure 1E shows multiple elongated Ly6G+ structures in d0-2 control and d0 IMQ skins that do not appear to be neutrophils.

      Author's response: We appreciate Reviewer#1 pointing out this issue. As Reviewer#1 notes, the elongated structures detected by intravital microscopy are not neutrophils but autofluorescence from the skin bulge regions (Wun et al. J Invest Dermatol. 2005, PMID: 15816847). We have excluded these nonspecific signals from the transformation and quantification (Figure 1F, S1G, and S1H). To be more precise about our methods, we will also add an explanatory sentence to the Materials and Methods section: "Of note, fluorescent signals from elongated structures resembling hair bulges were autofluorescence and were thus removed from further analysis."

      In Figure 2C, the bottom GSEA seems to be showing type II IFN response, not type I IFN, according to the text.

      Author's response: Thank you for the comment; we will correct this error.

      For the BM analysis in Figures 3, 5, S3, and S5, it would be informative if BM cellularity and numbers of committed myeloid progenitors (e.g., GMPs) are shown.

      Author's response: We appreciate Reviewer#1 bringing up this point. We examined the kinetics of bone marrow cellularity and GMP numbers across 4 days of psoriasis induction in mice. The bone marrow cell number decreased over that span, with the lowest count at 2 days. Consistent with BM cellularity, the GMP number also fell by about one-third in the first 2 days of psoriasis. These kinetics are consistent with a previous report showing a rapid reduction of GMPs in the bone marrow within 2 days of emergency granulopoiesis driven by systemic G-CSF administration (Hirai et al. Nat. Immunol. 2006, PMID: 16751774). From 2 days to 4 days, the GMP number rapidly increased to slightly above the basal number (Explanation Figure 2). This timely, coordinated expansion suggests a significant supply of GMPs from differentiating upstream myeloid progenitors (Figure 3B).

      When psoriatic mice with elevated G-CSF were injected with anti-G-CSF or IgG isotype antibody, bone marrow cellularity and GMP numbers at 4 days were similar between the two groups (Explanation Figure 3). First, because psoriasis reduced bone marrow cellularity (Explanation Figure 2), the unchanged number after anti-G-CSF injection indicates that administration of 10 µg/day for 4 days does not significantly affect mobilization of psoriatic bone marrow cells. Second, the similar GMP numbers at 4 days of psoriasis plausibly reflect a snapshot taken when GMPs were already in the numerical recovery period (Explanation Figure 2). Importantly, anti-G-CSF injection reduced granulocytes in the bone marrow, peripheral blood, and skin of psoriatic mice, which points to G-CSF as a key mediator of psoriasis-driven emergency granulopoiesis and makes ineffective anti-G-CSF treatment unlikely.

      Taken together, these data suggest that G-CSF-mediated emergency granulopoiesis occurs in IMQ-induced psoriasis. We will include these data in a revised figure.

      In Figures 6B, in which cluster of human skin cells IL-17A expression would be enriched?

      Author's response: Thank you for this important point. IL-17A expression is found in the T-cell cluster (Explanation Figure 4). We also expected to see an IL-17A contribution from other cell subsets, in particular neutrophils. However, because neutrophils are fragile and their sequencing reads are technically difficult to capture, this dataset (GSE173706) does not contain neutrophils; its myeloid subset comprises monocytes, macrophages, and dendritic cells (Explanation Figure 5). This leaves open the question of how much neutrophil-derived IL-17A contributes in human psoriasis (Reich et al. Exp. Dermatol. 2015, PMID: 25828362).

      Reviewer#2

      1. Interpretation of neutrophil transcriptomic changes (Figure 2)

      The RNA-seq analysis reveals substantial downregulation of several canonical pro-inflammatory pathways in neutrophils from psoriatic skin, including IL-6, IL-1, and type II interferon signaling. The authors should discuss the functional relevance of this unexpected transcriptional repression. For example, does this indicate a shift toward specialized effector functions rather than classical cytokine responsiveness? More importantly, the most striking transcriptional change is the upregulation of NADPH oxidase-related genes (e.g., Nox1, Nox3, Nox4, Enox2). This suggests an oxidative stress-driven pathogenic mechanism, potentially more relevant than IL-17A production. Yet this aspect is not explored in the manuscript. Assessing ROS levels or oxidative neutrophil effector functions in this model would considerably strengthen the mechanistic link. Conversely, although IL-17A is upregulated in neutrophils, neutrophil depletion reduces total Il17a expression in skin only partially. This indicates that neutrophils are unlikely to be the dominant IL-17A source in the lesion. The authors' focus on neutrophil-derived IL-17A therefore seems overstated. A more rigorous assessment (e.g., conditional deletion of Il17a specifically in neutrophils) would be required to establish its true contribution. Taken together, the data suggest that oxidative programs, rather than IL-17A production, may represent the principal pathogenic axis downstream of neutrophils, and this deserves deeper discussion.

      Author's response: Thank you for raising these valuable points. We will address them with the following approaches:

      1) To address the changes in NADPH oxidase-related gene signature, we will measure ROS production in the neutrophils from skin and peripheral blood with DHR123.

      2) To address the contribution of neutrophils to IL-17A, we will assess by flow cytometry the Th17 and gamma/delta T cell populations in the skin of psoriatic mice treated with anti-Ly6G or isotype-matched antibody, as suggested by Reviewer#1.

      3) We will address the downregulation of the canonical pro-inflammatory and IL-17 pathways in psoriatic neutrophils in the Discussion.

      Human data reanalysis (Figure 6):

      The re-analysis of bulk and single-cell RNA-seq datasets is valuable but incomplete. Several mechanistically relevant questions could be addressed with the available data:

      2.1. GM-CSF (CSF2) is also strongly upregulated in psoriatic lesions (bulk RNA-seq). It would be informative to determine whether endothelial cells also express CSF2 in the scRNA-seq dataset, as this would suggest coordinated regulation of myeloid-supporting cytokines.

      2.2. Myeloid cell subsets should be examined more closely. A comparison of human myeloid transcriptomes with the mouse neutrophil RNA-seq would clarify whether similar IL-17A-related or NADPH oxidase-related signatures occur in human disease. In particular, which cell types express IL17A in human lesions?

      2.3. Chemokine production should be attributed to specific cell types. Bulk RNA-seq confirms strong induction of CXCL1, CXCL2, CXCL5, but the scRNA-seq dataset allows determining whether these chemokines originate from endothelial cells or other stromal/immune populations. This information is important for defining whether endothelial cells coordinate both neutrophil recruitment and granulopoiesis.

      Addressing these points would make the human-mouse comparison substantially stronger.

      Author's response: Thank you for pointing out these important issues. Reanalysis of the dataset yielded the following findings:

      2.1) CSF2 is expressed by the T-cell cluster in the human skin dataset (Explanation Figure 4), in agreement with a previous murine study (Hartwig et al. Cell Reports. 2018, PMID: 30590032). We will add these data to the revised manuscript.

      2.2) In line with point#10 from Reviewer#1, the dataset clearly shows the T-cell cluster as the main IL17A source (Explanation Figure 4 above). The dataset, however, does not contain cells with a neutrophil phenotype (CEACAM8 (CD66b) and PGLYRP1), only monocytes, macrophages, and dendritic cells (Explanation Figure 5 above). This absence is probably due to the technical difficulty of capturing sequencing reads from fragile neutrophils. It is therefore not possible to analyze neutrophil IL-17 expression in this dataset.

      2.3) Reanalysis of CXCLs in the human scRNA-seq dataset (GSE173706) clarified their expression patterns and cellular sources under normal and psoriatic conditions. In normal skin, all examined cell subsets show only low CXCL expression. In contrast, psoriatic skin exhibits significant CXCL upregulation, with distinct cell subsets showing dramatic increases and potentially being the major CXCL sources. CXCL1 is markedly upregulated in fibroblasts, myeloid cells, and melanocytes and nerve cells. CXCL2 is strikingly upregulated in myeloid cells, while CXCL5 is strongly increased in fibroblasts, myeloid cells, and mast cells (Explanation Figure 7). Taken together, these results suggest that CXCL upregulation in psoriatic skin is carried out jointly by the stromal and immune compartments. Of note, endothelial cells show minimal changes in CXCL expression, and even downregulate CXCL2 in psoriasis, indicating that they are unlikely to be the major contributor to CXCL-mediated neutrophil recruitment.

      **Referees cross-commenting**

      I agree with Reviewer 1 that the contribution of EC-derived G-CSF to circulating G-CSF levels and to emergency myelopoiesis requires additional genetic or neutralization experiments to be fully established.

      Author's response: We appreciate that Reviewer#2 raised this key point. In addition to measuring serum G-CSF after intradermal anti-G-CSF administration, as described under point#1 from Reviewer#1 above, we will also examine signs of emergency myelopoiesis in vivo.

      Minor points

      1. Line 319: the text likely refers to Figure S4, not S3.

      Author's response: Thank you, we will correct the figure reference.

      Line 338: "psoriatic" is misspelled.

      Author's response: Thank you, we will change this to "psoriatic".

      Reviewer #3

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Psoriasis is extensively studied, a good recent reference- https://doi.org/10.1016/j.mam.2024.101306

      Author's response: Thank you for Reviewer#3's suggestion. The referenced study highlights the current paradigm, which largely focuses on skin-restricted mechanisms and overlooks potential cross-organ interactions in psoriatic inflammation. Our findings provide new insight into skin-bone marrow crosstalk in this disease context. In addition, the suggested reference underscores the key roles of diverse innate immune cells, including neutrophils, eosinophils, and dendritic cells, which is fundamental to our study and might also guide future exploration of innate cell subsets beyond neutrophils. We will therefore include this reference in our revised manuscript.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      It is all good. May add graphical-abstract.

      Author's response: Thank you for the reviewer's input, we agree that a graphical-abstract will help the readers more clearly grasp the key messages of our manuscript. We will include it in the revised manuscript.

      Major comments:

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      No. It is very solid.

      Author's response: We appreciate the reviewer's view that the claims in our paper are solid.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Such a discovery clearly opens many options, and it is fascinating to suggest additional experiments for future studies. It is a complete study, best to publish as-is and let many read it and proceed with this new concept.

      Author's response: We thank the reviewer for noting that the current experimental evidence is complete and that no additional experiments are necessary at this stage. We agree that the discovery opens prospective directions for future studies.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      N/A - I suggest no additional experiments at this point. Get it published and see how many will follow this new direction!

      Author's response: We thank the reviewer for recognizing that the experimental data are sufficient to serve as a foundation for future research.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes.

      Author's response: We thank the reviewer for recognizing that our methods are reproducible.

      • Are the experiments adequately replicated, and is the statistical analysis adequate?

      Yes. The data are of very high quality.

      Author's response: We are grateful that the reviewer views our replication strategy and statistical analysis as being of high quality.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      None. It is good as-is. One may always suggest minor things- but this one is better published so many laboratories may rush for this new direction. I think it will be interesting studying some long-term impacts, and changes not only of neutrophils but also of other innate cells, such as DCs, Macrophages, and Eosinophils - so it is best to let laboratories that focus on these cells know of the discovery and pursue independent studies.

      Author's response: We appreciate the reviewer's assessment that our paper is already well set for the community to explore the newly proposed direction.

      • Are the text and figures clear and accurate?

      Yes.

      Author's response: We thank the reviewer for the evaluation. We have ensured that the text and figures in our manuscript are clear and accurate. Once again, we thank the reviewer for the encouraging and constructive appraisal. We are pleased that the reviewer finds the manuscript already strong and suitable for publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      A role of neutrophils in psoriasis pathogenesis has been highlighted by several past studies; however, how the neutrophils are recruited to the affected skin has not been fully understood. The work by Kosasih et al. tackles a relevant question and has investigated the effect of psoriatic skin inflammation on BM myelopoiesis. Using an IMQ-induced acute psoriasis mouse model, the authors derive 3 major conclusions: (1) skin ECs secrete G-CSF into circulation in response to psoriatic stress, (2) skin EC-derived G-CSF stimulates emergency granulopoiesis, and (3) skin EC-derived G-CSF induces infiltration and accumulation of pathogenic neutrophils in the affected skin. The authors provide many pieces of interesting data, but most of them remain correlative and insufficient to support the conclusions. Many of the experiments were performed in a small number of samples or mice (mostly with n=3), leaving the story still preliminary.

      Major comments:

      1. First, the authors have not convincingly shown that skin cells, or more specifically skin ECs, are a major source of circulating G-CSF in the psoriasis model as stated in the title and abstract. The data in Figure 4 show selective upregulation of Csf3 gene in skin ECs and their ability to secrete G-CSF upon IMQ treatment in vitro. However, the provided data do not address to what degree the skin EC-derived G-CSF contributes to the elevated level of circulating G-CSF. Additional experiments to selectively deplete G-CSF in skin ECs, or at least in skin cells of the affected site, are warranted to support the authors' claim. Does intradermal injection of G-CSF neutralizing antibody into the psoriatic skin reduce circulating levels of G-CSF?
      2. Another concern is insufficient demonstration of G-CSF-mediated emergency granulopoiesis in the psoriasis model. All data in Figure 5 were obtained from experiments with only n=3, and adding more replicates, in particular to those in Figure 5B, which show quite some variation in MPP numbers, is recommended. The relatively small reduction of BM granulocyte numbers (Figure 5C) compared to greater depletion of circulating granulocytes (Figure S5A) raises the possibility that it is the mobilization effect rather than granulopoiesis-stimulating effect that skin-derived G-CSF exerts to promote supply of circulating neutrophils that eventually infiltrate into the affected skin. This could also explain the negligible effect of IL-1 blockade (Figure S4), which selectively shuts off the myelopoiesis-stimulating effect of IL-1 (Pietras et al. Nat Cell Biol 2016, PMID: 27111842). Are the HSPCs in the psoriasis model more cycling? Do they show myeloid-skewed differentiation when cultured ex vivo or upon transplantation?
      3. The authors' claim that skin-derived G-CSF "induces" neutrophil infiltration warrants further clarification. An alternative explanation is that the upregulated neutrophil-attracting chemokines (Figure S1D) could induce infiltration, whereas G-CSF increases the number of neutrophils circulating in the vessels near the psoriatic skin. This notion seems supported elsewhere (Moos et al. J Invest Dermatol. 2019, PMID: 30684554). Can the infiltration be inhibited by systemically injecting neutralizing antibody of their receptor, CXCR2?
      4. It remains unclear how skin-derived G-CSF accumulates pathogenic neutrophils. The authors state "pathogenic granulopoiesis," but are the circulating neutrophils in the psoriatic mice already "pathogenic" or do they acquire pathogenic phenotype after cutaneous infiltration? Additional RNA-seq to compare circulating and infiltrated neutrophils would answer this question.
      5. In addition, how the accumulated pathogenic neutrophils exacerbate the psoriatic changes remains obscure. Although the authors have attempted to correlate Il17a gene expression in infiltrated neutrophils with psoriatic skin changes, the data do not address to what degree it contributes to cutaneous IL-17A protein levels. The data that cutaneous neutrophil depletion leads to subtle decrease in skin IL-17A expression (Figure 2H) rather supports alternative possibilities. For instance, as indicated elsewhere, IL-17A cutaneous tone could be enhanced by neutrophil-mediated augmentation of Th17 or gamma/delta T cell function (Lambert et al. J Invest Dermatol. 2019, PMID: 30528823). Does neutrophil depletion or G-CSF neutralization alter cell numbers or function of cutaneous Th17 and gamma/delta T cells?
      6. Finally, as the above conclusions rely solely on the IMQ-induced acute psoriasis model, it would be informative if they could be derived from another psoriasis model. IMQ is known to induce unintended systemic inflammation due to grooming-associated ingestion (Gangwar et al. J Invest Dermatol. 2022, PMID: 34953514), and "pathological crosstalk between skin and BM in psoriatic inflammation" could be strengthened by an intradermal injection model.

      Minor comments:

      1. Figure 1E shows multiple elongated Ly6G+ structures in d0-2 control and d0 IMQ skins that do not appear to be neutrophils.
      2. In Figure 2C, the bottom GSEA seems to be showing type II IFN response, not type I IFN, according to the text.
      3. For the BM analysis in Figures 3, 5, S3, and S5, it would be informative if BM cellularity and numbers of committed myeloid progenitors (e.g., GMPs) are shown.
      4. In Figure 6B, in which cluster of human skin cells would IL-17A expression be enriched?

      Significance

      Although quite a few studies have reported various examples of emergency myelopoiesis (Swann et al. Nat Rev Immunol. 2024, PMID: 38467802), there is limited evidence on its occurrence and involvement in locally restricted disease, such as periodontitis (Li et al. Cell 2022, PMID: 35483374). As an HSC biologist, I find this study conceptually interesting, as it could extend the above concept to psoriasis, a non-infectious, local inflammatory disease in the skin, and describes a potential causal link between skin-derived G-CSF and emergency myelopoiesis. That said, as detailed in the first section, the conclusions, especially those related to emergency myelopoiesis driven by skin-derived G-CSF, need to be more convincingly supported before the study's value can be judged. The findings offer additional understanding of how psoriasis develops in concert with aberrant hematopoiesis and will be relevant to those working in the fields of dermatology, immunology, and hematology.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary The manuscript by Aarts et al. explores the role of GRHL2 as a regulator of the progesterone receptor (PR) in breast cancer cells. The authors show that GRHL2 and PR interact in a hormone-independent manner and based on genomic analyses, propose that they co-regulate target genes via chromatin looping. To support this model, the study integrates both newly generated and previously published datasets, including ChIP-seq, CUT&RUN, RNA-seq, and chromatin interaction assays, in breast cancer cell models (T47DS and T47D).

      Major comments: R1.1 Novelty of GRHL2 in steroid receptor biology The role of GRHL2 as a co-regulator of steroid hormone receptors has previously been described for ER (J Endocr Soc. 2021;5(Suppl 1):A819) and AR (Cancer Res. 2017;77:3417-3430). In the ER study, the authors also employed a GRHL2 ΔTAD T47D cell model. Therefore, while this manuscript extends GRHL2 involvement to PR, the contribution appears incremental rather than conceptual.

      We are fully aware of the previously described role of GRHL2 as a co-regulator of steroid hormone receptors, particularly ER and AR. As acknowledged in our introduction (lines 104-108), we explicitly state: "Grainyhead-like 2 (GRHL2) has recently emerged as a potential pioneer factor in hormone receptor-positive cancers, including breast cancer21. However, nearly all studies to date have focused on GRHL2 in the context of ER and estrogen signaling, leaving its role in PR- and progesterone-mediated regulation unexplored22-26".

      As for the specific publications that the reviewer refers to: The first refers to an abstract from an annual meeting of the Endocrine Society. As we have been unable to assess the original data underpinning the abstract - including the mentioned GRHL2 ΔTAD model - we prefer not to cite this particular reference. We do cite other work by the same authors (Reese et al. 2022, our ref. 25). We also cite the AR study mentioned by the reviewer (our ref. 55) in our discussion. As such, we think we do give credit to prior work done in this area.

      By characterizing GRHL2 as a co-regulator of the progesterone receptor (PR), we expand on the current understanding of GRHL2 as a common transcriptional regulator within the broader context of steroid hormone receptor biology. Given that ER and PR are frequently co-expressed and active within the same breast cancer cells, our findings raise the important possibility that GRHL2 may actively coordinate or modulate the balance between ER- and PR-driven transcriptional programs, as postulated in the discussion paragraph.

      Importantly, we also functionally link PR/GRHL2-bound enhancers to their target genes (Fig5), providing novel insights into the downstream regulatory networks influenced by this interaction. These results not only offer a deeper mechanistic understanding of PR signaling in breast cancer but also lay the groundwork for future comparative analyses between GRHL2's role in ER-, AR-, and PR-mediated gene regulation.

      As such, we respectfully suggest that our work offers more than an incremental advance in our knowledge and understanding of GRHL2 and steroid hormone receptor biology.

      R1.2 Mechanistic depth The study provides limited mechanistic insight into how GRHL2 functions as a PR co-regulator. Key mechanistic questions remain unaddressed, such as whether GRHL2 modulates PR activation, the sequential recruitment of co-activators/co-repressors, engages chromatin remodelers, or alters PR DNA-binding dynamics. Incorporating these analyses would considerably strengthen the mechanistic conclusions.

      Although our RNA-seq data demonstrate that GRHL2 modulates the expression of PR target genes, and our CUT&RUN experiments show that GRHL2 chromatin binding is reshaped upon R5020 exposure, we acknowledge that we have not further dissected the molecular mechanisms by which GRHL2 functions as a PR co-regulator.

      We did consider several follow-up experiments to address this, including PR CUT&RUN in GRHL2 knockdown cells, CUT&RUN for known co-activators such as KMT2C/D and P300, as well as functional studies involving GRHL2 TAD and DBD mutants. However, due to technical and logistical challenges, we were unable to carry out these experiments within the timeframe of this study.

      That said, we fully recognize that such approaches would provide deeper mechanistic insight into the interplay between PR and GRHL2. We have therefore explicitly acknowledged this limitation in our limitations of the study section (line 502-507) and mention this as an important avenue for future investigation.

      R1.3 Definition of GRHL2-PR regulatory regions (Figure 2) The 6,335 loci defined as GRHL2-PR co-regulatory regions are derived from a PR ChIP-seq performed in the presence of hormone and a GRHL2 ChIP-seq performed in its absence. This approach raises doubts about whether GRHL2 and PR actually co-occupy these regions under ligand stimulation. GRHL2 ChIP-seq experiments in both hormone-treated and untreated conditions are necessary to provide stronger support for this conclusion.

      Bulk ChIP-seq cannot definitively demonstrate simultaneous binding of PR and GRHL2 at the same genomic regions, and we agree that the ChIP-seq experiments we present do not provide a definitive answer on whether GRHL2 and PR co-occupy these regions under ligand stimulation. As a first step to address this, we performed CUT&RUN experiments for both GRHL2 and PR under untreated and R5020-treated conditions. These experiments revealed a subset of overlapping PR and GRHL2 binding sites (approximately 5% of the identified PR peaks under ligand stimulation).

      We specifically chose CUT&RUN to minimize artifacts from crosslinking and sonication, thereby reducing background and enabling the mapping of high-confidence direct DNA-binding events: Given that a fraction of GRHL2 physically interacts with PR (Fig1D), it is possible that ChIP-seq detects indirect binding of GRHL2 at PR-bound sites and vice versa. CUT&RUN, by contrast, allows us to identify direct binding sites with higher confidence.

      Nonetheless, although outside the scope of the current manuscript, we agree that a dedicated GRHL2 ChIP with and without ligand stimulation would provide additional insight, and we have accordingly added this suggestion to the discussion (line 502-507).
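      The overlap figure quoted in this response (shared PR/GRHL2 sites as a share of all PR peaks) can be illustrated with a minimal sketch. It assumes peaks are given as half-open `(chrom, start, end)` tuples and that any overlap of at least one base pair counts; the function name `overlap_fraction` and the toy coordinates are hypothetical and are not taken from the manuscript, whose actual peak-calling and overlap criteria are not described here.

```python
from collections import defaultdict


def overlap_fraction(query_peaks, target_peaks):
    """Return the fraction of query peaks that overlap >= 1 target peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    # Group target peaks by chromosome so each query is only compared
    # against peaks on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in target_peaks:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, q_start, q_end in query_peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(t_start < q_end and q_start < t_end
               for t_start, t_end in by_chrom[chrom]):
            hits += 1
    return hits / len(query_peaks) if query_peaks else 0.0


# Toy coordinates (hypothetical): one of three PR peaks overlaps a GRHL2 peak.
pr_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
grhl2_peaks = [("chr1", 150, 250), ("chr2", 100, 120)]
fraction = overlap_fraction(pr_peaks, grhl2_peaks)
print(f"{fraction:.1%} of PR peaks overlap a GRHL2 peak")
```

      The pairwise comparison is quadratic per chromosome, which is fine for illustration; for genome-scale peak sets a sorted-interval tool would be the more usual choice.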

      R1.4 Cell model considerations The manuscript relies heavily on the T47DS subclone, which expresses markedly higher PR levels than parental T47D cells (Aarts et al., J Mammary Gland Biol Neoplasia 2023; Kalkhoven et al., Int J Cancer 1995). This raises concerns about physiological relevance. Key findings, including co-IP and qPCR-ChIP experiments, should be validated in additional breast cancer models such as parental T47D, BT474, and MCF-7 cells to generalize the conclusions. Furthermore, data obtained from T47D (PR ChIP-seq, HiChIP, CTCF and Rad21 ChIP-seq) and T47DS (RNA-seq, CUT&RUN) are combined along the manuscript. Given the substantial differences in PR expression between these cell lines, this approach is problematic and should be reconsidered.

      We agree that physiological relevance is important to consider. Here, all existing model systems have some limitations. In our experience, it is technically challenging to robustly measure gene expression changes in parental T47D cells (or MCF7 cells, for that matter) in response to progesterone stimulation (Aarts et al., J Mammary Gland Biol Neoplasia 2023). As we set out to integrate PR and GRHL2 binding to downstream target gene induction, we therefore opted for the most progesterone responsive model system (T47DS cells). We agree that observations made in T47D and T47DS cells should not be overinterpreted and require further validation. We have now explicitly acknowledged this and added it to the discussion (line 507-509).

      As for the reviewer's suggestion to use MCF7 cells: apart from its suboptimal PR-responsiveness, this cell line is also known to harbor GRHL2 amplification, resulting in elevated GRHL2 levels (Reese et al., Endocrinology 2019). By that line of reasoning, the use of MCF7 cells would also introduce concerns about physiological relevance. That being said, and as noted in the discussion (line 390-391), the study by Mohammed et al. which identified GRHL2 as a PR interactor using RIME, was performed in both MCF7 and T47D cells. This further supports the notion that the PR-GRHL2 interaction is not limited to a single cell line.

      R1.5 CUT&RUN vs ChIP-seq data The CUT&RUN experiments identify fewer than 10% of the PR binding sites reported in the ChIP-seq datasets. This discrepancy likely results from methodological differences (e.g., absence of crosslinking, potential loss of weaker binding events). The overlap of only 158 sites between PR and GRHL2 under hormone treatment (Figure 3B) provides limited support for the proposed model and should be interpreted with greater caution.

      We acknowledge the discrepancy in the number of binding sites between ChIP-seq and CUT&RUN. Indeed, methodological differences likely contribute to the differences in PR binding sites reported between the ChIP-seq and CUT&RUN datasets. As the reviewer correctly notes, the absence of crosslinking and sonication in CUT&RUN reduces detection of weaker binding events. However, it also reduces the detection of indirect binding events, which could increase the reported number of peaks in ChIP-seq data (e.g. the common presence of "shadow peaks").

      As also discussed in our response to R1.3, we deliberately chose the CUT&RUN approach to enable the identification of high-confidence direct DNA-binding events. Since GRHL2 physically interacts with PR, ChIP-seq could potentially capture indirect binding of GRHL2 at PR-bound sites, and vice versa. By contrast, CUT&RUN primarily captures direct DNA-protein interactions, offering a more specific binding profile. Thus, while the number of CUT&RUN binding sites is much smaller than previously reported by ChIP-seq, we are confident that they represent true, direct binding events.

      We would also like to emphasize that the model presented in figure 6 does not represent a generic or random gene, but rather a specific gene that is co-regulated by both GRHL2 and PR. In this specific case, regulation is proposed to occur via looping interactions from either individual TF-bound sites (e.g., PR-only or GRHL2-only) or shared GRHL2/PR sites. We do not propose that only shared sites are functionally relevant, nor do we assume that GRHL2 and PR must both be directly bound to DNA at these shared sites. Therefore, overlapping sites identified by ChIP-seq, potentially reflecting indirect binding events, could indeed be missed by CUT&RUN, yet still contribute to gene regulation. To clarify this, we have revised the main text (line 331-334) and the legend of Figure 6 to explicitly state that the model refers to a gene with established co-regulation by both GRHL2 and PR.

      R1.6 Gene expression analyses (Figure 4) The RNA-seq analysis after 24 hours of hormone treatment likely captures indirect or secondary effects rather than the direct PR-GRHL2 regulatory program. Including earlier time points (e.g., 4-hour induction) in the analysis would better capture primary transcriptional responses. The criteria used to define PR-GRHL2 co-regulated genes are not convincing and may not reflect the regulatory interactions proposed in the model. Strong basal expression changes in GRHL2-depleted cells suggest that much of the transcriptional response is PR-independent, conflicting with the model (Figure 6). A more straightforward approach would be to define hormone-regulated genes in shControl cells and then examine their response in GRHL2-depleted cells. Finally, integrating chromatin accessibility and histone modification datasets (e.g., ATAC-seq, H3K27ac ChIP-seq) would help establish whether PR-GRHL2-bound regions correspond to active enhancers, providing stronger functional support for the proposed regulatory model.

      We thank the reviewer for pointing this out. We now recognize that our criteria for selecting PR/GRHL2 co-regulated genes were not clearly described. To address this, we have revised our approach as per the reviewer's suggestion: we first identified early and sustained PR target genes based on their response at 4 and 24 hours of induction and subsequently overlaid this list with the gene expression changes observed in GRHL2-depleted cells. This revised approach reduced the number of PR-responsive, GRHL2-regulated target genes from 549 to 298 (46% reduction). We consequently updated all following analyses, resulting in revised figures 4 and 5 and supplementary figures 2, 3 and 4. As a result of this revised approach, the number of genes that are transcriptionally regulated by GRHL2 and PR (RNA-seq data) that also harbor a PR loop anchor at or near their TSS after 30 minutes of progesterone stimulation (PR HiChIP data) dropped from 114 to 79 (30% reduction). We thank the reviewer for suggesting this more straightforward approach and want to emphasize that our overall conclusions remain unaltered.
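      The two-step selection described above can be sketched as simple set operations. This is only an illustration under stated assumptions: "early and sustained" is interpreted here as genes responding at both 4 and 24 hours (an intersection), the gene names are placeholders, and the function names `co_regulated_genes` and `percent_reduction` are hypothetical. The manuscript's actual thresholds and statistics are not reproduced.

```python
def co_regulated_genes(responsive_4h, responsive_24h, grhl2_dependent):
    """Select PR-responsive genes whose expression also changes on GRHL2 depletion.

    Step 1: define hormone-regulated genes in control cells (assumed here to
            mean genes responding at both time points).
    Step 2: overlay that list with genes altered in GRHL2-depleted cells.
    """
    pr_targets = responsive_4h & responsive_24h
    return pr_targets & grhl2_dependent


def percent_reduction(before, after):
    """Percentage by which a count dropped from `before` to `after`."""
    return 100.0 * (before - after) / before


# Placeholder gene sets (hypothetical names, for illustration only).
responsive_4h = {"GENE_A", "GENE_B", "GENE_C"}
responsive_24h = {"GENE_B", "GENE_C", "GENE_D"}
grhl2_dependent = {"GENE_C", "GENE_D", "GENE_E"}

selected = co_regulated_genes(responsive_4h, responsive_24h, grhl2_dependent)
print(sorted(selected))

# Gene counts quoted in the response: 549 before and 298 after the revision.
print(f"{percent_reduction(549, 298):.0f}% reduction")
```

      Defining the hormone-responsive set first, independently of the knockdown data, keeps basal GRHL2-dependent expression changes from entering the co-regulated list.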

      As above in our response to R1.3, we want to emphasize that the model presented in figure 6 does not depict a generic or randomly chosen gene, but a gene that is specifically co-regulated by both GRHL2 and PR. We also want to emphasize that the majority of GRHL2's transcriptional activity is PR-independent. This is consistent with the limited fraction of GRHL2 that co-immunoprecipitated with PR (Figure 1D), and with the well-established roles of GRHL2 beyond steroid receptor signaling. In fact, the overall importance of GRHL2 for cell viability in T47D(S) cells is underscored by our inability to generate a full knockout (multiple failed attempts of CRISPR/Cas mediated GRHL2 deletion in T47D(S) and MCF7 cells), and by the strong selection we observed against high-level GRHL2 knockdown using shRNA.

      As for the reviewer's suggestion to assess whether GRHL2/PR co-bound regions correspond to active enhancers by integrating H3K27ac and ATAC-seq data: We have re-analyzed publicly available H3K27ac and ATAC-seq datasets from T47D cells (references 42 and 43). These analyses are now added to figure 2 (F and G). The H3K27ac profile suggests that GRHL2-PR overlapping sites indeed correspond to more active enhancers (Figure 2F), with a proposed role for GRHL2 since siGRHL2 affects the accessibility of these sites (Figure 2G).

      Minor comments Page 19: The statement that "PR and GRHL2 trigger extensive chromatin reorganization" is not experimentally supported. ATAC-seq would be an appropriate method to test this directly.

      We agree with the reviewer and have removed this sentence, as it does not contribute meaningfully to the flow of the manuscript.

      Prior literature on GRHL2 as a steroid receptor co-regulator should be discussed more thoroughly.

      We now added additional literature on GRHL2 as a steroid hormone receptor co-regulator in the discussion (line 397-401) and we cite the papers suggested by R1 in R1.1 (references 25 and 54).

      Reviewer #1 (Significance (Required)):

      The identification of novel PR co-regulators is an important objective, as the mechanistic basis of PR signaling in breast cancer remains incompletely understood. The main strength of this study lies in highlighting GRHL2 as a factor influencing PR genomic binding and transcriptional regulation, thereby expanding the repertoire of regulators implicated in PR biology.

      That said, the novelty is limited, given the established roles of GRHL2 in ER and AR regulation. Mechanistic insight is underdeveloped, and the reliance on an engineered T47DS model with supra-physiological PR levels reduces the general impact. Without validation in physiologically relevant breast cancer models and clearer separation of direct versus indirect effects, the overall advance remains modest.

      The manuscript will be of interest to a specialized audience in the fields of nuclear receptor signaling, breast cancer genomics, and transcriptional regulation. Broader appeal, including translational or clinical relevance, is limited in its current form.

      We have addressed all of these points in our response above and agree that with our implemented changes, this study should reach (and appeal to) an audience interested in transcriptional regulation, chromatin biology, hormone receptor signaling and breast cancer.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present a study investigating the role of GRHL2 in hormone receptor signaling. Previous research has primarily focused on GRHL2 interaction with estrogen receptor (ER) and androgen receptor (AR). In breast cancer, GRHL2 has been extensively studied in relation to ER, while its potential involvement with the progesterone receptor (PR) remains largely unexplored. This is the rationale of this study to investigate the relation between PR and GRHL2. The authors demonstrate an interaction between GRHL2 and PR and further explore this relationship at the level of genomic binding sites. They also perform GRHL2 knockdown experiments to identify target genes and link these transcriptional changes back to GRHL2-PR chromatin occupancy. However, several conceptual and technical aspects of the study require clarification to fully support the authors' conclusions.

      R2.1 Given the high sequence similarity among GRHL family members, this raises questions about the specificity of the antibody used for GRHL2 RIME. The authors should address whether the antibody cross-reacts with GRHL1 or GRHL3. For example, GRHL1 shows a higher log fold change than GRHL2 in the RIME data.

      Indeed, GRHL1, GRHL2, and GRHL3 are structurally related. They share a similar domain organization and are all approximately 70 kDa in size. Sequence similarity is primarily confined to the DNA-binding domain, with GRHL2 and GRHL3 showing 81% similarity in this region, and GRHL1 showing 63% similarity to GRHL2/3 (Ming, Nucleic Acids Res 2018).

      The antibody used, sourced from the Human Protein Atlas, is widely used in the field. It targets an epitope within the transactivation domain (TAD) of GRHL2, a region with relatively low sequence similarity to the corresponding domains in GRHL1 and GRHL3.

      We assessed the specificity of the antibody using western blotting (Supplementary Figure 2A) in T47DS wild-type and GRHL2 knockdown cells. As expected, GRHL2 protein levels were reduced in the knockdown cells, providing convincing evidence that the antibody recognizes GRHL2. The remaining signal in shGRHL2 knockdown cells could either be due to remaining GRHL2 protein or due to the antibody detecting GRHL1/3. Furthermore, the observed high log-fold enrichment of GRHL1 in our RIME may reflect known heterodimer formation between GRHL1 and GRHL2, rather than antibody cross-reactivity. Even so, we cannot formally rule out cross-reactivity and have mentioned this in the limitations section (line 497-501).

      R2.2 In addition, in RIME experiments, one would typically expect the bait protein to be among the most highly enriched proteins compared to control samples. If this is not the case, it raises questions about the efficiency of the pulldown, antibody specificity, or potential technical issues. The authors should comment on the enrichment level of the bait protein in their data to reassure readers about the quality of the experiment.

      We agree with the reviewer that this information is crucial for assessing the quality of the experiment. We have therefore added the enrichment levels (log₂ fold change of IgG control over pulldown) to the methods section (line 592).

      As the reviewer notes, GRHL2 was not among the top enriched proteins in our dataset. This is due to unexpectedly high background binding of GRHL2 to the IgG control antibody/beads, for which we currently have no explanation. As a result, although we detected many unique GRHL2 peptides, observed high sequence coverage (>70%), and GRHL2 ranked among the highest in both iBAQ and LFQ values, its relative enrichment was reduced due to the elevated background. During our RIME optimization, Coomassie blue staining of input and IP samples revealed a band at the expected molecular weight of GRHL2 in the pulldown samples that was absent in the IgG control (see figure 1 for the reviewer below, 4 right lanes), supporting the conclusion that GRHL2 is specifically enriched in our GRHL2 RIME samples. Combined with enrichment of some of the expected interacting proteins (e.g. KMT2C and KMT2D), we are convinced that the experiment is of sufficient quality to support our conclusions.

      Figure 1 for reviewer: Coomassie blue staining of input and IP GRHL2 and IgG RIME samples. NT = non-treated, T = treated.

      R2.3 The authors report log2 fold changes calculated using iBAQ values for the bait versus IgG control pulldown. While iBAQ provides an estimate of protein abundance within samples, it is not specifically designed for quantitative comparison between samples without appropriate normalization. It would be helpful to clarify the normalization strategy applied and consider using LFQ intensities.

      We understand the reviewer's concern. Due to the high background observed in the IgG control sample (see R2.2), the LFQ-based normalization did not accurately reflect the enrichment of GRHL2, which was clearly supported by other parameters such as the number of unique peptides (see rebuttal Table 1). After discussions with our Mass Spectrometry facility, we decided to consider the iBAQ values, which reflect the absolute protein abundance within each sample, as a valid and informative measure of enrichment. In the context of elevated background levels, iBAQ provides an alternative and reliable metric for assessing protein enrichment and was therefore used for our interactor analysis.

      | Protein | Unique peptides | iBAQ GRHL2 | iBAQ IgG | LFQ GRHL2 | LFQ IgG |
      |---|---|---|---|---|---|
      | GRHL2 | 52 | 1753400.00 | 155355.67 | 5948666.67 | 3085700.00 |
      | GRHL1 | 23 | 56988.33 | 199.03 | 334373.33 | 847.23 |

      *Table 1. Unique peptide, iBAQ and LFQ values of the GRHL2 and IgG pulldowns for GRHL2 and GRHL1*
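      To make the comparison between the two metrics concrete, the enrichment implied by Table 1 can be computed directly as a log2 ratio of pulldown over IgG control. The sketch below uses only the values reported in the table; the dictionary layout and the function name `log2_fold_change` are illustrative assumptions, and no normalization beyond the reported values is applied.

```python
import math

# Values copied from Table 1: (GRHL2 pulldown, IgG control) per metric.
table1 = {
    "GRHL2": {"iBAQ": (1753400.00, 155355.67), "LFQ": (5948666.67, 3085700.00)},
    "GRHL1": {"iBAQ": (56988.33, 199.03), "LFQ": (334373.33, 847.23)},
}


def log2_fold_change(pulldown, control):
    """log2 ratio of pulldown intensity over IgG control intensity."""
    return math.log2(pulldown / control)


# Collect the log2 fold change for every protein/metric combination.
enrichment = {
    protein: {metric: log2_fold_change(*pair) for metric, pair in metrics.items()}
    for protein, metrics in table1.items()
}

for protein, metrics in enrichment.items():
    for metric, value in metrics.items():
        print(f"{protein} {metric}: log2FC = {value:.2f}")
```

      With these numbers the GRHL2 log2 fold change is roughly 3.5 by iBAQ but below 1 by LFQ, whereas GRHL1 comes out above 8 with either metric. This is consistent with both the high IgG background described for GRHL2 and the reviewer's observation that GRHL1 shows the larger fold change.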

      R2.4 Other studies have reported PR RIME, which could be a valuable source to investigate whether GRHL proteins were detected.

      We thank the reviewer for pointing this out. We are aware of the PR RIME, generated by Mohammed et al., which we refer to in the discussion (lines 390-391). This study indeed identified GRHL2 as a PR-interacting protein in MCF7 and T47D cells. Although they do not mention this interaction in the text, the interaction is clearly indicated in one of the figures from their paper, which supports our findings. To our knowledge, no other PR RIME datasets in MCF7 or T47D cells have been published to date.

      R2.5 In line 137, the term "protein score" is mentioned. Could the authors please clarify what this means and how it was calculated.

      We agree that this point was not clearly explained in the original text. The scores presented reflect the MaxQuant protein identification confidence, specifically the sum of peptide-level scores (from Andromeda), which indicates the relative confidence in protein detection. We have now added this clarification to line 137 and to the legend of Figure 1.

      R2.6 In line 140-141. The fact that GRHL2 interacts with chromatin remodelers does not by itself prove that GRHL2 acts as a pioneer factor or chromatin modulator. Demonstrating pioneer function typically requires direct evidence of chromatin opening or binding to closed chromatin regions (e.g., ATAC-seq, nucleosome occupancy assays). I recommend revising this statement or providing supporting evidence.

      We agree that the fact that GRHL2 interacts with chromatin remodelers does not by itself prove that GRHL2 acts as a pioneer factor or chromatin modulator. However, a previous study (Jacobs et al, Nature genetics, 2018) has shown directly that the GRHL family members (including GRHL2) have pioneering function and regulate the accessibility of enhancers. We adapted line 140-141 to state this more clearly. In addition, our newly added data in Figure 2G also support the fact that GRHL2 has a role in regulating chromatin accessibility in T47D cells.

      R2.7 The pulldown Western blot lacks an IgG control in panel D.

      This is correct. As the co-IP in Figure 1D served as a validation of the RIME and was specifically aimed at determining the effect of hormone treatment on the observed PR/GRHL2 interaction, we did not perform this control given the scale of the experiment. However, during RIME optimization, we performed GRHL2 staining of the IgG controls by western blot, shown in figure 2 for the reviewer below. As stated above, some background GRHL2 signal was observed in the IgG samples, but a clear enrichment is seen in the GRHL2 IP.

      Taken together, we believe that the well-controlled RIME, combined with the co-IP presented, provides strong evidence that the observed signal reflects a genuine GRHL-PR interaction.

      Figure 2 for reviewer: WB of input and IP GRHL2 and IgG RIME samples stained for GRHL2. NT = non-treated, T = treated

      R2.8 Depending on the journal and target audience, it may be helpful to briefly explain what R5020 is at its first mention (line 146).

      Thank you. We have adapted this accordingly.

      R2.9 The authors state that three technical replicates were performed for each experimental condition. It would be helpful to clarify the expected level of overlap between biological replicates of RIME experiments. This clarification is necessary, especially given the focus on uniquely enriched proteins in untreated versus treated cells, and the observation that some identified proteins in specific conditions are not chromatin-associated. Replicates or validations would strengthen the findings.

      We use the term technical rather than biological replicates because for cell lines, defining true biological replicates is challenging, as most variability arises from experimental rather than biological differences. To introduce some variation, we split our T47DS cells into three parallel dishes 5 days prior to starting the treatment. We did this on purpose, to minimize the likelihood that proteins identified as uniquely enriched are artifacts. Each of the three technical replicates comes from one of these three parallel splits (so equal passage numbers but propagated in parallel dishes for 5 days before the start of the experiment).

      To generate the three technical replicates for our RIME, we plated cells from the parallel grown splits. Treatments for the three replicates were performed per replicate. Samples were crosslinked, harvested and lysed for subsequent RIME analysis; for technical and logistical reasons, the three replicates were processed in parallel. To clarify the experimental setup, we have updated the methods section accordingly (lines 566-568).

      As for the detection of non-chromatin-associated proteins: We cannot rule out that these are artifacts, as they may arise from residual cytosolic lysate during nuclear extraction. Alternatively, they could reflect a more dynamic subcellular localization of these proteins than currently annotated or appreciated.

      R2.10 The volcano plot for the RIME experiment appears to show three distinct clusters of proteins on the right, which is unusual for this type of analysis. The presence of these apparent groupings may suggest an artifact from the data processing, such as imputation. Can the authors clarify the origin of these groupings? If it is due to imputation or missing values, I recommend applying a stricter threshold, such as requiring detection in all three replicates (3/3) to improve the robustness of the enrichment analysis and increase confidence in the identified interactors.

      We thank the reviewer for pointing this out. As suggested, we re-evaluated the imputation and applied a stricter threshold, requiring detection in all three replicates. Indeed, the separate clusters were due to missing values; we have therefore revised the imputation method and now impute values based on the normal distribution. Using this revised analysis, we identify 2352 GRHL2 interactors instead of 1140, but the number of interacting proteins annotated as transcription factors or chromatin-associated/modifying proteins was still 103. Figure 1B, 1E, and Supplementary Figure 4A have been updated accordingly. We also revised the methods section to reflect this change. We think this suggestion has improved our analysis of the data and we thank the reviewer for pointing this out.
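      As an illustration of the kind of procedure described above, the following minimal sketch filters for proteins detected in all three pulldown replicates and then fills missing control values with draws from a down-shifted normal distribution. The details are assumptions rather than the authors' pipeline: the downshift of 1.8 standard deviations and width of 0.3 standard deviations follow a commonly used convention for proteomics imputation, the intensities are invented log2 values, and the function names are hypothetical.

```python
import random
import statistics


def detected_in_all(replicates):
    """True if the protein was quantified (not None) in every replicate."""
    return all(value is not None for value in replicates)


def impute_missing(values, shift=1.8, width=0.3, seed=0):
    """Replace None entries with draws from a down-shifted normal distribution.

    values: log2 intensities for one sample; None marks a missing value.
    shift, width: in units of the observed standard deviation (assumed
    defaults, not taken from the manuscript).
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [v if v is not None else rng.gauss(mean - shift * sd, width * sd)
            for v in values]


# Invented log2 intensities: protein -> (pulldown replicates, IgG replicates).
data = {
    "protein_1": ([24.1, 23.8, 24.3], [20.5, None, 20.9]),
    "protein_2": ([22.0, None, 21.7], [None, None, None]),
    "protein_3": ([26.2, 26.0, 26.4], [None, 21.3, None]),
    "protein_4": ([25.0, 25.3, 24.8], [22.1, 22.4, 21.8]),
}

# Stricter threshold: keep proteins detected in 3/3 pulldown replicates.
kept = {name: reps for name, reps in data.items() if detected_in_all(reps[0])}

# Impute the first IgG replicate across the retained proteins.
igg_rep1 = [reps[1][0] for reps in kept.values()]
igg_rep1_imputed = impute_missing(igg_rep1)
print(list(kept), igg_rep1_imputed)
```

      Drawing imputed values from a distribution, rather than substituting one fixed low value, spreads the imputed points out and avoids the discrete clusters that a constant fill-in can produce in a volcano plot.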

      R2.11 The statement that "PR and GRHL2 frequently overlap" may be overstated given that only ~700 overlapping sites are reported (CUT&RUN).

      We have replaced "frequently overlap" by "can overlap" (line 229-230).

      R2.12 The model in Figure 6 suggests limited chromatin occupancy of PR and GRHL2 in hormone-depleted conditions, consistent with the known requirement of ligand for stable PR-DNA binding. However, Figure 1 shows no major difference in GRHL2-PR interaction between untreated and hormone-treated cells. This raises questions about where and how this interaction occurs in the absence of hormone. Since PR binding to chromatin is typically minimal without ligand, can the authors clarify this given that RIME data reflect chromatin-bound interactions.

      Indeed, the model in Figure 6 suggests limited chromatin occupancy of PR and GRHL2 under hormone-depleted conditions. It is, however, important to note that the locus shown represents a gene regulated by both PR and GRHL2, and not just any gene. We recognize that this was not sufficiently clear in the original version, and we have now clarified this in both the main text (lines 331-334) and the figure legend.

      We propose that PR and GRHL2 bind or become enriched at enhancer sites associated with their target genes upon ligand stimulation. This is consistent with the known requirement of ligand for stable PR-DNA binding and with our observation that PR/GRHL2 overlapping peaks are detected only in the ligand-treated condition of the CUT&RUN experiment. Given its broader role, GRHL2 also binds chromatin independently of progesterone and the progesterone receptor, which is why we included (but did not focus on) GRHL2-only binding events in our model.

      We would also like to clarify that, although RIME includes a nuclear enrichment step that enriches for chromatin-associated proteins, the pulldown is performed on nuclear lysates. It therefore captures both chromatin-bound protein complexes and freely soluble nuclear complexes, which unfortunately cannot be distinguished. GRHL2 is well established as a nuclear protein (Zeng et al., Cancers 2024; Riethdorf et al., International Journal of Cancer 2015), and although PR is classically described as translocating to the nucleus upon hormone stimulation, several studies, including our own, have shown that PR is continuously present in the nucleus (Aarts et al., J Mammary Gland Biol Neoplasia 2023; Frigo et al., Essays Biochem. 2021).

      We therefore propose that PR and GRHL2 may already interact in the nucleus without directly binding to chromatin. Given our observation that GRHL2 binding sites on chromatin are redistributed upon activation of R5020-mediated signaling, we hypothesize that such pre-formed PR-GRHL2 nuclear complexes may assist the rapid recruitment of GRHL2 to progesterone-responsive chromatin regions.

      We have expanded the discussion to include a dedicated section addressing this point (lines 376-388).

      R2.13 It would be of interest to assess the overlap between the proteins identified in the RIME experiment and the motif analysis results.

      In the discussion section of our original manuscript, we highlighted some proteins that appear in both the RIME and the motif analysis, including STAT6 and FOXA1. However, we had not yet systematically analyzed the overlap between the two analyses. To address this, we have now compared all enriched motifs (not only the top 5 displayed in our figures) under GRHL2, PR, and GRHL2/PR shared sites from both the CUT&RUN and ChIP-seq datasets with the proteins identified as GRHL2 interactors in our RIME. Although we identified numerous GRHL2-associated proteins, relatively few of them were transcription factors whose binding motifs were also enriched under GRHL2 peaks.
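
      The comparison described here amounts to intersecting two name lists. A minimal sketch, assuming both lists use gene symbols; `overlap_interactors_with_motifs` is a hypothetical helper, and the lists in the usage note are placeholders rather than the study's results (only STAT6 and FOXA1 are named in the response).

```python
def overlap_interactors_with_motifs(interactors, motif_tfs):
    """Return the sorted gene symbols present in both collections.

    interactors: proteins identified in the pulldown (gene symbols).
    motif_tfs: transcription factors whose motifs were enriched under peaks.
    Symbols are compared case-insensitively after trimming whitespace.
    """
    def normalize(names):
        return {name.strip().upper() for name in names}

    return sorted(normalize(interactors) & normalize(motif_tfs))
```

      For example, `overlap_interactors_with_motifs(["STAT6", "FOXA1", "PROTEIN_X"], ["foxa1", "STAT6 ", "TF_Y"])` returns `["FOXA1", "STAT6"]`.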

      In our revised manuscript, we have added a section to the discussion that presents this systematic overlap between the results of our RIME experiment and the motif enrichment from the ChIP-seq and CUT&RUN analyses (lines 415-436).

      R2.14 The authors chose CUT&RUN to assess chromatin binding of PR and GRHL2. Given that RIME is also based on chromatin immunoprecipitation - ChIP protocol, it would be helpful to clarify why CUT&RUN was selected over ChIP-seq for the DNA-binding assays. What is the overlap with published data?

      As also mentioned in our response to R1.3 and R1.5, we deliberately chose the CUT&RUN approach to minimize artifacts introduced by crosslinking and sonication, thereby reducing background and allowing the identification of high-confidence, direct DNA-binding events. Since GRHL2 physically interacts with PR, ChIP-seq could potentially capture indirect binding of GRHL2 at PR-bound sites (and vice versa). In contrast, CUT&RUN primarily detects direct DNA-protein interactions, providing a more specific and accurate binding profile. Additionally, CUT&RUN serves as an independent validation method for data obtained using ChIP-like protocols.

      Since CUT&RUN, similar to ChIP, can show limited reproducibility (Nordin et al., Nucleic Acids Research, 2024), and to our knowledge few PR CUT&RUN and no GRHL2 CUT&RUN datasets are currently available, it is challenging to directly compare our data with published datasets. Nevertheless, studies performing PR or ER CUT&RUN (Gillis et al., Cancer Research, 2024; Reese et al., Molecular and Cellular Biology, 2022) report a number of peaks comparable to that observed in our data, in the same range of thousands. This suggests that a single CUT&RUN experiment may in general detect fewer events than a single ChIP-seq experiment, but that the peaks that are found are likely to reflect direct binding events.

      Reviewer #2 (Significance (Required)):

      General Assessment: This study investigates the role of the transcription factor GRHL2 in modulating PR function, using RIME and CUT&RUN to explore protein-protein and protein-chromatin interactions. GRHL2 has been implicated in epithelial biology and transcriptional regulation, and its interaction with steroid hormone receptors has been reported. This study extends the field by showing a functional link between GRHL2 and PR, which has implications for understanding hormone-dependent gene regulation.

      The research will primarily interest a specialized audience in transcriptional regulation, chromatin biology, and hormone receptor signaling.

      Key words for this reviewer: chromatin biology, transcription factor function, epigenomics, and proteomics.

      We agree that with our implemented changes, this study should reach (and appeal to) an audience interested in transcriptional regulation, chromatin biology, hormone receptor signaling and breast cancer.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study explores the important transcriptional coordination role of Grainyhead-like 2 (GRHL2) on the transcriptional regulatory function of progesterone receptor (PR). In this paper, the authors start with their recruitment characteristics, take into account their regulatory effects on downstream genes and their effects on the occurrence and development of breast cancer, and further clarify the coordination between them in three-dimensional space. The interaction between GRHL2 and PR, and the subsequent important influence on the co-regulated genes by GRHL2 and PR are analyzed. The overall framework of this study is mainly by RNA seq and CUT-TAG analysis, the molecular mechanism underlying the association between GRHL2 and PR and regulation function of two proteins in breast cancer is not clearly clarified. Some details need to be further improved:

      Major comments: R3.1 For Fig.1D, the molecular weight of each protein should be marked in the diagram, and the expression of GRHL2 in the input group should be supplemented.

      We apologize for not including molecular weights in our initial submission. It is not entirely clear to us what the reviewer means by the statement that "the expression of GRHL2 in the input group should be supplemented". The blot depicted in Figure 1D shows both the input signal and the IP. For the reviewer's information, the full Western blot is depicted below.

      Figure 3 for reviewer: Full WBs of input and IP GRHL2 samples stained for GRHL2 or PR. NT = non-treated, T = treated

      R3.2 In Fig.2B and Fig 5C, it should be clearly described whether GRHL2 recruitment occurs in the absence or presence of R5020. Is the region co-occupied by PR and GRHL2 a promoter or an enhancer region? It would be better to show histone marks such as H3K27ac and H3K4me1 to annotate the enhancer region.

      As also stated in our response to R1.3, we acknowledge that the ChIP-seq experiments cannot definitively determine whether GRHL2 and PR co-occupy genomic regions under ligand-stimulated conditions, since the GRHL2 dataset was generated in the absence of progesterone stimulation (as indicated in lines 167-169). To clarify this, we have now specified this detail in the legend of Figure 2 by noting "untreated GRHL2 ChIP." To directly assess GRHL2 chromatin binding under progesterone-stimulated conditions, we performed CUT&RUN experiments for both GRHL2 and PR under untreated and R5020-treated conditions. These experiments revealed a subset of overlapping PR and GRHL2 binding sites (approximately 5% of all identified PR peaks).
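
      A figure such as "approximately 5% of all identified PR peaks" can be computed as the share of peaks in one set that overlap at least one peak in the other. A minimal sketch, assuming peaks are given as half-open `(chrom, start, end)` intervals; `fraction_overlapping` is a hypothetical helper, not the authors' pipeline, and the response does not say which overlap criterion was used.

```python
from collections import defaultdict


def fraction_overlapping(query_peaks, reference_peaks):
    """Share of query peaks overlapping at least one reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    so intervals that merely touch (end == start) do not overlap.
    """
    if not query_peaks:
        return 0.0
    # Group reference peaks by chromosome to avoid cross-chromosome checks.
    by_chrom = defaultdict(list)
    for chrom, start, end in reference_peaks:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in query_peaks:
        # Two intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom[chrom]):
            hits += 1
    return hits / len(query_peaks)
```

      The pairwise scan is quadratic per chromosome, which keeps the sketch short; for genome-scale peak sets a sorted sweep or a dedicated interval tool would be the usual choice.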

      In our original manuscript, we performed genomic annotation of the GRHL2, PR, and GRHL2/PR overlapping peaks (Figure 2E) and found that most of these sites were located in intergenic regions, where enhancers are typically found, with ~20% located in promoter regions. We appreciate the reviewer's suggestion to further overlap the ChIP-seq peaks with histone marks such as H3K27ac and H3K4me1. We have now incorporated publicly available ATAC-seq and H3K27ac ChIP datasets in our revised manuscript (as also suggested by Reviewer 1) and find that shared GRHL2/PR sites are indeed located in active enhancer regions marked by H3K27ac (see Figure 2F). Additionally, as expected, we find that GRHL2/PR overlapping sites are enriched at open chromatin (Figure 2G).

      R3.3 What is the biological function analysis by KEGG or GO analysis for the overlapping genes from VN plots of RNA-seq with CUT-TAG peaks? The genes co-regulated by GRHL2 and PR are further determined.

      It is not entirely clear to us what Reviewer 3 is asking here, but we can explain the following. Because integrating HiChIP with CUT&RUN is challenging, given the fundamentally different nature of the two techniques, we chose not to assign genes directly to CUT&RUN peaks. However, we did carefully link the GRHL2, PR, and GRHL2/PR ChIP-seq peaks to their target genes by integrating chromatin looping data from a PR HiChIP analysis. The result of this analysis is depicted in Figure 4B.

      As suggested by this reviewer, we also performed a GO-term analysis on the 79 genes that are regulated by both GRHL2 and PR (we now have 79 genes after the re-analysis suggested in R1.6). The corresponding results are provided for the reviewer in Figure 4 of this rebuttal (below). As this additional analysis does not provide further biological insight beyond what is already presented in Figure 4C, we decided not to include this figure in the manuscript.

      Figure 4 for reviewer: GO-term analysis on the 79 GRHL2-PR co-regulated genes that are transcriptionally regulated by GRHL2 and PR and that also harbor a PR HiChIP loop anchor at or near their TSS

      R3.4 Western blotting should be performed to determine the protein levels of downstream genes co-regulated by GRHL2 and PR in the absence or presence of R5020.

      We agree that determining the response of co-regulated genes is important. Therefore, in Figure 4D, we present three representative examples of genes that are directly co-regulated by GRHL2 and PR, specifically genes that are differentially expressed after 4 hours of R5020 exposure. Although protein levels of the targets are of functional importance, GRHL2 and PR are transcription factors whose immediate effects are primarily exerted at the level of gene transcription. In our opinion, changes in mRNA abundance therefore provide the most direct and mechanistically relevant readout of their regulatory activity.

      R3.5 The author mentioned that this study positions that GRHL2 acts as a crucial modulator of steroid hormone receptor function, while the authors do not provide the evidences that how does GRHL2 regulate PR-mediated transactivation, and how about these two proteins subcellular distribution in breast cancer cells.

      We agree that while our RNA-seq data demonstrate that GRHL2 modulates the expression of PR target genes, and our CUT&RUN experiments show that GRHL2 chromatin binding is reshaped upon R5020 exposure, we have not yet further dissected the molecular mechanism by which GRHL2 functions as a PR co-regulator.

      As also mentioned in our response to R1.2, we did consider several follow-up experiments to address this, including PR CUT&RUN in GRHL2 knockdown cells, CUT&RUN for known co-activators such as KMT2C/D and P300, as well as functional studies involving GRHL2 TAD and DBD mutants. However, due to technical and logistical challenges, we were unable to carry out these experiments within the timeframe of this study.

      That said, we fully recognize that such approaches would provide deeper mechanistic insight into the interplay between PR and GRHL2. We have therefore explicitly acknowledged this limitation in our limitations of the study section (lines 502-507) and consider it an important avenue for future investigation.

      Regarding the subcellular distribution in breast cancer cells: as also mentioned in our response to R2.12, GRHL2 is well established as a nuclear protein (Zeng et al., Cancers 2024; Riethdorf et al., International Journal of Cancer 2015), and although PR is classically described as translocating to the nucleus upon hormone stimulation, several studies, including our own, have shown that PR is continuously present in the nucleus (Aarts et al., J Mammary Gland Biol Neoplasia 2023; Frigo et al., Essays Biochem. 2021). Thus, both proteins mostly reside in the nucleus in breast (cancer) cells, both in the absence and in the presence of hormone stimulation, although dynamic subcellular shuttling is likely to occur.

      Minor comments: Please describe in more detail the relationship between PR and GRHL2 binding independent of the hormone in the discussion section.

      As also mentioned in our response to R2.12, we have expanded the discussion to include a dedicated section addressing this point (lines 376-388).

      Reviewer #3 (Significance (Required)):

      Advance: Compare the study to existing published knowledge, it fills a gap. The authors provide RNA seq and CUT-TAG sequence analysis to show the recruitment of GRHL2 and PR and the co-regulated genes in the absence or presence of progesterone.

      Audience: breast surgery will be interested, and the audiences will cover clinical and basic research.

      My expertise is focused on the epigenetic modulation of steroid hormone receptors in the related cancers, such as breast cancer, prostate cancer, and endometrial carcinoma.

      We agree that with our implemented changes, this study should reach (and appeal to) an audience interested in transcriptional regulation, chromatin biology, hormone receptor signaling and breast cancer.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study explores the important transcriptional coordination role of Grainyhead-like 2 (GRHL2) on the transcriptional regulatory function of progesterone receptor (PR). In this paper, the authors start with their recruitment characteristics, take into account their regulatory effects on downstream genes and their effects on the occurrence and development of breast cancer, and further clarify the coordination between them in three-dimensional space. The interaction between GRHL2 and PR, and the subsequent important influence on the co-regulated genes by GRHL2 and PR are analyzed. The overall framework of this study is mainly by RNA seq and CUT-TAG analysis, the molecular mechanism underlying the association between GRHL2 and PR and regulation function of two proteins in breast cancer is not clearly clarified. Some details need to be further improved:

      Major comments:

      1. For Fig.1D, the molecular weight of each protein should be marked in the diagram, and the expression of GRHL2 in the input group should be supplemented.
      2. In Fig.2B and Fig 5C, it should be clearly described whether GRHL2 recruitment occurs in the absence or presence of R5020. Is the region co-occupied by PR and GRHL2 a promoter or an enhancer region? It would be better to show histone marks such as H3K27ac and H3K4me1 to annotate the enhancer region.
      3. What is the biological function analysis by KEGG or GO analysis for the overlapping genes from VN plots of RNA-seq with CUT-TAG peaks? The genes co-regulated by GRHL2 and PR are further determined.
      4. Western blotting should be performed to determine the protein levels of downstream genes co-regulated by GRHL2 and PR in the absence or presence of R5020.
      5. The author mentioned that this study positions that GRHL2 acts as a crucial modulator of steroid hormone receptor function, while the authors do not provide the evidences that how does GRHL2 regulate PR-mediated transactivation, and how about these two proteins subcellular distribution in breast cancer cells.

      Minor comments:

      Please describe in more detail the relationship between PR and GRHL2 binding independent of the hormone in the discussion section.

      Significance

      Advance: Compare the study to existing published knowledge, it fills a gap. The authors provide RNA seq and CUT-TAG sequence analysis to show the recruitment of GRHL2 and PR and the co-regulated genes in the absence or presence of progesterone.

      Audience: breast surgery will be interested, and the audiences will cover clinical and basic research.

      My expertise is focused on the epigenetic modulation of steroid hormone receptors in the related cancers, such as breast cancer, prostate cancer, and endometrial carcinoma.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The authors present a study investigating the role of GRHL2 in hormone receptor signaling. Previous research has primarily focused on GRHL2 interaction with estrogen receptor (ER) and androgen receptor (AR). In breast cancer, GRHL2 has been extensively studied in relation to ER, while its potential involvement with the progesterone receptor (PR) remains largely unexplored. This is the rationale for the present study, which investigates the relation between PR and GRHL2. The authors demonstrate an interaction between GRHL2 and PR and further explore this relationship at the level of genomic binding sites. They also perform GRHL2 knockdown experiments to identify target genes and link these transcriptional changes back to GRHL2-PR chromatin occupancy. However, several conceptual and technical aspects of the study require clarification to fully support the authors' conclusions.

      1. Given the high sequence similarity among GRHL family members, this raises questions about the specificity of the antibody used for GRHL2 RIME. The authors should address whether the antibody cross-reacts with GRHL1 or GRHL3. For example, GRHL1 shows a higher log fold change than GRHL2 in the RIME data.
      2. In addition, in RIME experiments, one would typically expect the bait protein to be among the most highly enriched proteins compared to control samples. If this is not the case, it raises questions about the efficiency of the pulldown, antibody specificity, or potential technical issues. The authors should comment on the enrichment level of the bait protein in their data to reassure readers about the quality of the experiment.
      3. The authors report log2 fold changes calculated using iBAQ values for the bait versus IgG control pulldown. While iBAQ provides an estimate of protein abundance within samples, it is not specifically designed for quantitative comparison between samples without appropriate normalization. It would be helpful to clarify the normalization strategy applied and consider using LFQ intensities.
      4. Other studies have reported PR RIME, which could be a valuable source to investigate whether GRHL proteins were detected.
      5. In line 137, the term "protein score" is mentioned. Could the authors please clarify what this means and how it was calculated.
      6. In line 140-141. The fact that GRHL2 interacts with chromatin remodelers does not by itself prove that GRHL2 acts as a pioneer factor or chromatin modulator. Demonstrating pioneer function typically requires direct evidence of chromatin opening or binding to closed chromatin regions (e.g., ATAC-seq, nucleosome occupancy assays). I recommend revising this statement or providing supporting evidence.
      7. The pulldown Western blot lacks an IgG control in panel D.
      8. Depending on the journal and target audience, it may be helpful to briefly explain what R5020 is at its first mention (line 146).
      9. The authors state that three technical replicates were performed for each experimental condition. It would be helpful to clarify the expected level of overlap between biological replicates of RIME experiments. This clarification is necessary, especially given the focus on uniquely enriched proteins in untreated versus treated cells, and the observation that some identified proteins in specific conditions are not chromatin-associated. Replicates or validations would strengthen the findings.
      10. The volcano plot for the RIME experiment appears to show three distinct clusters of proteins on the right, which is unusual for this type of analysis. The presence of these apparent groupings may suggest an artifact from the data processing, such as imputation. Can the authors clarify the origin of these groupings? If it is due to imputation or missing values, I recommend applying a stricter threshold, such as requiring detection in all three replicates (3/3) to improve the robustness of the enrichment analysis and increase confidence in the identified interactors.
      11. The statement that "PR and GRHL2 frequently overlap" may be overstated given that only ~700 overlapping sites are reported (cut&run).
      12. The model in Figure 6 suggests limited chromatin occupancy of PR and GRHL2 in hormone-depleted conditions, consistent with the known requirement of ligand for stable PR-DNA binding. However, Figure 1 shows no major difference in GRHL2-PR interaction between untreated and hormone-treated cells. This raises questions about where and how this interaction occurs in the absence of hormone. Since PR binding to chromatin is typically minimal without ligand, can the authors clarify this given that RIME data reflect chromatin-bound interactions.
      13. It would be of interest to assess the overlap between the proteins identified in the RIME experiment and the motif analysis results.
      14. The authors chose CUT&RUN to assess chromatin binding of PR and GRHL2. Given that RIME is also based on chromatin immunoprecipitation - ChIP protocol, it would be helpful to clarify why CUT&RUN was selected over ChIP-seq for the DNA-binding assays. What is the overlap with published data?

      Significance

      General Assessment:

      This study investigates the role of the transcription factor GRHL2 in modulating PR function, using RIME and CUT&RUN to explore protein-protein and protein-chromatin interactions. GRHL2 has been implicated in epithelial biology and transcriptional regulation, and its interaction with steroid hormone receptors has been reported. This study extends the field by showing a functional link between GRHL2 and PR, which has implications for understanding hormone-dependent gene regulation.

      The research will primarily interest a specialized audience in transcriptional regulation, chromatin biology, and hormone receptor signaling.

      Key words for this reviewer: chromatin biology, transcription factor function, epigenomics, and proteomics.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      The manuscript by Aarts et al. explores the role of GRHL2 as a regulator of the progesterone receptor (PR) in breast cancer cells. The authors show that GRHL2 and PR interact in a hormone-independent manner and, based on genomic analyses, propose that they co-regulate target genes via chromatin looping. To support this model, the study integrates both newly generated and previously published datasets, including ChIP-seq, CUT&RUN, RNA-seq, and chromatin interaction assays, in breast cancer cell models (T47DS and T47D).

      Major comments:

      1. Novelty of GRHL2 in steroid receptor biology The role of GRHL2 as a co-regulator of steroid hormone receptors has previously been described for ER (J Endocr Soc. 2021;5(Suppl 1):A819) and AR (Cancer Res. 2017;77:3417-3430). In the ER study, the authors also employed a GRHL2 ΔTAD T47D cell model. Therefore, while this manuscript extends GRHL2 involvement to PR, the contribution appears incremental rather than conceptual.
      2. Mechanistic depth The study provides limited mechanistic insight into how GRHL2 functions as a PR co-regulator. Key mechanistic questions remain unaddressed, such as whether GRHL2 modulates PR activation, the sequential recruitment of co-activators/co-repressors, engages chromatin remodelers, or alters PR DNA-binding dynamics. Incorporating these analyses would considerably strengthen the mechanistic conclusions.
      3. Definition of GRHL2-PR regulatory regions (Figure 2) The 6,335 loci defined as GRHL2-PR co-regulatory regions are derived from a PR ChIP-seq performed in the presence of hormone and a GRHL2 ChIP-seq performed in its absence. This approach raises doubts about whether GRHL2 and PR actually co-occupy these regions under ligand stimulation. GRHL2 ChIP-seq experiments in both hormone-treated and untreated conditions are necessary to provide stronger support for this conclusion.
      4. Cell model considerations The manuscript relies heavily on the T47DS subclone, which expresses markedly higher PR levels than parental T47D cells (Aarts et al., J Mammary Gland Biol Neoplasia 2023; Kalkhoven et al., Int J Cancer 1995). This raises concerns about physiological relevance. Key findings, including co-IP and qPCR-ChIP experiments, should be validated in additional breast cancer models such as parental T47D, BT474, and MCF-7 cells to generalize the conclusions. Furthermore, data obtained from T47D (PR ChIP-seq, HiChIP, CTCF and Rad21 ChIP-seq) and T47DS (RNA-seq, CUT&RUN) are combined along the manuscript. Given the substantial differences in PR expression between these cell lines, this approach is problematic and should be reconsidered.
      5. CUT&RUN vs ChIP-seq data The CUT&RUN experiments identify fewer than 10% of the PR binding sites reported in the ChIP-seq datasets. This discrepancy likely results from methodological differences (e.g., absence of crosslinking, potential loss of weaker binding events). The overlap of only 158 sites between PR and GRHL2 under hormone treatment (Figure 3B) provides limited support for the proposed model and should be interpreted with greater caution.
      6. Gene expression analyses (Figure 4) The RNA-seq analysis after 24 hours of hormone treatment likely captures indirect or secondary effects rather than the direct PR-GRHL2 regulatory program. Including earlier time points (e.g., 4-hour induction) in the analysis would better capture primary transcriptional responses. The criteria used to define PR-GRHL2 co-regulated genes are not convincing and may not reflect the regulatory interactions proposed in the model. Strong basal expression changes in GRHL2-depleted cells suggest that much of the transcriptional response is PR-independent, conflicting with the model (Figure 6). A more straightforward approach would be to define hormone-regulated genes in shControl cells and then examine their response in GRHL2-depleted cells. Finally, integrating chromatin accessibility and histone modification datasets (e.g., ATAC-seq, H3K27ac ChIP-seq) would help establish whether PR-GRHL2-bound regions correspond to active enhancers, providing stronger functional support for the proposed regulatory model.

      Minor comments

      Page 19: The statement that "PR and GRHL2 trigger extensive chromatin reorganization" is not experimentally supported. ATAC-seq would be an appropriate method to test this directly.

      Prior literature on GRHL2 as a steroid receptor co-regulator should be discussed more thoroughly.

      Significance

      The identification of novel PR co-regulators is an important objective, as the mechanistic basis of PR signaling in breast cancer remains incompletely understood. The main strength of this study lies in highlighting GRHL2 as a factor influencing PR genomic binding and transcriptional regulation, thereby expanding the repertoire of regulators implicated in PR biology. That said, the novelty is limited, given the established roles of GRHL2 in ER and AR regulation. Mechanistic insight is underdeveloped, and the reliance on an engineered T47DS model with supra-physiological PR levels reduces the general impact. Without validation in physiologically relevant breast cancer models and clearer separation of direct versus indirect effects, the overall advance remains modest.

      The manuscript will be of interest to a specialized audience in the fields of nuclear receptor signaling, breast cancer genomics, and transcriptional regulation. Broader appeal, including translational or clinical relevance, is limited in its current form.

    1. Most of the time categorization process, discrimination process, dehumanization processes

      for - genocide - preceded by 10 preliminary stages - 1. classification - 2. symbolization - 3. discrimination - 4. dehumanization - 5. organization - 6. polarization - 7. preparation - 8. persecution - 9. extermination - 10. denial

    1. Analysis of Political Engagement: Concepts, Paradoxes, and Context

      Executive Summary

      This briefing document examines in depth the many facets of political engagement, drawing on perspectives from sociology and political science.

      The analysis reveals four main themes.

      First, a fundamental conceptual distinction is drawn between political participation, which includes low-cost acts such as voting, and engagement, which refers to more intense, public, and risky forms of action.

      Engagement runs along a continuum from the mere sympathizer to the full-time activist, with varied profiles such as "conscience activists" and the "direct beneficiaries" of the struggle.

      Second, the document explores the paradox of collective action, as formulated by Mancur Olson.

      This paradox explains why rational individuals may refrain from taking part in collective action even when they share its goals, because of the temptation to act as a "free rider".

      The solutions to this paradox lie in selective incentives and, from a more sociological angle, in the symbolic rewards of engagement (recognition, the pleasure of activism, loyalty to one's values) theorized by Daniel Gaxie.
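
      The free-rider logic and the effect of a selective incentive can be illustrated with a toy payoff comparison. This is a minimal sketch, assuming a large group in which one person's contribution does not change whether the collective good is obtained; `net_payoff` and all numbers are hypothetical illustrations, not taken from Olson or Gaxie.

```python
def net_payoff(participates, benefit, cost, selective_incentive=0.0):
    """Payoff for one member of a large group (illustrative model).

    benefit: value of the collective good, received by everyone.
    cost: time, energy, or risk borne only by those who participate.
    selective_incentive: reward reserved for participants, material or
    symbolic (recognition, pleasure of activism, loyalty to one's values).
    """
    if participates:
        return benefit - cost + selective_incentive
    # Non-participants still enjoy the collective good: the free rider.
    return benefit


# Without an incentive, abstaining pays more even though the goal is shared.
free_riding_wins = net_payoff(False, 10, 3) > net_payoff(True, 10, 3)
# An incentive larger than the cost reverses the comparison.
incentive_wins = net_payoff(True, 10, 3, selective_incentive=4) > net_payoff(False, 10, 3)
```

      In this model participation becomes the better choice exactly when the selective incentive exceeds the cost, whatever the size of the shared benefit.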

      Third, the analysis addresses the importance of context through the notion of the Political Opportunity Structure (POS).

      This macro-analytical concept holds that the success and the forms of a social movement (peaceful or disruptive) depend on how open or closed the political system is.

      Although useful for understanding historical dynamics such as the civil rights movement in the United States, the concept has drawn significant criticism for being static and for its simplified view of the interactions between the state and social movements.

      Finally, the document underlines the crucial role of socio-demographic variables and individual socialization.

      Engagement is strongly correlated with cultural capital and with "biographical availability".

      The analysis highlights the importance of emotions, notably "moral shock", while specifying that the capacity to feel indignation at a situation is itself socially constructed.

      The case study of the 1964 "Freedom Summer" strikingly demonstrates that intense engagement has deep and lasting biographical consequences for activists' life trajectories.

      --------------------------------------------------------------------------------

      1. Defining Political Engagement: Beyond Mere Participation

      A first puzzle raised by the analysis concerns the very definition of political engagement.

      The term, as it is sometimes used, tends to lump together all forms of political activity, including the least demanding.

      However, research in political sociology draws a crucial distinction between participation and engagement.

      1.1. Participation vs. Engagement: A Matter of Intensity and Risk

      Participation is the broadest category, covering all forms of contribution to public affairs.

      Voting, registering to vote or answering a survey are minimal, low-cost forms of participation.

      They are often individual and secret (like voting in the polling booth) and commit the individual only to a very limited extent.

      Engagement, by contrast, refers to forms of participation that are more intense, demanding and costly in time, energy and sometimes resources.

      It is characterized by two key dimensions:

      Public exposure: To engage is to expose oneself publicly, whether by demonstrating, signing a petition under one's own name or speaking out for a cause.

      Risk-taking: This public exposure can lead to retaliation, controversy, professional sanctions or even physical risk (police violence, for example).

      The figure of the engaged intellectual, such as the signatories of the Manifesto of the 121 against the Algerian War, illustrates this risk-taking.

      Engagement is thus part of an approach in which the individual accepts a potentially high personal cost in exchange for defending a collective cause.

      1.2. The Continuum of Intensity of Engagement

      Engagement can be seen as a continuum with different degrees of involvement.

      The sympathizer: Supports a cause or organization from the outside, without formal membership.

      Their participation is often occasional, such as joining a demonstration to show support.

      The member: Formalizes their support by joining a party, a union or an association.

      This act often involves a financial contribution (dues) and marks a stronger identification. The member can say "we" when speaking of the organization, but their active involvement may remain limited.

      The activist: Is a genuine stakeholder in the organization's activities.

      They devote time and energy on a regular basis, actively defend the group's positions, take part in its actions and identify strongly with its colors.

      Within activism itself, McCarthy and Zald distinguish several statuses within "social movement organizations".

      | Status | Description |
      | --- | --- |
      | Volunteers | Unpaid activists who take part in their free time. They form the base of many organizations. |
      | Paid staff | Activists employed by the organization to keep it running day to day. Their status can sometimes create tension with volunteers. |
      | Leaders (spokespersons) | People who embody and publicly represent the organization (president, general secretary). They negotiate with the authorities and speak in the media. Their selection and legitimacy are crucial issues within collectives. |

      1.3. Activist Profiles and Logics of Engagement

      Another important distinction is the one McCarthy and Zald draw between:

      Beneficiaries: People directly affected by the struggle who stand to gain a personal and immediate benefit if it succeeds (e.g., undocumented migrants fighting for their regularization).

      Conscience activists: People who support the cause out of conviction, without expecting any direct benefit for their own situation (e.g., French citizens supporting undocumented migrants).

      This distinction is essential because the logics of engagement and the objectives can differ between the two groups, sometimes creating tension within a single movement.

      1.4. The Evolution of Party Engagement

      The thesis of a decline in engagement, often associated with falling party membership, is qualified here.

      A more fruitful hypothesis is that the dominant political parties no longer need activists as they once did.

      Having become "electoral machines" staffed by political professionals, they can outsource tasks once done by activists (putting up posters, communication) to specialized firms.

      In addition, mechanisms such as open primaries have reduced the role of activists in selecting candidates.

      This does not mean the end of the desire to engage, but rather a shift of engagement toward other arenas, such as the voluntary sector or social movements, which activists disappointed by party life perceive as more concrete and disinterested.

      2. The Paradox of Collective Action

      One of the major theoretical challenges in understanding engagement is to explain why collective action emerges at all, given that individual rationality could stand in its way.

      2.1. Mancur Olson's Model

      The economist Mancur Olson, in The Logic of Collective Action (1965), broke with earlier theories that posited the irrationality of crowds (Gustave Le Bon), and his model contrasts with accounts that explain revolt by psychological factors such as "relative deprivation" (Ted Gurr). Olson starts from the premise of a rational, calculating actor.

      The paradox he identifies is as follows:

      1. Collective action aims to obtain a collective good, that is, a benefit that will accrue to all members of a group whether or not they took part in the action (e.g., a pay raise for all employees of a company).

      2. Taking part in the action carries an individual cost (e.g., lost wages during a strike, time spent, risks incurred).

      3. The rational actor will therefore be tempted to adopt the "free rider" strategy: not paying the cost of the action while hoping to benefit from its results if others mobilize.

      If everyone follows this reasoning, the collective action never takes place, even though it would benefit everyone.

      2.2. Solutions to the Paradox: Selective Incentives and Rewards

      For Olson, the solution to the paradox lies in selective incentives: benefits (or costs) that apply only to those who take part (or do not take part) in the action.

      Negative selective incentives (costs): Making non-participation more costly than participation. Examples: social pressure, the stigmatization of strikebreakers ("scabs") during a strike, or even physical threats from a picket line.

      Positive selective incentives (benefits): Offering individual advantages reserved for participants.

      Olson even mentions "erotic selective incentives" (the pleasure of meeting people and forming relationships).
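
      The free-rider calculation and the effect of selective incentives can be illustrated with a small numerical sketch. This is a toy model, not Olson's own formalization: it assumes that the chance of winning the collective good grows linearly with the share of the group that participates, and every name and number below (`expected_payoff`, `best_choice`, the benefit and cost values) is hypothetical.

```python
def expected_payoff(participates, others_participating, group_size,
                    benefit, cost, selective_incentive=0.0):
    """Expected payoff for one member of the group.

    Assumption (illustrative only): the probability that the collective
    good is obtained equals the fraction of the group that participates.
    The good is worth `benefit` to every member, participant or not.
    """
    participants = others_participating + (1 if participates else 0)
    success_probability = participants / group_size
    payoff = benefit * success_probability
    if participates:
        # Only participants pay the cost and receive the selective incentive.
        payoff += selective_incentive - cost
    return payoff


def best_choice(others_participating, group_size, benefit, cost,
                selective_incentive=0.0):
    """Return the option a purely self-interested actor would pick."""
    join = expected_payoff(True, others_participating, group_size,
                           benefit, cost, selective_incentive)
    stay_home = expected_payoff(False, others_participating, group_size,
                                benefit, cost, selective_incentive)
    return "participate" if join > stay_home else "free ride"


if __name__ == "__main__":
    # Hypothetical strike: 100 employees, raise worth 50 each, striking costs 10.
    for others in (0, 50, 99):
        print(others, best_choice(others, 100, 50.0, 10.0))
    # Same situation with a selective incentive worth 12 to each participant.
    print(best_choice(50, 100, 50.0, 10.0, selective_incentive=12.0))
```

      In this sketch one extra participant raises each member's expected benefit by only `benefit / group_size`, so abstaining is the better individual choice whenever that amount is smaller than the cost, whatever the others do. A selective incentive changes the comparison because it is paid only to participants.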

      The political scientist Daniel Gaxie gave this approach a sociological grounding by developing the concept of the rewards of engagement.

      These gratifications, which motivate and sustain activism, can be of several kinds:

      Material: Obtaining social housing or a job through the organization's network.

      Symbolic: Gaining responsibilities, visibility, recognition.

      Appearing in the media or being the spokesperson for a struggle is a powerful symbolic gratification.

      Identity-based and moral: The satisfaction of acting in line with one's values, of "being able to look oneself in the mirror".

      Affective and social: The pleasure of activist sociability, of sharing intense moments with comrades, of feeling part of a collective.

      These rewards explain why "conscience activists" are not entirely disinterested: they find an interest (in the sociological sense) in their engagement.

      This analysis, combined with Albert Hirschman's critique (which notes that the cost and the benefit of action can merge, as with the pride drawn from a hard struggle), makes it possible to move beyond Olson's purely utilitarian view.

      3. The Role of Context: The Political Opportunity Structure (POS)

      Whereas Olson's model focuses on the individual (micro level), the Political Opportunity Structure (POS) approach works at a macro-structural level to analyze how the political context influences social movements.

      3.1. Definition and Founding Example

      The POS refers to all the elements of the political context that facilitate or hinder the emergence and success of a social movement.

      Doug McAdam's work on the civil rights movement in the United States is the founding example.

      McAdam shows that Black organizations already existed in the 1930s but were making little headway.

      Their success in the 1950s and 1960s is explained by an opening of the POS, due to several factors:

      Economic: The cotton crisis in the South and the migration of Black Americans to industries in the North.

      Social: A "cognitive liberation" in which Black Americans, encountering less institutionalized racism in the North, realize that segregation is not inevitable.

      Electoral: The Black population becomes an electoral stake for the Democratic Party in the North.

      Geopolitical: In the midst of the Cold War, racial segregation damages the image of the United States vis-à-vis the USSR.

      This opening made the political system more receptive to demands, allowing the movement to win successes through largely peaceful action.

      When the POS closed again in the 1970s (Nixon's arrival, FBI repression), forms of protest radicalized (Black Power).

      3.2. Forms of Protest and Types of Political Systems

      The central idea is that the shape of the political opportunity structure (POS) directly influences movements' strategies:

      Open POS (receptive system, consultation procedures, etc.): favors peaceful action, negotiation and lobbying.

      Closed POS (blocked, centralized, unreceptive system): forces movements to use more disruptive repertoires of action to make themselves heard.

      The comparison between France and Switzerland on the issue of GMOs is telling.

      In Switzerland, which has mechanisms of direct democracy (popular votes), GMO opponents were able to obtain moratoriums through institutional channels.

      In France, a more centralized and closed system, they had to resort to illegal action (the "faucheurs volontaires", or "volunteer reapers") to politicize the issue.

      3.3. Criticisms and Limits of the Concept

      Despite its usefulness, the POS concept has drawn numerous criticisms:

      Ambiguity: The notion is often a catch-all ("auberge espagnole") in which any contextual factor can be found after the fact to explain an outcome.

      Static bias: The approach tends to freeze political systems into static typologies (open/closed), neglecting dynamics and fluctuations.

      Conceptual oxymoron: James Jasper points to the contradiction between "structure" (stable, durable) and "opportunity" (fleeting, subjectively perceived).

      Simplistic view: The model assumes a watertight separation between "insiders" (the political system) and "outsiders" (movements), whereas the boundaries are porous (activists can be inside the state).

      One-way determinism: It suggests that the political system determines movements, whereas social movements can themselves transform and constrain the political system.

      Because of these limits, the POS concept is now less used in research, which favors more dynamic approaches to interactions.

      4. The Social Determinants of Engagement

      Beyond theoretical models, engagement depends heavily on socio-demographic variables and socialization processes that do, or do not, predispose individuals to engage.

      4.1. Classic Variables: Cultural Capital and Biographical Availability

      Research consistently confirms that political engagement is socially situated.

      Cultural and educational capital: Interest in politics and perceived political competence are strongly correlated with level of education.

      The most highly educated individuals are often those who vote the most, but also those who demonstrate and sign petitions the most.

      Biographical availability: Intense engagement is more common among young people (fewer family and professional constraints) and "young retirees" (more free time).

      People in mid-career with family responsibilities are often less available for time-consuming activism.

      4.2. The Return of Emotions: The "Moral Shock" Seen Sociologically

      Against the image of a purely rational actor, research is bringing the emotional dimension of engagement back in.

      Moral shock, theorized by James Jasper, refers to the indignation or outrage felt in the face of a situation, which prompts action.

      However, this moral shock must itself be explained sociologically: not everyone is shocked by the same situations.

      The capacity to feel this indignation depends on the individual's socialization, values and past experiences.

      • A person socialized in a pro-bullfighting environment will not feel the same moral shock at a kill as an animal rights activist.

      • Activists in Réseau Éducation Sans Frontières (RESF) are often people who themselves benefited from upward social mobility through school; their attachment to that institution particularly predisposes them to be outraged by the deportation of schoolchildren.

      Emotions are therefore not irrational, but socially determined.

      4.3. The Lasting Impact: The Biographical Consequences of Engagement

      Doug McAdam's study of Freedom Summer (1964) offers an exceptional view of the effects of engagement on individuals' lives.

      That summer, young white activists went to Mississippi to help Black residents register to vote, a very high-risk form of engagement.

      Thanks to unique archives, McAdam was able to compare, 20 years later, the group of those who took part with a control group of those who had been accepted but ultimately did not go.

      The results are striking: Freedom Summer participants had, on average:

      • More chaotic professional careers and lower incomes.

      • Less stable family lives (more divorces, fewer children).

      • A much higher and more lasting level of activist engagement.

      This study shows that intense engagement is not a mere parenthesis in a life but a founding event with deep biographical consequences, durably shaping professional, family and activist trajectories.

      It is also from this experience that many future leaders of the American feminist movement emerged, having acquired a taste for collective action there while also discovering the sexist division of activist labor.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses the important problem of the uncoupling of oxidative phosphorylation due to hypoxia-ischemia injury in the neonatal brain and provides insight into the neuroprotective mechanisms of hypothermia treatment.

      Strengths:

      The authors used a combination of in vivo imaging of awake P10 mice and experiments on isolated mitochondria to assess various key parameters of brain metabolism during hypoxia-ischemia with and without hypothermia treatment. This unique approach resulted in a comprehensive data set that provides solid evidence to support the derived conclusions.

      Weaknesses:

      Several potential weaknesses were identified in the original submission, which the authors subsequently addressed in the revised manuscript. Here is a brief list of the questions:

      (1) Is it possible that the observed relatively low baseline OEF and trends of increased OEF and CBF over several hours after the imaging start were partially due to slow recovery from anesthesia?

      (2) What was the pain management, and is there a possibility that some of the observations were influenced by the pain-reducing drugs or their absence?

      (3) Were P10 mice significantly stressed during imaging in the awake state because they didn't have head-restraint habituation training?

      (4) Considering high metabolism and blood flow in the cortex, it could be potentially challenging to predict cortical temperature based on the skull temperature, particularly in the deeper part of the cortex.

      (5) The map of estimated CMRO2 looks quite heterogeneous across the brain surface. Could this be partially resulting from the measurement artefact?

      (6) It would be beneficial to provide more detailed justification for using P10 mice in the experiments.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      (1) This manuscript addresses an important problem of the uncoupling of oxidative phosphorylation due to hypoxia-ischemia injury of the neonatal brain and provides insight into the neuroprotective mechanisms of hypothermia treatment.

      The authors used a combination of in vivo imaging of awake P10 mice and experiments on isolated mitochondria to assess various key parameters of the brain metabolism during hypoxia-ischemia with and without hypothermia treatment. This unique approach resulted in a comprehensive data set that provides solid evidence for the derived conclusions.

      We thank the reviewer for the positive feedback.

      (2) The experiments were performed acutely on the same day when the surgery was performed. There is a possibility that the physiology of mice at the time of imaging was still affected by the previously applied anesthesia. This is particularly of concern since the duration of anesthesia was relatively long. Is it possible that the observed relatively low baseline OEF (~20%) and trends of increased OEF and CBF over several hours after the imaging start were partially due to slow recovery from prolonged anesthesia? The potential effects of long exposure to anesthesia before imaging experiments were not discussed.

      We thank the reviewer for this important comment and for pointing out the potential influence of anesthesia on the physiological state of the animals. We apologize for any confusion. To clarify, all PAM imaging experiments were conducted in awake animals. Isoflurane anesthesia was used only during two brief surgical procedures: (1) the installation of the head-restraint plastic head plate and (2) the right common carotid artery (CCA) ligation. Each anesthesia session lasted less than 20 minutes.

      We have revised the Methods section to provide additional details:

      For the subsection Procedures for PAM Imaging on page 17, we clarified the sequence of procedures during the head plate installation, as well as the corresponding anesthesia duration:

      “After the applied glue was solidified (~20 min), the animal was first returned to its cage for full recovery from anesthesia, and then carefully moved to the treadmill and secured to the metal arm-piece with two #4–40 screws for awake PAM imaging. The total duration of anesthesia, including preparation and glue solidification, was approximately 20 minutes.”

      For the subsection Neonatal Cerebral HI and Hypothermia Treatment on page 19, we also clarified the CCA ligation procedure:

      “Briefly, P10 mice of both sexes anesthetized with 2% isoflurane were subjected to the right CCA-ligation. To manage pain, 0.25% Bupivacaine was administered locally prior to the surgical procedures, which took less than 10 minutes. After a recovery period for one hour, awake mice were exposed to 10% O<sub>2</sub> for 40 minutes in a hypoxic chamber at 37 °C.”

      Regarding the reviewer’s concern about the observed trends in OEF and CBF, we agree that residual effects of anesthesia could, in principle, influence physiological parameters. However, we believe this is unlikely in this study for the following reasons. First, all imaging was conducted in awake animals after a clearly defined recovery period. Second, the trend of increasing OEF and CBF over time was consistent across animals and aligned with expected physiological responses following hypoxic-ischemic injury. In particular, the relatively low baseline OEF (0.21 at 37°C) is consistent with our previous study (0.25; (Cao et al., 2018)). The gradual increase in CBF and OEF reflects metabolic compensation and reperfusion following hypoxia-ischemia, as previously described (Lin and Powers, 2018). Therefore, we believe the observed changes are of physiological origin rather than anesthesia-related artifacts.

      (3) The Methods Section does not provide information about drugs administered to reduce the pain. If pain was not managed, mice could be experiencing significant pain during experiments in the awake state after the surgery. Since the imaging sessions were long (my impression based on information from the manuscript is that imaging sessions were ~4 hours long or even longer), the level of pain was also likely to change during the experiments. It was not discussed how significant and potentially evolving pain during imaging sessions could have affected the measurements (e.g., blood flow and CMRO<sub>2</sub>). If mice received pain management during experiments, then it was not discussed if there are known effects of used drugs on CBF, CMRO<sub>2</sub>, and lesion size after 24 hr.

      We thank the reviewer for this valuable comment regarding pain management. We confirm that local analgesia was administered to all animals prior to surgical procedures. Specifically, 0.25% Bupivacaine was applied locally before both the head-restraint plate installation and the CCA ligation. These details have now been clarified in the Methods section:

      For the subsection Procedures for PAM Imaging on page 16, we added:

      “To manage pain, 0.25% Bupivacaine was administered locally prior to the surgical procedures.”

      For the subsection Neonatal Cerebral HI and Hypothermia Treatment on page 18, we added:

      “To manage pain, 0.25% Bupivacaine was administered locally prior to the surgical procedures, which took less than 10 minutes.”

      To our knowledge, Bupivacaine has minimal systemic effects at the dose used and is unlikely to significantly alter CBF, CMRO<sub>2</sub>, or lesion development (Greenberg et al., 1998). No other analgesics (e.g., NSAIDs or opioids) were administered unless distress symptoms were observed—which did not occur in this study.

      Additionally, although imaging sessions were extended (up to 2 hours), animals remained calm and showed no signs of pain or distress during or after the procedures. Throughout the experimental period (up to 24 hours post-surgery), animals were monitored for signs of discomfort (e.g., abnormal activity, breathing, or weight gain), but no additional analgesia was required. The neonatal HI procedures are considered minimally invasive, and based on our protocol and prior experience, local Bupivacaine provides effective analgesia during and after the brief surgeries. We have added a corresponding note in the Discussion section (newly added subsection: Limitations in this study, the last paragraph) on page 15:

      “We observed no signs of distress or pain and did not use stress- or pain-reducing drugs during imaging. However, potential effects of stress or residual pain on CBF and CMRO<sub>2</sub> cannot be fully ruled out. Future studies could incorporate more detailed pain assessment and stress-mitigation strategies to further enhance physiological reliability.”

      (4) Animals were imaged in the awake state, but they were not previously trained for the imaging procedure with head restraint. Did animals receive any drugs to reduce stress? Our experience with well-trained young-adult as well as old mice is that they can typically endure 2 and sometimes up to 3 hours of head-restrained awake imaging with intermittent breaks for receiving the rewards before showing signs of anxiety. We do not have experience with imaging P10 mice in the awake state. Is it possible that P10 mice were significantly stressed during imaging and that their stress level changed during the imaging session? This concern about the potential effects of stress on the various measured parameters was not discussed.

      We thank the reviewer for this important comment regarding the potential effects of stress during awake imaging. The neonatal mice used in our study were P10, a stage at which animals are still physiologically immature and relatively inactive. Due to their small size and limited mobility, these animals did not struggle or show signs of distress during the imaging sessions. All animals remained calm and stable throughout the procedure, and no stress-reducing drugs were administered.

      We agree that, unlike older animals, P10 mice are not amenable to prior behavioral training. However, their underdeveloped motor activity and natural docility at this stage allowed for stable head-restrained imaging without inducing overt stress responses. Although no behavioral signs of stress were observed, we acknowledge that subtle physiological effects cannot be entirely excluded. We have added a brief discussion in the Discussion section (newly added subsection: Limitations in this study, the last paragraph) on page 15:

      “Lastly, for awake imaging, the small size of neonatal mice at P10 aids stability during awake PAM imaging, though it limits the feasibility of prior training, which is typically possible in older animals.”

      (5) The temperature of the skull was measured during the hypothermia experiment by lowering the water temperature in the water bath above the animal's head. Considering high metabolism and blood flow in the cortex, it could be challenging to predict cortical temperature based on the skull temperature, particularly in the deeper part of the cortex.

      We thank the reviewer for this helpful comment and for highlighting an important technical consideration. We acknowledge that we did not directly measure intracortical tissue temperature during the hypothermia experiments. While we recognize that relying on skull temperature may have limitations—particularly in reflecting temperature changes in deeper cortical regions—this approach is consistent with clinical practice, where intracortical temperature is typically not measured. Moreover, prior studies have shown that skull or brain surface temperature generally reflects cortical thermal dynamics to a reasonable extent under controlled conditions (Kiyatkin, 2007). We have added the following note in the Discussion section (newly added subsection: Limitations in this study, the 2<sup>nd</sup> paragraph) on page 14:

      “A technical limitation is the absence of direct intracortical temperature measurements during hypothermia; we relied on skull temperature, which may not fully capture temperature dynamics in deeper cortical layers. However, this approach aligns with clinical practice, where intracortical temperature is not typically measured. Future studies could benefit from more precise intracortical assessments.”

      (6) The map of estimated CMRO<sub>2</sub> (Fig. 4B) looks very heterogeneous across the brain surface. Is it a coincidence that the highest CMRO<sub>2</sub> is observed within the central part of the field of view? Is there previous evidence that CMRO<sub>2</sub> in these parts of the mouse cortex could vary a few folds over a 1-2 mm distance?

      We appreciate the reviewer’s insightful observation regarding the spatial heterogeneity observed in the estimated CMRO<sub>2</sub> map (Fig. 4B). This heterogeneity is not a result of scanning bias, as uniform contour scanning was performed across the entire field of view. The higher CMRO<sub>2</sub> values observed in the central region are unlikely to be artifacts and more likely reflect underlying physiological variability.

      Our CMRO<sub>2</sub> estimation is based on an algorithm we previously developed and validated in other tissues. Specifically, we have successfully applied this algorithm to assess oxygen metabolism in the mouse kidney (Sun et al., 2021) and to monitor vascular adaptation and tissue oxygen metabolism during cutaneous wound healing (Sun et al., 2022). These studies demonstrated the algorithm's capability to capture spatial variations in oxygen metabolism. Although the current application to the brain is novel, the algorithm has been validated in controlled experimental settings and shown to produce consistent results. We acknowledge that the observed range of CMRO<sub>2</sub> appears relatively broad across a 1–2 mm distance; however, such heterogeneity may arise from local differences in vascular density, metabolic demand, or tissue oxygenation — all of which can vary across cortical regions, even within small spatial scales. We have added a brief note in the Discussion (Subsection: Optical CMRO<sub>2</sub> detection in neonatal care) on page 13 to acknowledge this point:

      “Additionally, the spatial heterogeneity in estimated CMRO<sub>2</sub> observed in our data may reflect underlying physiological variability, including differences in vascular structure or metabolic demand across cortical regions. Future studies will aim to further validate and interpret these spatial patterns.”
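
      For readers less familiar with how OEF, CBF and CMRO<sub>2</sub> relate, the sketch below applies the standard Fick-principle relation, CMRO2 = CBF × OEF × CaO2. It is not the authors' PAM-based estimation algorithm, which is not reproduced in this exchange. The function names, the hemoglobin and saturation inputs, and the CBF value are illustrative assumptions; only the baseline OEF of 0.21 comes from the response above.

```python
# Oxygen-binding capacity of hemoglobin; 1.34 mL O2 per g Hb is a commonly
# used textbook value (an assumption here, not a parameter from the study).
O2_CAPACITY_ML_PER_G_HB = 1.34


def arterial_o2_content(hb_g_per_dl, sao2):
    """Arterial O2 content in mL O2 per dL of blood.

    Ignores the small dissolved-oxygen term for simplicity.
    """
    return O2_CAPACITY_ML_PER_G_HB * hb_g_per_dl * sao2


def cmro2(cbf_ml_per_100g_min, oef, cao2_ml_per_dl):
    """Fick-principle estimate of CMRO2 in mL O2 per 100 g tissue per minute.

    CBF is in mL blood / 100 g / min; CaO2 is in mL O2 / dL blood, so it is
    divided by 100 to express it per mL of blood.
    """
    return cbf_ml_per_100g_min * oef * cao2_ml_per_dl / 100.0


if __name__ == "__main__":
    # Hypothetical inputs: Hb 15 g/dL, SaO2 0.98, CBF 100 mL/100 g/min.
    cao2 = arterial_o2_content(hb_g_per_dl=15.0, sao2=0.98)
    baseline = cmro2(cbf_ml_per_100g_min=100.0, oef=0.21, cao2_ml_per_dl=cao2)
    print(f"CaO2 = {cao2:.2f} mL O2/dL, CMRO2 = {baseline:.2f} mL O2/100 g/min")
```

      Because the relation is a simple product, a local change in either flow or extraction scales the estimate proportionally, which is one reason regional differences in vascular density or oxygenation can appear as spatial variation in a CMRO<sub>2</sub> map.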

      (7) The justification for using P10 mice in the experiments has not been well presented in the manuscript.

      We thank the reviewer for pointing out the need to clarify our choice of developmental stage. We chose P10 mice for our hypoxia-ischemia injury model because this stage is widely recognized as developmentally comparable to human term infants in terms of brain maturation. This approach has been validated by several previous studies (Clancy et al., 2007; Mallard and Vexler, 2015; Sheldon et al., 2018). We have added the following clarification to the Methods section (Subsection: Neonatal Cerebral HI and Hypothermia Treatment) on page 18:

      “P10 mice were chosen for our experiments as they are widely used to model near-term infants in humans. At this developmental stage, the brain maturation in mice closely parallels that of near-term infants, making them an appropriate model for studying neonatal brain injury and therapeutic interventions (Clancy et al., 2007; Mallard and Vexler, 2015; Sheldon et al., 2018).”

      (8) It was not discussed how the observations made in this manuscript could be affected by the potential discrepancy between the developmental stages of P10 mice and human babies regarding cellular metabolism and neurovascular coupling.

      We thank the reviewer for raising this important point regarding developmental differences between P10 mice and human infants. We have discussed this issue by adding the following statement to the Discussion section (newly added subsection: Limitations in this study, the 1<sup>st</sup> paragraph) on page 15, where we summarize the overall study design and model selection:

      “While P10 mice are widely used to model near-term human infants, developmental differences in cellular metabolism and neurovascular coupling may affect the observed outcomes and limit direct clinical translation (Clancy et al., 2007; Mallard and Vexler, 2015; Sheldon et al., 2018). Nevertheless, the P10 model remains a valuable and widely accepted tool for studying neonatal hypoxia-ischemia mechanisms and evaluating therapeutic interventions.”

      (9) Regarding the brain temperature measurements, the authors should use a new cohort of mice, implant the miniature thermocouples 1 mm, 0.5 mm, and immediately below the skull in different mice, and verify the temperature in the brain cortex under conditions applied in the experiments. The same approach could be applied to a few mice undergoing 4-hr-long hypothermia treatment in a chamber, which will provide information about the brain temperature that resulted in observed protection from the injury.

      We thank the reviewer for this helpful recommendation. We fully agree that direct intracortical temperature measurement would provide more accurate insight into thermal dynamics during hypothermia treatment. However, the primary aim of this study was not to characterize the precise intracortical temperature response under hypothermic conditions, but rather to examine the effects of hypothermia on CMRO<sub>2</sub> and mitochondrial function. Due to the substantial time and resources required to perform direct intracortical temperature monitoring—and considering the technical focus of the current work—we respectfully suggest reserving such investigations for a future study specifically focused on thermal dynamics in hypoxia-ischemia models.

      We have acknowledged this limitation in the subsection Limitations in this study of the Discussion on page 15, noting that skull temperature was used as an approximation of brain temperature and that this approach is consistent with clinical practice, where intracortical temperature is typically not measured. We also note that future studies may benefit from more precise assessments using intracortical probes.

      (10) The mean values presented in Fig. 4G are much lower than the peak values in the 2D panels and potentially were calculated as the average values over the entire field of view. Please provide more details on how CMRO<sub>2</sub> was estimated and if the validity of the measurements is expected across the entire field of view. If there are parts of the field of view where the estimation of CMRO<sub>2</sub> is more reliable for technical reasons, maybe one way to compute the mean values is to restrict the usable data to the more centralized part of the field of view.

      We thank the reviewer for this thoughtful comment. We confirm that the CMRO<sub>2</sub> values shown in Figure 4G were calculated as spatial averages over the entire field of view (FOV; ~5 × 3 mm<sup>2</sup>) encompassing both hemicortices, as shown in Figure 1C. The apparent difference in the observed CMRO<sub>2</sub> values likely reflects a comparison between two different post-HI time points. Specifically, the ~0.5 value shown for the 37°C ipsilateral group in Figure 4G reflects the average CMRO<sub>2</sub> measured 24 hours after HI, while the ~1.5 value in Figure 2D (red line) corresponds to CMRO<sub>2</sub> during the early 0–2 hour post-HI period. This temporal difference accounts for the apparent discrepancy in magnitude. We understand the importance of consistency across the field of view and have clarified this point in the subsection Procedures for PAM Imaging in the Methods on page 17: “For the imaging field covering both hemicortices between the Bregma and Lambda of the neonatal mouse (5 × 3 mm<sup>2</sup> as shown in Figure 1C, with each hemicortex measuring 2.5 × 3 mm<sup>2</sup>)”, as well as in the Figure 4 legend on page 34: “Correlation of CMRO<sub>2</sub> and post-HI brain infarction in mouse neonates at 24 hours”.

      In our model and setup, CMRO<sub>2</sub> estimation is spatially robust across the FOV under standard imaging conditions. We recognize, however, that certain peripheral regions may be more prone to signal attenuation. Future refinement of region selection could further improve spatial averaging strategies. For the current study, full-FOV averaging was used consistently across all groups to maintain comparability.
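
The difference between full-FOV averaging and the reviewer's suggested central-region averaging can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function name `fov_mean`, the `central_fraction` parameter, and the toy map values are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def fov_mean(cmro2_map, central_fraction=1.0):
    """Average a 2D map (list of rows) over a centered window.

    central_fraction=1.0 uses the whole field of view; smaller values keep
    only the central part along each axis (hypothetical parameter).
    """
    if not 0.0 < central_fraction <= 1.0:
        raise ValueError("central_fraction must be in (0, 1]")
    n_rows, n_cols = len(cmro2_map), len(cmro2_map[0])
    keep_rows = max(1, round(n_rows * central_fraction))
    keep_cols = max(1, round(n_cols * central_fraction))
    r0 = (n_rows - keep_rows) // 2  # first row of the centered window
    c0 = (n_cols - keep_cols) // 2  # first column of the centered window
    values = [
        v
        for row in cmro2_map[r0:r0 + keep_rows]
        for v in row[c0:c0 + keep_cols]
        if v is not None  # skip pixels without a valid estimate
    ]
    return fmean(values)


# Toy 4 x 6 map in arbitrary units (invented values, not study data).
toy_map = [
    [0.2, 0.3, 0.4, 0.4, 0.3, 0.2],
    [0.3, 1.0, 1.2, 1.2, 1.0, 0.3],
    [0.3, 1.0, 1.2, 1.2, 1.0, 0.3],
    [0.2, 0.3, 0.4, 0.4, 0.3, 0.2],
]
full_fov = fov_mean(toy_map)                       # whole field of view
central = fov_mean(toy_map, central_fraction=0.5)  # central window only
```

In this toy case the full-FOV mean is lower than the central-window mean because low peripheral values are included. Whichever window is chosen, it has to be applied identically to all groups for the comparison to remain valid.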

      (11) Minor: Results presented in Supplementary Tables have too many significant digits.

      Thank you for the helpful suggestion. We have revised Supplementary Tables S1 and S2 to reduce the number of significant digits and improve clarity.

      Reviewer #2 (Public review)

      (1) In this study, the authors hypothesized that mitochondrial injury in HIE is caused by OXPHOS-uncoupling, which is the cause of secondary energy failure in HI. In addition, therapeutic hypothermia rescues secondary energy failure. The methodologies used are state-of-the-art and include the PAM technique in live animals, bioenergetic studies in isolated mitochondria, and others.

      The study is comprehensive and impressive. The article is well written and statistical analyses are appropriate.

      We thank the reviewer for the positive feedback.

      (2) The manuscript does not discuss the limitation of this animal model study in view of the clinical scenario of neonatal hypoxia-ischemia.

      We thank the reviewer for this valuable feedback. In response, we have added a dedicated “Limitations in this study” subsection in the Discussion, where we address the potential limitations of this animal model in the context of the clinical scenario of neonatal hypoxia-ischemia in the first paragraph on page 14, including the developmental differences between P10 mice and human infants.

      (3) I see many studies on Pubmed on bioenergetics and HI. Hence, it is unclear what is novel and what is known.

      We thank the reviewer for this important comment regarding the novelty of our study in the context of existing research on bioenergetics and hypoxia-ischemia (HI). To better clarify the novel aspects of our work, we have highlighted the relevant content in the Abstract (page 4) and Introduction (page 5). Specifically, while many studies have explored HI-related bioenergetic dysfunction, the mechanisms by which therapeutic hypothermia modulates CMRO<sub>2</sub> and mitochondrial function post-HI remain poorly understood.

      Abstract on page 4: “However, it is unclear how post-HI hypothermia helps to restore the balance, as cooling reduces CMRO<sub>2</sub>. Also, how transient HI leads to secondary energy failure (SEF) in neonatal brains remains elusive. Using photoacoustic microscopy, we examined the effects of HI on CMRO<sub>2</sub> in awake 10-day-old mice, supplemented by bioenergetic analysis of purified cortical mitochondria.”

      Introduction on page 5: “The use of awake mouse neonates avoided the confounding effects of anesthesia on CBF and CMRO<sub>2</sub> (Cao et al., 2017; Gao et al., 2017; Sciortino et al., 2021; Slupe and Kirsch, 2018). In addition, we measured the oxygen consumption rate (OCR), reactive oxygen species (ROS), and the membrane potential of mitochondria that were immediately purified from the same cortical area imaged by PAM. This dual-modal analysis enabled a direct comparison of cerebral oxygen metabolism and cortical mitochondrial respiration in the same animal. Moreover, we compared the effects of therapeutic hypothermia on oxygen metabolism and mitochondrial respiration, and correlated the extent of CMRO<sub>2</sub>-reduction with the severity of infarction at 24 hours after HI. Our results suggest that blocking HI-induced OXPHOS-uncoupling is an acute effect of hypothermia and that optical detection of CMRO<sub>2</sub> may have clinical applications in HIE.”

      In this study, we propose that uncoupled oxidative phosphorylation (OXPHOS) underlies the secondary energy failure observed after HI, and we demonstrate that hypothermia suppresses this pathological CMRO<sub>2</sub> surge, thereby protecting mitochondrial integrity and preventing injury. Additionally, our use of photoacoustic microscopy (PAM) in awake neonatal mice represents a novel, non-invasive approach to track cerebral oxygen metabolism, with potential clinical relevance for guiding hypothermia therapy.

      (4) What are the limitations of ex-vivo mitochondrial studies?

      We thank the reviewer for this insightful comment. We acknowledge that ex-vivo mitochondrial assays do not fully replicate in vivo physiological conditions, as they lack systemic factors such as blood flow, cellular interactions, and intact tissue architecture. However, these assays are well-established and widely accepted in the field for evaluating mitochondrial function under controlled conditions (Caspersen et al., 2008; Niatsetskaya et al., 2012). Despite their limitations, they enable direct comparisons of mitochondrial activity across experimental groups and provide valuable mechanistic insights that complement in vivo observations.

      (5) PAM technique limits the resolution of the image beyond 500-750 micron depth. Assessing basal ganglia may not be possible with this approach?

      We thank the reviewer for this important comment. We agree that the imaging depth of PAM is limited and may not allow assessment of deeper brain structures such as the basal ganglia. However, in our neonatal HI model—as in many clinical cases of HIE—cortical injury is typically more severe and represents a major focus for mechanistic and therapeutic investigations. The cortical regions assessed with PAM are thus highly relevant to the pathophysiology of neonatal HI. We have now acknowledged this depth limitation in the third paragraph of the newly added Limitations in this study subsection of the Discussion on page 15:

      “Another limitation of this study is the restricted imaging depth of the PAM technique, which is typically less than 1 mm and therefore does not allow assessment of deeper brain structures such as the basal ganglia. However, in both our neonatal HI model and most clinical cases of neonatal hypoxia-ischemia, cortical injury tends to be more prominent and functionally significant. As such, our cortical measurements remain highly relevant for investigating the mechanisms of injury and evaluating therapeutic interventions.”

      (6) Hypothermia in the present study reduces the brain temperature from 37 to 29-32 degrees centigrade. In the clinical setting, head temperature is reduced to 33-34.5 degrees in neonatal hypoxia-ischemia. Hence, a drop in temperature to 29 degrees is much lower than in clinical practice. How can the present study, with its greater drop in head temperature, be interpreted for understanding the pathophysiology of therapeutic hypothermia in neonatal HIE? Moreover, starting the HIE model at 37 degrees and dropping to 29 seems quite different from the clinical scenario. Please discuss.

      We thank the reviewer for raising this important point regarding temperature ranges in our study. In Figure 1, we used a broader temperature range (down to 29°C) to explore the general relationship between temperature and CMRO<sub>2</sub> in uninjured neonatal mice. This experiment was not intended to model therapeutic hypothermia directly, but rather to characterize the baseline physiological responses.

      For all experiments involving hypothermia as a therapeutic intervention following HI, we consistently maintained a brain temperature of 32°C, which lies just below the clinically accepted mild hypothermia range for neonatal HIE (typically 33–34.5°C). We believe this temperature closely approximates clinical practice and supports the translational relevance of our findings.

      (7) NMR was assessed ex vivo. How does it relate to in vivo assessment? Infants admitted to the Neonatal Intensive Care Unit frequently get MRI with spectroscopy. How do the MRS findings in human newborns with HIE correlate with the ex vivo evaluation of metabolites?

      We thank the reviewer for this insightful question. While our study assessed brain metabolites ex vivo, similar metabolic changes have been observed in vivo using proton magnetic resonance spectroscopy (¹H-MRS) in infants with HIE. Specifically, reductions in N-acetylaspartate (NAA) — a marker of neuronal integrity — have been reported in neonates with severe brain injury, aligning with our ex vivo findings. This correlation between in vivo and ex vivo assessments supports the translational relevance of our model for studying metabolic disruption in neonatal HIE. We have added this point to the subsection Using Optically Measured CMRO<sub>2</sub> to Detect Neonatal HI Brain Injury of the Results on page 8, along with a supporting clinical reference (Lally et al., 2019):

      “In addition, in vivo proton MRS in infants with HIE has also shown a reduction in NAA, particularly in cases of severe injury (Lally et al., 2019). This reduction in NAA, observed in neonatal intensive care settings, reflects neuronal and axonal loss or dysfunction and serves as a biomarker for injury severity. The alignment between our ex vivo observations and in vivo MRS findings in clinical studies reinforces the translational relevance of our model for investigating metabolic disturbances in neonatal HIE.”

      Reviewer #3 (Public review)

      (1) Sun et al. present a comprehensive study using a novel photoacoustic microscopy setup and mitochondrial analysis to investigate the impact of hypoxia-ischemia (HI) on brain metabolism and the protective role of therapeutic hypothermia. The authors elegantly demonstrate three connected findings: (1) HI initially suppresses brain metabolism, (2) subsequently triggers a metabolic surge linked to oxidative phosphorylation uncoupling and brain damage, and (3) therapeutic hypothermia mitigates HI-induced damage by blocking this surge and reducing mitochondrial stress.

      The study's design and execution are great, with a clear presentation of results and methods. Data is nicely presented, and methodological details are thorough.

      We thank the reviewer for the positive feedback.

      (2) However, a minor concern is the extensive use of abbreviations, which can hinder readability. As all the abbreviations are introduced in the text, their overuse may render the text hard to read to non-specialist audiences. Additionally, sharing the custom Matlab and other software scripts online, particularly those used for blood vessel segmentation, would be a valuable resource for the scientific community. In addition, while the study focuses on the short-term effects of HI, exploring the long-term consequences and definitively elucidating HI's impact on mitochondria would further strengthen the manuscript's impact.

      We thank the reviewer for these valuable suggestions. Please find our point-by-point responses below:

      Abbreviations: To improve readability, we have added a List of Abbreviations on page 3 to help readers, especially non-specialists, navigate the terminology more easily.

      MATLAB Code Availability: The methodology for blood vessel segmentation was described in detail in our previous publication (Sun et al., 2020). We have now updated the subsection Quantification of Cerebral Hemodynamics and Oxygen Metabolism by PAM of the Methods on page 18 to provide additional details and have indicated that the MATLAB scripts are available upon request.

      “Briefly, this process involves generating a vascular map using signal amplitude from the Hilbert transformation, selecting a region slightly larger than the vessel of interest, and applying Otsu’s thresholding method to remove background pixels. Isolated or spurious boundary fragments are then removed to improve boundary smoothness. The customized MATLAB code used for vessel segmentation is available upon request.”
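
The thresholding and clean-up steps described above can be sketched as follows. This is an illustrative pure-Python reimplementation, not the authors' MATLAB code. It assumes the Hilbert-envelope amplitude map has already been computed and cropped around the vessel, and the helper names, bin count, 4-connectivity, and `min_size` cut-off are all assumptions.

```python
def otsu_threshold(values, n_bins=64):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w0 = 0    # pixel count in the background class (bins 0..i)
    sum0 = 0  # bin-index sum of the background class
    for i, h in enumerate(hist):
        w0 += h
        sum0 += i * h
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * width  # upper edge of the best bin


def remove_small_components(mask, min_size):
    """Drop 4-connected foreground components smaller than min_size pixels."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    cleaned = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill one connected component
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    cleaned[y][x] = True
    return cleaned


# Toy amplitude map: a bright vertical "vessel" plus one isolated bright pixel.
amplitude = [
    [0.10, 0.12, 0.90, 0.11, 0.10],
    [0.11, 0.10, 0.88, 0.12, 0.10],
    [0.10, 0.11, 0.92, 0.10, 0.85],
    [0.12, 0.10, 0.91, 0.11, 0.10],
]
threshold = otsu_threshold([v for row in amplitude for v in row])
mask = [[v >= threshold for v in row] for row in amplitude]
vessel_mask = remove_small_components(mask, min_size=2)
```

In this toy example the thresholding keeps the isolated bright pixel, and the component filter then removes it, which mirrors the fragment-removal step in the quoted Methods text.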

      Long-Term Effects of Hypothermia: We agree that exploring long-term outcomes would enhance the broader impact of this research. While our study focuses on the acute phase following HI, prior studies have shown long-term neuroprotective benefits of therapeutic hypothermia, such as enhanced white matter development (Koo et al., 2017). We have added this point to the fourth paragraph in the subsection Limitations in this study of the Discussion on page 15:

      “While our study focuses on the acute effects of hypothermia, previous research has shown long-term neuroprotective benefits, including improved white matter development post-injury (Koo et al., 2017). These findings highlight hypothermia's potential for both immediate and extended recovery, warranting further study of long-term outcomes.”

      (3) Extensive use of abbreviations.

      Thank you for the helpful suggestion. To improve readability for a broader audience, we have added a List of Abbreviations on page 3 of the manuscript to assist readers in navigating terminology used throughout the text. This has been included as Response #2 to Reviewer #3.

      (4) Share code used to conduct the study.

      Thank you for the suggestion. The methodology for vessel segmentation was previously published (Sun et al., 2020), and we have noted in the subsection Quantification of Cerebral Hemodynamics and Oxygen Metabolism by PAM of the Methods on page 18 that the MATLAB code is available upon request. This has also been included as Response #2 to Reviewer #3.

      Reference:

      Cao R, Li J, Kharel Y, Zhang C, Morris E, Santos WL, Lynch KR, Zuo Z, Hu S. 2018. Photoacoustic microscopy reveals the hemodynamic basis of sphingosine 1-phosphate-induced neuroprotection against ischemic stroke. Theranostics 8:6111–6120. doi:10.7150/thno.29435

      Caspersen CS, Sosunov A, Utkina-Sosunova I, Ratner VI, Starkov AA, Ten VS. 2008. An Isolation Method for Assessment of Brain Mitochondria Function in Neonatal Mice with Hypoxic-Ischemic Brain Injury. Developmental Neuroscience 30:319–324. doi:10.1159/000121416

      Clancy B, Kersh B, Hyde J, Darlington RB, Anand KJS, Finlay BL. 2007. Web-based method for translating neurodevelopment from laboratory species to humans. Neuroinformatics 5:79–94. doi:10.1385/ni:5:1:79

      Greenberg RS, Zahurak M, Belden C, Tunkel DE. 1998. Assessment of oropharyngeal distance in children using magnetic resonance imaging. Anesth Analg 87:1048–1051. doi:10.1097/00000539-199811000-00014

      Kiyatkin EA. 2007. Brain temperature fluctuations during physiological and pathological conditions. Eur J Appl Physiol 101:3–17. doi:10.1007/s00421-007-0450-7

      Koo E, Sheldon RA, Lee BS, Vexler ZS, Ferriero DM. 2017. Effects of therapeutic hypothermia on white matter injury from murine neonatal hypoxia-ischemia. Pediatr Res 82:518–526. doi:10.1038/pr.2017.75

      Lally PJ, Montaldo P, Oliveira V, Soe A, Swamy R, Bassett P, Mendoza J, Atreja G, Kariholu U, Pattnayak S, Sashikumar P, Harizaj H, Mitchell M, Ganesh V, Harigopal S, Dixon J, English P, Clarke P, Muthukumar P, Satodia P, Wayte S, Abernethy LJ, Yajamanyam K, Bainbridge A, Price D, Huertas A, Sharp DJ, Kalra V, Chawla S, Shankaran S, Thayyil S, MARBLE consortium. 2019. Magnetic resonance spectroscopy assessment of brain injury after moderate hypothermia in neonatal encephalopathy: a prospective multicentre cohort study. Lancet Neurol 18:35–45. doi:10.1016/S1474-4422(18)30325-9

      Lin W, Powers WJ. 2018. Oxygen metabolism in acute ischemic stroke. J Cereb Blood Flow Metab 38:1481–1499. doi:10.1177/0271678X17722095

      Mallard C, Vexler Z. 2015. Modeling ischemia in the immature brain: how translational are animal models? Stroke 46:3006–3011. doi:10.1161/STROKEAHA.115.007776

      Niatsetskaya ZV, Sosunov SA, Matsiukevich D, Utkina-Sosunova IV, Ratner VI, Starkov AA, Ten VS. 2012. The Oxygen Free Radicals Originating from Mitochondrial Complex I Contribute to Oxidative Brain Injury Following Hypoxia–Ischemia in Neonatal Mice. J Neurosci 32:3235–3244. doi:10.1523/JNEUROSCI.6303-11.2012

      Sheldon RA, Windsor C, Ferriero DM. 2018. Strain-Related Differences in Mouse Neonatal Hypoxia-Ischemia. Dev Neurosci 40:490–496. doi:10.1159/000495880

      Sun N, Bruce AC, Ning B, Cao R, Wang Y, Zhong F, Peirce SM, Hu S. 2022. Photoacoustic microscopy of vascular adaptation and tissue oxygen metabolism during cutaneous wound healing. Biomed Opt Express 13:2695–2706. doi:10.1364/BOE.456198

      Sun N, Ning B, Bruce AC, Cao R, Seaman SA, Wang T, Fritsche-Danielson R, Carlsson LG, Peirce SM, Hu S. 2020. In vivo imaging of hemodynamic redistribution and arteriogenesis across microvascular network. Microcirculation 27:e12598. doi:10.1111/micc.12598

      Sun N, Zheng S, Rosin DL, Poudel N, Yao J, Perry HM, Cao R, Okusa MD, Hu S. 2021. Development of a photoacoustic microscopy technique to assess peritubular capillary function and oxygen metabolism in the mouse kidney. Kidney International 100:613–620. doi:10.1016/j.kint.2021.06.018

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      The medicinal leech preparation is an amenable system in which to understand how the underlying cellular networks for locomotion function. A previously identified non-spiking neuron (NS) was studied and found to alter the mean firing frequency of a crawl-related motoneuron (DE-3), which fires during the contraction phase of crawling. The data are mostly solid. Identifying upstream neurons responsible for crawl motor patterning is essential for understanding how rhythmic behavior is controlled.

      Review of Revision: 

      On a positive note, the rationale for the study is clearer to me now after reading the authors' responses to both reviewers, but that information, as described in the authors' responses, is minimally incorporated into the current revised paper. Incorporating a discussion of previous work on the NS cell has, indeed, improved the paper. 

      I suggested earlier that the paper be edited for clarity but not much text has been changed since the first draft. I will provide an example of the types of sentences that are confusing. The title of the paper is: "Phase-specific premotor inhibition modulates leech rhythmic motor output". Are the authors referring to the inhibition created by premotor neurons (e.g., on to the motoneurons) or the inhibition that the premotor neurons receive? 

      In this case, this is an interesting ambiguity: NS is inhibited and that inhibition is directly transmitted to the motoneurons because both cells are electrically coupled.  We believe that the title does not disguise the findings conveyed by the manuscript.

      I also find the paper still confusing with regard to the suggested "functional homology" with the vertebrate Renshaw cells. When the authors set up this expectation of homology (should be analogy) in the introduction and other sections of the paper, one would assume that the NS cell would be directly receiving excitation from a motoneuron (like DE-3) and, in turn, the motoneuron would then receive some sort of inhibitory input to regulate its firing frequency. Essentially, I have always viewed the Renshaw cells as nature's clever way to monitor the ongoing activity of a motoneuron while also providing recurrent feedback or "recurrent inhibition" to modify that cell's excitatory state. The authors present their initial idea below on line 62. Authors write: "These neurons are present as bilateral pairs in each segmental ganglion and are functional homologs of the mammalian Renshaw cells (Szczupak, 2014). These spinal cord cells receive excitatory inputs from motoneurons and, in turn, transmit inhibitory signals to the motoneurons (Alvarez and Fyffe, 2007)." 

      We agree with Reviewer #2: the correct term is "analogous," not "homologous." Thanks for pointing this out. We changed the term throughout the text.

      The Reviewer's understanding of the role of Renshaw cells is also correct. NS plays exactly the role that the Reviewer describes. The ONLY difference is that NS is inhibited by the motoneurons and, in turn, transmits this inhibition to the motoneurons via the rectifying electrical junctions. To address the confusion that our description caused, we have modified the cited sentence accordingly (now lines 65-67).

      Minor note:

      I suggest re-writing this last sentence as "these" is confusing. Change to: 'In the spinal cord, Renshaw interneurons receive excitatory inputs from motoneurons and, in turn, transmit inhibitory signals to them (Alvarez and Fyffe, 2007).'

      Please see the changes mentioned above.

      Furthermore, the authors note that (line 69 on): "In the context of this circuit the activity of excitatory motoneurons evokes chemically mediated inhibitory synaptic potentials in NS. Additionally, the NS neurons are electrically coupled......In physiological conditions this coupling favors the transmission of inhibitory signals from NS to motoneurons." Based on what is being conveyed here, I see a disconnect with the "functional homology" being presented earlier. I may be missing something, but the Renshaw analogy seems to be quite different compared to what looks like reciprocal inhibition in the leech. If the authors want to make the analogy to Renshaw cells clearer, then they should make a simple ball and stick diagram of the leech system and visually compare it to the Renshaw/motoneuron circuit with regard to functionality. This simple addition would help many readers. 

      We have simplified the description regarding the Renshaw cell (lines 65-67) to avoid the “details” of the connectivity between the two circuits.

      This report focuses on NS neurons and their role in crawling; we mention the analogy with Renshaw cells to widen the interest of the results. We do not think that making a special diagram to compare how the two neurons play a similar role via different connections among the players is useful in the context of this manuscript.

      The Abstract, Authors write (line 19), "Specifically, we analyzed how electrophysiological manipulation of a premotor nonspiking (NS) neuron, that forms a recurrent inhibitory circuit (homologous to vertebrate Renshaw cells)...."

      First, a circuit would not be homologous to a cell, and the term homology implies a strict developmental/evolutionary commonality. At best, I would use the term functionally analogous but even then I am still not sure that they are functionally that similar (see comments above). 

      Reviewer #2 is right. We changed the sentence in line 20.

      Line 22: "The study included a quantitative analysis of motor units active throughout the fictive crawling cycle that shows that the rhythmic motor output in isolated ganglia mirrors the phase relationships observed in vivo." This sentence must be revised to indicate that not all of the extracellular units were demonstrated to be motor units. Revise to: "The study included a quantitative analysis of identified and putative motor units active throughout the fictive crawling cycle that shows.....' 

      Line 187 regarding identifying units as motoneurons: Authors write, "While multiple extracellular recordings have been performed previously (Eisenhart et al., 2000), these results (Figure 4) present the first quantitative analysis of motor units activated throughout the crawling cycle in this type of recordings." The authors cannot assume that the units in the recorded nerves belong only to motoneurons. Based on their first rebuttal, the authors seem to be reluctant to accept the idea that the extracellularly recorded units might represent a different class of neurons. They admit that some sensory neurons (with somata located centrally) do, indeed, travel out the same nerves recorded, but go on to explain why they would not be active. 

      The leech has a variety of sensory organs that are located in the periphery, and some of these sensory neurons do show rhythmic activity correlated with locomotor activity (see Blackshaw's early work). The numerous stretch receptors, in fact, have very large axons that pass through all the nerves recorded in the current paper. 

      In Fig. 4, it is interesting that the waveforms of all the units recorded in the PP nerve exhibit a reversal in waveform as compared to those in the DP nerve, which might indicate (based on bipolar differential recording) that the units in the PP nerve are being propagated in the opposite direction (i.e., are perhaps afferent). Rhythmic presynaptic inhibition and excitation is commonly seen for stretch receptors within the CNS (see the work of Burrows) and many such cells are under modulatory control. 

      Most likely, the majority of the units are from motoneurons, but we do not really know at this point. The authors should reframe their statements throughout the paper as: 'While multiple extracellular recordings have been performed previously (Eisenhart et al., 2000), these results (Figure 4) present the first quantitative analysis of multiple extracellular units, using spike sorting methods, which are activated throughout the crawling cycle.' In cases where the identity of the unit is known, then it is fine to state that, but when the identity of the unit is not known, then there should be some qualification and stated as 'putative motor units' 

      We understand the concern of Reviewer #2 regarding the type of neurons active during dopamine-induced crawling in isolated ganglia. However, we believe there is sufficient evidence to support that the recorded spikes originate from motoneurons. As readers may share the same concern, we have added a paragraph explaining why spikes from somatic sensory neurons such as P or T cells, or from stretch receptors, are unlikely to contribute (lines 206-214). We included the term putative in the abstract.

      The Methods section:

      Needs to include the full parameters that were used to assess whether bursting activity was qualified in ways to be considered crawling activity or not. Typically, crawl-like burst periods of no more than 25 seconds have been the limit for their qualification as crawling activity. In Fig 2F, for example, the inter-burst period is over 35 seconds; that coupled with an average 5 second burst duration would bring the burst period to 40 seconds, which is substantially out of range for there to be bursting relevant to crawl activity. Simply put, long DE-3 burst periods are often observed but may not be indicative of a crawl state as the CV motoneurons are no longer out of phase with DE-3. A number of papers have adopted this criterion. 

      We now indicate in the Methods the range of period values measured in our experiments. For the reviewer's information, we show here histograms depicting the variability of the period and duty cycle values recorded in our experiments (control conditions). The Reviewer can see that the bursting activity of DE-3 falls within the range that has been published.

      Author response image 1.

      Crawling in isolated ganglia. A. Histogram of periods end-to-end during crawling in isolated ganglia. The dotted line indicates the mean obtained from the averages of all experiments. The solid black line represents the mean of all cycles across all experiments. B. As in A, for the duty cycle calculated using end-to-end periods.  (n = 210 cycles from 45 ganglia obtained from 32 leeches in all cases).
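      As a rough illustration of the arithmetic discussed above (this is not the authors' analysis code), the sketch below computes end-to-end periods and duty cycles from burst onset and offset times and checks them against the 25-second limit cited by the reviewer. The `Burst` type, the function names, and the choice to define duty cycle as the duration of the burst that closes each cycle divided by the end-to-end period are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Upper limit on crawl-like burst period cited by the reviewer (seconds).
MAX_CRAWL_PERIOD_S = 25.0


@dataclass
class Burst:
    start: float  # burst onset (s)
    end: float    # burst offset (s)


def end_to_end_cycles(bursts: List[Burst]) -> List[Tuple[float, float]]:
    """Return (period, duty_cycle) for each pair of consecutive bursts.

    Assumption: the period runs from the end of one burst to the end of
    the next, and the duty cycle is the duration of the second burst
    divided by that period.
    """
    cycles = []
    for prev, curr in zip(bursts, bursts[1:]):
        period = curr.end - prev.end
        duty_cycle = (curr.end - curr.start) / period
        cycles.append((period, duty_cycle))
    return cycles


def is_crawl_like(period: float, limit: float = MAX_CRAWL_PERIOD_S) -> bool:
    """True if the burst period falls within the cited limit."""
    return period <= limit


# The reviewer's example: 5 s bursts separated by a 35 s inter-burst interval.
example = [Burst(0.0, 5.0), Burst(40.0, 45.0)]
for period, duty in end_to_end_cycles(example):
    print(period, duty, is_crawl_like(period))
```

      With these hypothetical burst times the period is 40 s (35 s interval plus 5 s burst), which is what the reviewer flags as out of range for crawl-like activity.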

      Reviewer #1 (Recommendations for the authors): 

      Minor comments-

      Line 100: "In the frame of the recurrent inhibitory circuit, NS is the target of inhibitory signals". Suggestion: 'Within the framework of the recurrent inhibitory circuit, NS is the target of inhibitory signals.' 

      Changed as suggested (line 107).

      Line 163: "This series of experiments proves that, as predicted based on the known circuit (Figure 1C), inhibitory signals onto NS premotor neurons were transmitted to DE-3 motoneurons and counteracted their excitatory drive during crawling, limiting their firing frequency". I think this sentence is too strong and needs some editing. Suggestion: 'As predicted based on the known circuit (Figure 1C), this series of experiments indicates that inhibitory signals onto NS premotor neurons are transmitted to DE-3 motoneurons, thus limiting their firing frequency and counteracting their excitatory drive during crawling.'

      Changed as suggested.

      Lines 86, 292 and 304 and Fig 4 legend: "Different from DE-3, In-Phase units showed a marked decrease in the maximum bFF along time." Suggestion: Replace the word "along" with 'across' time. Also replace those words in the Fig 4 legend and Line 80...."along" (replace with 'across') the different stages of crawling. 

      Changed as suggested.

      Line 311: "bursts and a concurrent inhibitory input via NS (Figure 7). Coherent with this interpretation, the activity level of the Anti- Phase units was not influenced by these inhibitory signals". Suggestion: Replace the word "coherent" with 'consistent'. 

      Changed as suggested.

      Line 332: "...offer the particular advantage of allowing electrical manipulation of individual neurons in wildtype adults," I am unsure what the authors are attempting to convey. Not sure what they mean by "wildtype" in this context and why that would matter. 

      “wildtype” was eliminated.

      We thank Reviewer #2 for the suggested edits to the text.

    1. Reviewer #1 (Public review):

      Summary:

      This study advances the lab's growing body of evidence exploring higher-order learning and its neural mechanisms. They recently found that NMDA receptor activity in the perirhinal cortex was necessary for integrating stimulus-stimulus associations with stimulus-shock associations (mediated learning) to produce preconditioned fear, but it was not necessary for forming stimulus-shock associations. On the other hand, basolateral amygdala NMDA receptor activity is required for forming stimulus-shock memories. Based on these facts, the authors assessed: 1. why the perirhinal cortex is necessary for mediated learning but not direct fear learning and 2. the determinants of perirhinal cortex versus basolateral amygdala necessity for forming direct versus indirect fear memories. The authors used standard sensory preconditioning and variants designed to manipulate the novelty and temporal relationship between stimuli and shock and, therefore, the attentional state under which associative information might be processed. Under experimental conditions where information would presumably be processed primarily in the periphery of attention (temporal distance between stimulus/shock or stimulus pre-exposure), perirhinal cortex NMDA receptor activation was required for learning indirect associations. On the other hand, when information would likely be processed in focal attention (novel stimulus contiguous with shock), basolateral amygdala NMDA activity was required for learning direct associations. Together, the findings indicate that the perirhinal cortex and basolateral amygdala subserve peripheral and focal attention, respectively. The authors provide support for their conclusions using careful, hypothesis-driven experimental design, rigorous methods, and integrating their findings with the relevant literature on learning theory, information processing, and neurobiology. Therefore, this work will be highly interesting to several fields.

      Strengths:

      (1) The experiments were carefully constructed and designed to test hypotheses that were rooted in the lab's previous work, in addition to established learning theory and information processing background literature.

      (2) There are clear predictions and alternative outcomes. The provided table does an excellent job of condensing and enhancing the readability of a large amount of data.

      (3) In a broad sense, attention states are a component of nearly every behavioral experiment. Therefore, identifying their engagement by dissociable brain areas and under different learning conditions is an important area of research.

      (4) The authors clearly note where they replicated their own findings, report full statistical measures, effect sizes, and confidence intervals, indicating the level of scientific rigor.

      (5) The findings raise questions for future experiments that will further test the authors' hypotheses; this is well discussed.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study advances the lab's growing body of evidence exploring higher-order learning and its neural mechanisms. They recently found that NMDA receptor activity in the perirhinal cortex was necessary for integrating stimulus-stimulus associations with stimulus-shock associations (mediated learning) to produce preconditioned fear, but it was not necessary for forming stimulus-shock associations. On the other hand, basolateral amygdala NMDA receptor activity is required for forming stimulus-shock memories. Based on these facts, the authors assessed: (1) why the perirhinal cortex is necessary for mediated learning but not direct fear learning, and (2) the determinants of perirhinal cortex versus basolateral amygdala necessity for forming direct versus indirect fear memories. The authors used standard sensory preconditioning and variants designed to manipulate the novelty and temporal relationship between stimuli and shock and, therefore, the attentional state under which associative information might be processed. Under experimental conditions where information would presumably be processed primarily in the periphery of attention (temporal distance between stimulus/shock or stimulus pre-exposure), perirhinal cortex NMDA receptor activation was required for learning indirect associations. On the other hand, when information would likely be processed in focal attention (novel stimulus contiguous with shock), basolateral amygdala NMDA activity was required for learning direct associations. Together, the findings indicate that the perirhinal cortex and basolateral amygdala subserve peripheral and focal attention, respectively. The authors provide support for their conclusions using careful, hypothesis-driven experimental design, rigorous methods, and integrating their findings with the relevant literature on learning theory, information processing, and neurobiology. Therefore, this work will be highly interesting to several fields.

      Strengths:

      (1) The experiments were carefully constructed and designed to test hypotheses that were rooted in the lab's previous work, in addition to established learning theory and information processing background literature.

      (2) There are clear predictions and alternative outcomes. The provided table does an excellent job of condensing and enhancing the readability of a large amount of data.

      (3) In a broad sense, attention states are a component of nearly every behavioral experiment. Therefore, identifying their engagement by dissociable brain areas and under different learning conditions is an important area of research.

      (4) The authors clearly note where they replicated their own findings, report full statistical measures, effect sizes, and confidence intervals, indicating the level of scientific rigor.

      (5) The findings raise questions for future experiments that will further test the authors' hypotheses; this is well discussed.

      Weaknesses:

      As a reader, it is difficult to interpret how first-order fear could be impaired while preconditioned fear is intact; it requires a bit of "reading between the lines".

      We appreciate the Reviewer’s point and have attempted to address it on lines 55-63 of the revised paper: “In a recent pair of studies, we extended these findings in two ways. First, we showed that S1 does not just form an association with shock in stage 2; it also mediates an association between S2 and the shock. Thus, S2 enters testing in stage 3 already conditioned, able to elicit fear responses (Wong et al., 2019). Second, we showed that this mediated S2-shock association requires NMDAR-activation in the PRh, as well as communication between the PRh and BLA (Wong et al., 2025). These findings raise two critical questions: 1) why is the PRh engaged for mediated conditioning of S2 but not for direct conditioning of S1; and 2) more generally, what determines whether the BLA and/or PRh is engaged for conditioning of the S1 and/or S2?”

      Reviewer #2 (Public review):

      Summary:

      This paper continues the authors' research on the roles of the basolateral amygdala (BLA) and the perirhinal cortex (PRh) in sensory preconditioning (SPC) and second-order conditioning (SOC). In this manuscript, the authors explore how prior exposure to stimuli may influence which regions are necessary for conditioning to the second-order cue (S2). The authors perform a series of experiments which first confirm prior results shown by the authors - that NMDA receptors in the PRh are necessary in SPC during conditioning of the first-order cue (S1) with shock to allow for freezing to S2 at test; and that NMDA receptors in the BLA are necessary for S1 conditioning during the S1-shock pairings. The authors then set out to test the hypothesis that the PRh encodes associations in a peripheral state of attention, whereas the BLA encodes associations in a focal state of attention, similar to the A1 and A2 states in Wagner's theory of SOP. To do this, they show that BLA is necessary for conditioning to S2 when the S2 is first exposed during a serial compound procedure - S2-S1-shock. To determine whether pre-exposure of S2 will shift S2 to a peripheral focal state, the authors run a design in which S2-S1 presentations are given prior to the serial compound phase. The authors show that this restores NMDA receptor activity within the PRh as necessary for the fear response to S2 at test. They then test whether the presence of S1 during the serial compound conditioning allows the PRh to support the fear responses to S2 by introducing a delay conditioning paradigm in which S1 is no longer present. The authors find that PRh is no longer required and suggest that this is due to S2 remaining in the primary focal state.

      Strengths:

      As with their earlier work, the authors have performed a rigorous series of experiments to better understand the roles of the BLA and PRh in the learning of first- and second-order stimuli. The experiments are well-designed and clearly presented, and the results show definitive differences in functionality between the PRh and BLA. The first experiment confirms earlier findings from the lab (and others), and the authors then build on their previous work to more deeply reveal how these regions differ in how they encode associations between stimuli. The authors have done a commendable job of pursuing these questions.

      Table 1 is an excellent way to highlight the results and provide the reader with a quick look-up table of the findings.

      Weaknesses:

      The authors have attempted to resolve the question of the roles of the PRh and BLA in SPC and SOC, which the authors have explored in previous papers. Laudably, the authors have produced substantial results indicating how these two regions function in the learning of first- and second-order cues, providing an opportunity to narrow in on possible theories for their functionality. Yet the authors have framed this experiment in terms of an attentional framework and have argued that the results support this particular framework and hypothesis - that the PRh encodes peripheral and the BLA encodes focal states of learning. This certainly seems like a viable and exciting hypothesis, yet I don't see why the results have been completely framed and interpreted this way. It seems to me that there are still some alternative interpretations that are plausible and should be included in the paper.

      We appreciate the Reviewer’s point and have attempted to address it on lines 566-594 of the Discussion: “An additional point to consider in relation to Experiments 3A, 3B, 4A and 4B is the level of surprise that rats experienced following presentations of the familiar S2 in stage 2. Specifically, in Experiments 3A and 3B, S2 was followed by the expected S1 (low surprise) and its conditioning required activation of NMDA receptors in the PRh and not the BLA. By contrast, in Experiments 4A and 4B, S2 was followed by omission of the expected S1 (high surprise) and its conditioning required activation of NMDA receptors in the BLA and not the PRh. This raises the possibility that surprise, or prediction error, also influences the way that S2 is processed in focal and peripheral states of attention. When prediction error is low, S2 is processed in the peripheral state of attention: hence, learning under these circumstances requires NMDA receptor activation in the PRh and not the BLA. By contrast, when prediction error is high, S2 is preserved in the focal state of attention: hence, learning under these circumstances requires NMDA receptor activation in the BLA and not the PRh. The impact of prediction error on the processing of S2 could be assessed using two types of designs. In the first design, rats are pre-exposed to S2-S1 pairings in stage 1 and this is followed by S2-S3-shock pairings in stage 2. The important feature of this design is that, in stage 2, the S2 is followed by surprise in omission of S1 and presentation of S3. Thus, if a large prediction error maintains processing of the familiar S2 in the BLA, we might expect that its conditioning in this design would require NMDA receptor activation in the BLA (in contrast to the results of Experiment 3B) and no longer require NMDA receptor activation in the PRh (in contrast to the results of Experiment 3A). 
      In the second design, rats are pre-exposed to S2 alone in stage 1 and this is followed by S2-[trace]-shock pairings in stage 2. The important feature of this design is that, in stage 2, the S2 is not followed by the surprising omission of any stimulus. Thus, if a small prediction error shifts processing of the familiar S2 to the PRh, we might expect that its conditioning in this design would no longer require NMDA receptor activation in the BLA (in contrast to the results of Experiment 4B) but, instead, require NMDA receptor activation in the PRh (in contrast to the results of Experiment 4A). Future studies will use both designs to determine whether prediction error influences the processing of S2 in the focus versus periphery of attention and, thereby, whether learning about this stimulus requires NMDA receptor activation in the BLA or PRh.”

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a series of experiments that further investigate the roles of the BLA and PRH in sensory preconditioning, with a particular focus on understanding their differential involvement in the association of S1 and S2 with shock.

      Strengths:

      The motivation for the study is clearly articulated, and the experimental designs are thoughtfully constructed. I especially appreciate the inclusion of Table 1, which makes the designs easy to follow. The results are clearly presented, and the statistical analyses are rigorous. My comments below mainly concern areas where the writing could be improved to help readers more easily grasp the logic behind the experiments.

      Weaknesses:

      (1) Lines 56-58: The two previous findings should be more clearly summarized. Specifically, it's unclear whether the "mediated S2-shock" association occurred during Stage 2 or Stage 3. I assume the authors mean Stage 2, but Stage 2 alone would not yet involve "fear of S2," making this expression a bit confusing.

      We apologise for the confusion and have revised the summary of our previous findings on lines 55-63. The revised text now states: “In a recent pair of studies, we extended these findings in two ways. First, we showed that S1 does not just form an association with shock in stage 2; it also mediates an association between S2 and the shock. Thus, S2 enters testing in stage 3 already conditioned, able to elicit fear responses (Wong et al., 2019). Second, we showed that this mediated S2-shock association requires NMDAR-activation in the PRh, as well as communication between the PRh and BLA (Wong et al., 2025). These findings raise two critical questions: 1) why is the PRh engaged for mediated conditioning of S2 but not for direct conditioning of S1; and 2) more generally, what determines whether the BLA and/or PRh is engaged for conditioning of the S1 and/or S2?”

      (2) Line 61: The phrase "Pavlovian fear conditioning" is ambiguous in this context. I assume it refers to S1-shock or S2-shock conditioning. If so, it would be clearer to state this explicitly.

      Apologies for the ambiguity - we have omitted the term “Pavlovian”, which may have been the source of confusion. The revised text on lines 60-63 now states: “These findings raise two critical questions: 1) why is the PRh engaged for mediated conditioning of S2 but not for direct conditioning of S1; and 2) more generally, what determines whether the BLA and/or PRh is engaged for conditioning of the S1 and/or S2?”

      (3) Regarding the distinction between having or not having Stage 1 S2-S1 pairings, is "novel vs. familiar" the most accurate way to frame this? This terminology could be misleading, especially since one might wonder why S2 couldn't just be presented alone on Stage 1 if novelty is the critical factor. Would "outcome relevance" or "predictability" be more appropriate descriptors? If the authors choose to retain the "novel vs. familiar" framing, I suggest providing a clear explanation of this rationale before introducing the predictions around Line 118.

      We have incorporated the suggestion regarding “predictability” while also retaining “novelty” as follows. 

      L76-85: “For example, different types of arrangements may influence the substrates of conditioning to S2 by influencing its novelty and/or its predictive value at the time of the shock, on the supposition that familiar stimuli are processed in the periphery of attention and, thereby, the PRh (Bogacz & Brown, 2003; Brown & Banks, 2015; Brown & Bashir, 2002; Martin et al., 2013; McClelland et al., 2014; Morillas et al., 2017; Murray & Wise, 2012; Robinson et al., 2010; Suzuki & Naya, 2014; Voss et al., 2009; Yang et al., 2023) whereas novel stimuli are processed in the focus of attention and, thereby, the amygdala (Holmes et al., 2018; Qureshi et al., 2023; Roozendaal et al., 2006; Rutishauser et al., 2006; Schomaker & Meeter, 2015; Wright et al., 2003).”

      L116-120: “Subsequent experiments then used variations of this protocol to examine whether the engagement of NMDAR in the PRh or BLA for Pavlovian fear conditioning is influenced by the novelty/predictive value of the stimuli at the time of the shock (second implication of theory) as well as their distance or separation from the shock (third implication of theory; Table 1).”

      (4) Line 121: This statement should refer to S1, not S2.

      (5) Line 124: This one should refer to S2, not S1.

      We have checked the text on these lines for errors and confirmed that the statements are correct. The lines encompassing this text (L121-130) are reproduced here for convenience:

      (1) When rats are exposed to novel S2-S1-shock sequences, conditioning of S2 and S1 will be disrupted by a DAP5 infusion into the BLA but not into the PRh (Experiments 2A and 2B);

      (2) When rats are exposed to S2-S1 pairings and then to S2-S1-shock sequences, conditioning of S2 will be disrupted by a DAP5 infusion into the PRh but not the BLA whereas conditioning of S1 will be disrupted by a DAP5 infusion into the BLA not the PRh (Experiments 3A and 3B);

      (3) When rats are exposed to S2-S1 pairings and then to S2 (trace)-shock pairings, conditioning of S2 will be disrupted by a DAP5 infusion into the BLA not the PRh (Experiments 4A and 4B).

      (6) Additionally, the rationale for Experiment 4 is not introduced before the Results section. While it is understandable that Experiment 4 functions as a follow-up to Experiment 3, it would be helpful to briefly explain the reasoning behind its inclusion.

      Experiment 4 follows from the results obtained in Experiment 3; and, as noted, the reasoning for its inclusion is provided locally in its introduction. We attempted to flag this experiment earlier in the general introduction to the paper; but this came at the cost of clarity to the overall story. As such, our revised paper retains the local introduction to this experiment. It is reproduced here for convenience:

      “In Experiments 3A and 3B, conditioning of the pre-exposed S1 required NMDAR-activation in the BLA and not the PRh; whereas conditioning of the pre-exposed S2 required NMDAR-activation in the PRh and not the BLA. We attributed these findings to the fact that the pre-exposed S2 was separated from the shock by S1 during conditioning of the S2-S1-shock sequences in stage 2: hence, at the time of the shock, S2 was no longer processed in the focal state of attention supported by the BLA; instead, it was processed in the peripheral state of attention supported by the PRh.

      “Experiments 4A and 4B employed a modification of the protocol used in Experiments 3A and 3B to examine whether a pre-exposed S1 influences the processing of a pre-exposed S2 across conditioning with S2-S1-shock sequences. The design of these experiments is shown in Figure 4A. Briefly, in each experiment, two groups of rats were exposed to a session of S2-S1 pairings in stage 1 and, 24 hours later, a session of S2-[trace]-shock pairings in stage 2, where the duration of the trace interval was equivalent to that of S1 in the preceding experiments. Immediately prior to the trace conditioning session in stage 2, one group in each experiment received an infusion of DAP5 or vehicle only into either the PRh (Experiment 4A) or BLA (Experiment 4B). Finally, all rats were tested with presentations of the S2 alone in stage 3. If the substrates of conditioning to S2 are determined only by the amount of time between presentations of this stimulus and foot shock in stage 2, the results obtained in Experiments 4A and 4B should be the same as those obtained in Experiments 3A and 3B: acquisition of freezing to S2 will require activation of NMDARs in the PRh and not the BLA. If, however, the presence of S1 in the preceding experiments (Experiments 3A and 3B) accelerated the rate at which processing of S2 transitioned from the focus of attention to its periphery, the results obtained in Experiments 4A and 4B will differ from those obtained in Experiments 3A and 3B. That is, in contrast to the preceding experiments where acquisition of freezing to S2 required NMDAR-activation in the PRh and not the BLA, here acquisition of freezing to S2 should require NMDAR-activation in the BLA but not the PRh.”

      Reviewer #1 (Recommendations for the authors):

      I greatly enjoyed reading and reviewing this manuscript, and so I only have boilerplate recommendations.

      (1) I might add a couple of sentences discussing how/why preconditioned fear could be intact while first-order fear is impaired. Of course, if I am interpreting the provided interpretation correctly, the reason is that peripheral processing is still intact even when BLA NMDA receptors are blocked, and so mediated conditioning still occurs. Does this mean that mediated conditioning does not require learning the first-order relationship, and that they occur in parallel? Perhaps I just missed this, but I cannot help but wonder whether/how the psychological processes at play might change when first-order learning is impaired, so this would be greatly appreciated.

      As noted above, we have revised the general introduction (around lines 55-59) to clarify that the direct S1-shock and mediated S2-shock associations form in parallel. Hence, manipulations that disrupt first-order fear to the S1 (such as a BLA infusion of the NMDA receptor antagonist, DAP5) do not automatically disrupt the expression of sensory preconditioned fear to the S2.

      (2) Adding to the above - does the SOP or another theory predict serial vs parallel information flow from focal state to peripheral, or perhaps it is both to some extent?

      SOP predicts both serial and parallel processing of information in its focal and peripheral states. That is, some proportion of the elements that comprise a stimulus may decay from the focal state of attention to the periphery (serial processing); hence, at any given moment, the elements that comprise a stimulus can be represented in both focal and peripheral states (parallel processing).

      Given the nature of the designs and tools used in the present study (between-subject assessment of a DAP5 effect in the BLA or PRh), we selected parameters that would maximize the processing of the S2 and S1 stimuli in one or the other state of activation; hence the results of the present study. We are currently examining the joint processing of stimulus elements across focal and peripheral states using simultaneous recordings of activity in the BLA and PRh. These recordings are collected from rats trained in the different stages of a within-subject sensory preconditioning protocol. The present study created the basis for this work, which will be published separately in due course.

      (3) The organization of PRh vs BLA is nice and consistent across each figure, but I would suggest adding any kind of additional demarcation beyond the colors and text, maybe just more space between AB / CD. The figure text indicating PRh/BLA is a bit small.

      Thank you for the suggestion – we have added more space between the top and bottom panels of the figure.

      (4) Line 496 typo ..."in the BLA but not the BLA".

      Apologies for the typo - this has been corrected.

      Reviewer #2 (Recommendations for the authors):

      I found the experiments to be extremely well-designed and the results convincing and exciting. The hypothesis of the focal and peripheral states of attention being encoded by BLA and PRh respectively, is enticing, yet as indicated in the public review, this does not seem to be the only possible interpretation. This is my only serious comment for the authors.

      (1) I think it would be worth reframing the article slightly to give credence to alternative hypotheses. Not to say that the authors' intriguing hypothesis shouldn't be an integral part of the introduction, but no alternatives are mentioned. In experiment 2, could the fact that S2 is already a predictor of S1 not block new learning to S2? In the framework of stimulus-stimulus associations, there would be no surprise in the serial-compound stage of conditioning at the onset of S1. This may prevent direct learning of the S2-shock association within the BLA. This type of association may as well fall under the peripheral/focal theory, but I don't think it's necessary to frame this possibility in terms of a peripheral/focal theory. To build on this alternative interpretation, the absence of S1 in experiment 4 may induce a prediction error (S2 predicts S1, but it's omitted), which could support learning by S2. The peripheral and focal states appear to correspond to A2 and A1 in SOP extremely well, and I think it would potentially add interest and support. If the authors do intend to make the paper a strong argument for their hypothesis, perhaps a few additional experiments may be introduced. If the novelty of S2 is critical for S2 not to be processed in a focal state during the serial compound stage, could pre-exposure of S2 alone allow for dependence of S2-shock on the PRh? Assuming this is what the authors would predict, this might disentangle the S-S theory mentioned above from the peripheral/focal theory. Or perhaps run an experiment S2-X in stage 1 and S2-S1-shock in stage 2? This said, I think the experiments are more than sufficient for an exciting paper as is, and I don't think running additional experiments is necessary. I would only argue for this if the authors make a hard claim about the peripheral/focal theory, as is the case for the way the paper is currently written.

      We appreciate the reviewer’s excellent point and suggestions. We have included an additional paragraph in the Discussion on page 24 (lines 566-594).  “An additional point to consider in relation to Experiments 3A, 3B, 4A and 4B is the level of surprise that rats experienced following presentations of the familiar S2 in stage 2. Specifically, in Experiments 3A and 3B, S2 was followed by the expected S1 (low surprise) and its conditioning required activation of NMDA receptors in the PRh and not the BLA. By contrast, in Experiments 4A and 4B, S2 was followed by omission of the expected S1 (high surprise) and its conditioning required activation of NMDA receptors in the BLA and not the PRh. This raises the possibility that surprise, or prediction error, also influences the way that S2 is processed in focal and peripheral states of attention. When prediction error is low, S2 is processed in the peripheral state of attention: hence, learning under these circumstances requires NMDA receptor activation in the PRh and not the BLA. By contrast, when prediction error is high, S2 is preserved in the focal state of attention: hence, learning under these circumstances requires NMDA receptor activation in the BLA and not the PRh. The impact of prediction error on the processing of S2 could be assessed using two types of designs. In the first design, rats are pre-exposed to S2-S1 pairings in stage 1 and this is followed by S2-S3-shock pairings in stage 2. The important feature of this design is that, in stage 2, the S2 is followed by surprise in omission of S1 and presentation of S3. Thus, if a large prediction error maintains processing of the familiar S2 in the BLA, we might expect that its conditioning in this design would require NMDA receptor activation in the BLA (in contrast to the results of Experiment 3B) and no longer require NMDA receptor activation in the PRh (in contrast to the results of Experiment 3A). 
      In the second design, rats are pre-exposed to S2 alone in stage 1 and this is followed by S2-[trace]-shock pairings in stage 2. The important feature of this design is that, in stage 2, the S2 is not followed by the surprising omission of any stimulus. Thus, if a small prediction error shifts processing of the familiar S2 to the PRh, we might expect that its conditioning in this design would no longer require NMDA receptor activation in the BLA (in contrast to the results of Experiment 4B) but, instead, require NMDA receptor activation in the PRh (in contrast to the results of Experiment 4A). Future studies will use both designs to determine whether prediction error influences the processing of S2 in the focus versus periphery of attention and, thereby, whether learning about this stimulus requires NMDA receptor activation in the BLA or PRh.”

      (3) I was surprised the authors didn't frame their hypothesis more in terms of Wagner's SOP model. It was only minimally mentioned in the introduction, and the authors' theory would be clearer if it were covered more fully there. I was wondering whether the authors may have avoided this framing to avoid an expectation for modeling SOP in their design. If this were the case, I think the paper stands on its own without modeling, and at least for myself, a comparison to SOP would not require modeling of SOP. If this was the authors' concern for avoiding it, I would suggest to the authors that they need not be concerned about it.

      We appreciate the endorsement of Wagner’s SOP theory as a nice way of framing our results. We are currently working on a paper in which we use simulations to show how Wagner’s theory can accommodate the present findings as well as others in the literature on sensory preconditioning. For this reason, we have not changed the current paper in relation to this point.

    1. Reviewer #1 (Public review):

      I have to preface my evaluation with a disclosure that I lack the mathematical expertise to fully assess what seems to be the authors' main theoretical contribution. I am providing this assessment to the best of my ability, but I cannot substitute for a reviewer with more advanced mathematical/physical training.

      Summary:

      This paper describes a new theoretical framework for measuring parsimony preferences in human judgments. The authors derive four metrics that they associate with parsimony (dimensionality, boundary, volume, and robustness) and measure whether human adults are sensitive to these metrics. In two tasks, adults had to choose one of two flower beds which a statistical sample was generated from, with or without explicit instruction to choose the flower bed perceptually closest to the sample. The authors conduct extensive statistical analyses showing that humans are sensitive to most of the derived quantities, even when the instructions encouraged participants to choose only based on perceptual distance. The authors complement their study with a computational neural network model that learns to make judgments about the same stimuli with feedback. They show that the computational model is sensitive to the tasks communicated by feedback and only uses the parsimony-associated metrics when feedback trains it to do so.

      Strengths:

      (1) The paper derives and applies new mathematical quantities associated with parsimony. The mathematical rigor is very impressive and is much more extensive than in most other work in the field, where studies often adopt only one metric (such as the number of causes or parameters). These formal metrics can be very useful for the field.

      (2) The studies are preregistered, and the statistical analyses are strong.

      (3) The computational model complements the behavioral findings, showing that the derived quantities are not simply equivalent to maximum-likelihood inference in the task.

      (4) The speculations in the discussion section (e.g., the idea that human sensitivity is driven by the computational demands each metric requires) are intriguing and could usefully guide future work.

      Weaknesses:

      (1) The paper is very hard to understand. Many of the key details of the derived metrics are in the appendix, with very little accessible explanation in the main text. The figures helped me understand the metrics somewhat, although I am still not sure how some of them (such as boundary or robustness as measured here) are linked to parsimony. I understand that this is addressed by the derivations in the appendix, but as a computational cognitive scientist, I would have benefited from more accessible explanations. Important aspects of the human studies are also missing from the main text, such as the sample size for Experiment 2.

      (2) It is not fully clear whether the sensitivity of human participants to some of the quantities convincingly reported here actually means that participants preferred shapes according to the corresponding aspect of parsimony. The title and framing suggest that parsimony "guides" human decision-making, which may lead readers to conclude that humans prefer more parsimonious shapes. I am not sure the sensitivity findings alone support this framing, but it might just be my misunderstanding of the analyses.

      (3) The stimulus set included only four combinations of shapes, each designed to diagnostically target one of the theoretical quantities. It is unclear whether the results are robust or specific to these particular 4 stimuli.

      (4) The study is framed as measuring "decision-making," but the task resembles statistical inference (e.g., which shape generated the data) or perceptual judgment. This is a minor point since "decision-making" is not well defined in the literature, yet the current framing in the title gave me the initial impression that humans would be making preference choices and learning about them over time with feedback.

    2. Author response:

      Reviewer #1 (Public review)

      I have to preface my evaluation with a disclosure that I lack the mathematical expertise to fully assess what seems to be the authors' main theoretical contribution. I am providing this assessment to the best of my ability, but I cannot substitute for a reviewer with more advanced mathematical/physical training.

      Summary:

      This paper describes a new theoretical framework for measuring parsimony preferences in human judgments. The authors derive four metrics that they associate with parsimony (dimensionality, boundary, volume, and robustness) and measure whether human adults are sensitive to these metrics. In two tasks, adults had to choose one of two flower beds which a statistical sample was generated from, with or without explicit instruction to choose the flower bed perceptually closest to the sample. The authors conduct extensive statistical analyses showing that humans are sensitive to most of the derived quantities, even when the instructions encouraged participants to choose only based on perceptual distance. The authors complement their study with a computational neural network model that learns to make judgments about the same stimuli with feedback. They show that the computational model is sensitive to the tasks communicated by feedback and only uses the parsimony-associated metrics when feedback trains it to do so.

      Strengths:

      (1)  The paper derives and applies new mathematical quantities associated with parsimony. The mathematical rigor is very impressive and is much more extensive than in most other work in the field, where studies often adopt only one metric (such as the number of causes or parameters). These formal metrics can be very useful for the field.

      (2)  The studies are preregistered, and the statistical analyses are strong.

      (3)  The computational model complements the behavioral findings, showing that the derived quantities are not simply equivalent to maximum-likelihood inference in the task.

      (4)  The speculations in the discussion section (e.g., the idea that human sensitivity is driven by the computational demands each metric requires) are intriguing and could usefully guide future work.

      Weaknesses:

      (1) The paper is very hard to understand. Many of the key details of the derived metrics are in the appendix, with very little accessible explanation in the main text. The figures helped me understand the metrics somewhat, although I am still not sure how some of them (such as boundary or robustness as measured here) are linked to parsimony. I understand that this is addressed by the derivations in the appendix, but as a computational cognitive scientist, I would have benefited from more accessible explanations. Important aspects of the human studies are also missing from the main text, such as the sample size for Experiment 2.

      (2) It is not fully clear whether the sensitivity of human participants to some of the quantities convincingly reported here actually means that participants preferred shapes according to the corresponding aspect of parsimony. The title and framing suggest that parsimony "guides" human decision-making, which may lead readers to conclude that humans prefer more parsimonious shapes. I am not sure the sensitivity findings alone support this framing, but it might just be my misunderstanding of the analyses.

      (3) The stimulus set included only four combinations of shapes, each designed to diagnostically target one of the theoretical quantities. It is unclear whether the results are robust or specific to these particular 4 stimuli.

      (4) The study is framed as measuring "decision-making," but the task resembles statistical inference (e.g., which shape generated the data) or perceptual judgment. This is a minor point since "decision-making" is not well defined in the literature, yet the current framing in the title gave me the initial impression that humans would be making preference choices and learning about them over time with feedback.

      We are grateful for the supportive comments highlighting the rigor of our experimental design and data analysis. The Reviewer lists four points under “weaknesses”, to which we reply below. 

      (1)  The paper is very hard to understand

      In the revised version of the paper, we will expand the main text to include a more detailed and intuitive description of the terms of the Fisher Information Approximation, in particular clarifying how robustness and boundary can be interpreted as aspects of parsimony. We will also include details that are currently given only in Methods, such as the sample size for the second experiment.

      (2) Sensitivity of human participants 

      We do argue, and believe, that our data show that people tend to prefer simpler shapes. However, giving a well-posed definition of "preference" in this context turns out to be nontrivial.

      At the very least, any statement such as "people prefer shape A over B" should be qualified with something like “when the distance of the data from both shapes is the same.” In other words, one should control for goodness-of-fit. Even before making any reference to our behavioral model, this phenomenon (a preference for the simpler model when goodness of fit is matched between models) is visible in Figure 3a, where the effective decision boundary used by human participants is closer to the more complex model than the cyan line representing the locus of points with equal goodness of fit under the two models (or equivalently, with the same Euclidean distance from the two shapes). The goal of our theory and our behavioral model is precisely to systematize this sort of control, extending it beyond just goodness-of-fit and allowing us to control simultaneously for multiple features of model complexity that may affect human behavior in different ways. In other words, it allows us not only to ask whether people prefer shape A over B after controlling for the distance of the data to the shapes, but also to understand to what extent this preference is driven by important geometrical features such as dimensionality, volume, curvature, and boundaries of the shapes. More specifically, and importantly, our theory makes it possible to measure the strength of the preference, rather than merely asserting its existence. In our modeling framework, the existence of a preference for simpler shapes is captured by the fact that the estimated sensitivities to the complexity penalties are positive (and although they differ in magnitude, all are statistically reliable).

      (3) Generalization to different shapes  

      Thank you for bringing up this important topic. First, note that while dimensionality and volume are global properties of models and only take two possible values in our human tasks, the boundary and robustness penalties depend on the model and on the data and therefore assume a continuum of values across the tasks (note also that the boundary penalty is relevant for all task types, not just the one designed specifically to study it, because all models except the zero-dimensional dot have boundaries). Therefore, our experimental setting is less restrictive than it may seem, because it explores a range of possible values for two of the four model features. However, we agree that it would be interesting to repeat our experiment with a broader range of models, perhaps allowing their dimensionality and volume to vary more. In the same spirit, it would be interesting to study the dependence of human behavior on the amount of available data. We believe that these are all excellent ideas for further study that exceed the scope of the present paper. We will include these important points in a revised Discussion.

      (4) Usage of “decision making” vs “perceptual judgment”

      Thank you. We will clarify better in the text that our usage of “decision making” overlaps with the idea of a perceptual judgment and that our experiments do not tackle sequential aspects of repeated decisions. 

      Reviewer #2 (Public review):

      This manuscript presents a sophisticated investigation into the computational mechanisms underlying human decision-making, and it presents evidence for a preference for simpler explanations (Occam's razor). The authors dissect the simplicity bias into four different components, and they design experiments to target each of them by presenting choices whose underlying models differ only in one of these components. In the learning tasks, participants must infer a "law" (a logical rule) from observed data in a way that operationalizes the process of scientific reasoning in a controlled laboratory setting. The tasks are complex enough to be engaging but simple enough to allow for precise computational modeling.

      As a further novel feature, the authors derive an additional term in the expansion of the log-evidence, which arises from boundary terms. This is combined with a choice model, which is the one that is tested in experiments. Experiments are run both with humans and with artificial intelligence agents, showing that humans have an enhanced preference for simplicity as compared to artificial neural networks.

      Overall, the work is well written, interesting, and timely, bridging concepts in statistical inference and human decision making. Although technical details are rather elaborate, my understanding is that they represent the state of the art.

      I have only one main comment that I think deserves further discussion. Computing the complexity penalty of models may be hard. It is unlikely that humans can perform such a calculation on the fly. As the authors discuss in the final section, while the dimensionality term may be easier to compute, others (e.g., the volume term, which requires an integral) may be considerably harder to compute (it is true that they should be computed once and for all for each task, but still...). I wonder whether the sensitivity of human decision making with reference to the different terms is so different, and in particular whether it aligns with computational simplicity, or with the possibility of approximating each term by simple heuristics. Indeed, the sensitivity to the volume term is significantly and systematically lower than that of other terms. I wonder whether this relation could be made more quantitative using neural networks, using as a proxy of computational hardness the number of samples needed to reach a given error level in learning each of these terms.

      Thank you. The computational complexity associated with calculating the different terms and its potential connection to human sensitivity to the terms is an intriguing topic. As we hinted at in the discussion, we agree with the reviewer that this is a natural candidate for further research, which likely deserves its own study and exceeds the scope of the present paper. 

      As a minor aside, at least for the present task the volume term may not be that hard to compute, because it can be expressed with the number of distinguishable probability distributions in the model (Balasubramanian 1996). Given the nature of our task, where noise is Gaussian, isotropic and with known variance, the geometry of the model is actually the Euclidean geometry of the plane, and the volume is simply the (log of the) length of the line that represents the one-dimensional models, measured in units of the standard deviation of the noise.
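      The statement above can be written out as a short worked equation. This is a sketch under stated assumptions, not a derivation taken from the paper: a one-dimensional model is assumed to be a curve of length $L$ in the plane, parameterized by arc length $s$, with isotropic Gaussian noise of known standard deviation $\sigma$. The symbols $L$, $s$, $\sigma$, $g$ and $V$ are introduced here for illustration only.

```latex
% Fisher information for a position parameter under isotropic Gaussian
% noise with known standard deviation \sigma (arc-length parameterization)
g(s) = \frac{1}{\sigma^{2}}
\quad\Longrightarrow\quad
V \;=\; \log \int_{0}^{L} \sqrt{g(s)}\,\mathrm{d}s
  \;=\; \log \frac{L}{\sigma}.
```

      Under these assumptions the volume term depends only on the length of the curve expressed in noise units, which is consistent with the remark that it need not be hard to compute in this task.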

      Reviewer #3 (Public review):

      Summary:

      This is a very interesting paper that documents how humans use a variety of factors that penalize model complexity and integrate over a possible set of parameters within each model. By comparison, trained neural networks also use these biases, but only on tasks where model selection was part of the reward structure. In the situation where training emphasizes maximum-likelihood decisions, only neural networks, but not humans, were able to adapt their decision-making. Humans continue to use model integration simplicity biases.

      Strengths:

      This study used a pre-registered plan for analyzing human data, which exceeds the standard of most current studies.

      The results are technically correct.

      Weaknesses:

      The presentation of the results could be improved.

      We thank the reviewer for their appreciation of our experimental design and methodology, and for pointing out (in the separate "recommendations to authors") a few passages of the paper where the presentation could be improved. We will clarify these passages in the revision.

    1. Reviewer #1 (Public review):

      Summary:

      In the present manuscript, de Bos and Kutay investigate the functional implications of persistent microtubule-ER contacts as cells go through mitosis. To do so, they resorted to investigating phosphorylation mutants of the ER-Microtubule crosslinker Climp63. They found that phosphodeficient Climp63 mutants induce a severe SAC-dependent mitotic delay after normal chromosome alignment, with an impressive mitotic index of approximately 75%. Strikingly, this was often associated with massive nuclear fragmentation into up to 30 micronuclei that are able to recruit both core and non-core nuclear envelope components. One particular residue (S17) that is phosphorylated by Cdk1 seems to account for most, if not all, these phenotypes. Furthermore, the authors use the impact on mitosis as an indirect way to map the microtubule binding domain of Climp63, which has remained controversial, and found that it is mostly restricted to the N-terminal 28 residues of Climp63. Of note, despite the strong impact on mitosis, persistent microtubule-ER contacts did not affect the distribution of other organelles during mitosis, such as mitochondria or lysosomes.

      Strengths:

      Overall, this work provides important mechanistic insight into the functional implications of ER-microtubule network remodelling during mitosis and should be of great interest to a vast readership of cell biologists.

      Weaknesses:

      Some of the key findings appear somewhat preliminary and would be worth exploring further to substantiate some of the claims and clarify the respective impact on mitosis and nuclear envelope reassembly on the resulting micronuclei.

      The following suggestions would significantly clarify some key points:

      (1) The striking increase in mitotic index in cells expressing the Climp63 phosphodefective mutant, together with their live cell imaging data indicating extensive mitotic delays that can be relieved by SAC inhibition, suggests that SAC silencing is significantly delayed or even impossible to achieve. The fact that most chromosomes align in 12 min, irrespective of the expression of the Climp63 phosphodefective mutant, suggests that initial microtubule-kinetochore interactions are not compromised, but maybe cannot be stably maintained. Alternatively, the stripping of SAC proteins from kinetochores by dynein along attached microtubules might be compromised, despite normal microtubule-kinetochore attachments. The authors allude to both these possibilities, but unfortunately, they never really test them. This could easily be done by immunofluorescence with a Mad1 or c-Mad2 antibody to inspect which fraction of kinetochores (co-stained with a constitutive kinetochore marker, such as CENP-A or CENP-C) are positive for these SAC proteins. If just a small fraction, then the stability of some attachments is likely the cause. If most/all kinetochores retain Mad1/c-Mad2, then it is probably an issue of silencing the SAC.

      (2) The authors use the increase in mitotic index (H3 S10 phosphorylation levels) as a readout for the MT binding efficiency of Climp63 and respective mutants. Although suggestive, this is fairly indirect and requires additional confirmation. For example, the authors could perform basic immunofluorescence in fixed cells to inspect co-localization of Climp63 (and its mutants) with microtubules.

      (3) The authors state in the discussion that the striking nuclear fragmentation seen upon mitotic exit of cells expressing the Climp63 phosphodefective mutant has not been reported before, and yet it is strikingly similar to what has been previously observed in cells treated with taxol (they cite Samwer et al. 2017, but they might elect to cite also Mitchison et al., Open Biol, 2017 and most relevantly Jordan et al., Cancer Res, 1996). Given this striking similarity and the extensive mitotic delay observed in the Climp63 phosphodefective mutant, it is tempting to speculate that these cells are undergoing mitotic slippage (i.e., cells exit mitosis without ever satisfying the SAC) because they are unable to silence/satisfy the SAC. Indeed, the scattered micronuclei morphology has also been observed in cells undergoing mitotic slippage (e.g., Brito and Rieder, Curr Biol., 2006). The experiment suggested in point #1 should also shed light on this problem. The authors might want to consider discussing this possible explanation to interpret the observed phenotypes.

      (4) One of the most significant implications of the findings reported in this paper is that microtubule proximity does not seem to impact the assembly of either core or non-core nuclear envelope proteins on micronuclei (that possibly form due to mitotic slippage, rather than normal anaphase). These results challenge some models explaining nuclear envelope defects in micronuclei derived from lagging chromosomes due to the proximity of microtubules, and, as the authors point out at the very end, other reasons might underlie these defects. Along this line, the authors might elect to cite Afonso et al., Science, 2014, and Orr et al., Cell Reports, 2022, who provide evidence that a spindle midzone-based Aurora B gradient, rather than microtubules per se, underlies the nuclear envelope defects commonly seen in micronuclei derived from lagging chromosomes during anaphase.

    2. Reviewer #2 (Public review):

      Mitotic phosphorylation of the ER-microtubule linker CLIMP63 was discovered decades ago and was shown to release CLIMP63 from microtubules. Here, the authors describe for the first time the significance of CLIMP63 phosphorylation for mitotic division in cells. Expression of non-phosphorylatable CLIMP63 led to a massive re-localization of ER into the area of the mitotic spindle. This was not unexpected, as another ER-microtubule linker, STIM1, is phosphorylated during mitosis to release it from microtubules, and unphosphorylatable STIM1 also leads to an invasion of the ER into the spindle. The authors map CLIMP63's microtubule-binding domain and define S17 as the critical residue that needs to be phosphorylated for release from microtubules and as a target of Cdk1, albeit with an indirect assay that is based on the ability of overexpressed mutants to disrupt mitosis. The authors further demonstrate that aberrant, microtubule-tethered membranes in the spindle disrupt spindle function. This is in line with the group's prior findings that chromosome-tethered membranes lead to severe chromosome segregation defects. Cells overexpressing phospho-deficient CLIMP63 arrested in prometaphase with an active checkpoint. When these cells were forced to exit mitosis, a large number of micronuclei formed. Interestingly, these micronuclei had different compositions and properties from previously described ones, suggesting that there are diverse paths for a cell to become multinucleated. Lastly, the authors asked whether mitochondria and lysosomes depend on ER for their distribution in mitotic cells. However, the position of these other organelles was unchanged in cells in which ER was re-localized due to the overexpression of phospho-deficient CLIMP63. This is an interesting observation in the context of how the interior organisation of mitotic cells is achieved.

      Suggestions:

      (1) The authors should confirm the mapping of the microtubule-binding domain by more direct assays, such as microtubule co-pelleting or proximity ligation assays.

      (2) The authors should clarify why they performed phenotypic studies and live microscopy experiments (Figures 4 and 5) using the CLIMP63(3A) mutant, despite knowing that the relevant phosphorylation site was S17. Were the phenotypes different for S17A versus the triple mutant?

    1. Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important, and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues that require substantial improvement. In several instances, the authors conclude that there are no sex-associated differences for specific parameters, yet inspection of the data suggests visible trends that are not properly quantified. The authors should either apply more appropriate statistical approaches to test these trends or provide stronger evidence that the observed differences are not significant. In other analyses, the authors report the differences between sexes based on a pooled analysis of TCR sequences from all the donors, which could result in differences driven by one or two single donors (e.g., having particular HLA variants) rather than reflect sex-related differences.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      Weaknesses:

      Major:

      (1) The authors state that there is "no clear separation in PCA for both TRA and TRB across all subsets." However, Figure 2 shows a visible separation for DP thymocytes (especially TRA, and to a lesser degree TRB) and also for TRA of Tregs. This apparent structure should be acknowledged and discussed rather than dismissed.

      (2) Supplementary Figures 2-5 involve many comparisons, yet no correction for multiple testing appears to be applied. After appropriate correction, all the reported differences would likely lose significance. These analyses must be re-evaluated with proper multiple-testing correction, and apparent differences should be tested for reproducibility in an external dataset (for example, the pediatric thymus and peripheral blood repertoires later used for motif validation).

      (3) Supplementary Figure 6 suggests that women consistently show higher Rényi entropies across all subsets. Although individual p-values are borderline, the consistent direction of change is notable. The authors should apply an integrated statistical test across subsets (for example, a mixed-effects model) to determine whether there is an overall significant trend toward higher diversity in females.

      (4) Figures 4B and S8 clearly indicate enrichment of hydrophobic residues in female CDR3s for both TRA and TRB (excluding alanine, which is not strongly hydrophobic). Because CDR3 hydrophobicity has been linked to increased cross-reactivity and self-reactivity (see, e.g., Stadinski et al., Nat Immunol 2016), this observation is biologically meaningful and consistent with higher autoimmune susceptibility in females.

      (5) The majority of "hundreds of sex-specific motifs" are probably donor-specific motifs confounded by HLA restriction. This interpretation is supported by the failure to validate motifs in external datasets (pediatric thymus, peripheral blood). The authors should restrict analysis to public motifs (shared across multiple donors) and report the number of donors contributing to each motif.

      (6) When comparing TCRs to VDJdb or other databases, it is critical to consider HLA restriction. Only database matches corresponding to epitopes that can be presented by the donor's HLA should be counted. The authors must either perform HLA typing or explicitly discuss this limitation and how it affects their conclusions.

      (7) Although the age distributions of male and female donors are similar, the key question is whether HLA alleles are similarly distributed. If women in the cohort happen to carry autoimmune-associated alleles more often, this alone could explain observed repertoire differences. HLA typing and HLA comparison between sexes are therefore essential.

      (8) In some analyses (e.g., Figures 8C-D) data are shown per donor, while others (e.g., Fig. 8A-B) pool all sequences. This inconsistency is concerning. The apparent enrichment of autoimmune or bacterial specificities in females could be driven by one or two donors with particular HLAs. All analyses should display donor-level values, not pooled data.

      (9) The reported enrichment of matches to certain specificities relative to the database composition is conceptually problematic. Because the reference database has an arbitrary distribution of epitopes, enrichment relative to it lacks biological meaning. HLA distribution in the studied patients and HLA restrictions of antigens in the database could be completely different, which could alone explain enrichment and depletions for particular specificities. Moreover, differences in Pgen distributions across epitopes can produce apparent enrichment artifacts. Exact matches typically correspond to high-Pgen "public" sequences; thus, the enrichment analysis may simply reflect variation in Pgen of specific TCRs (i.e., fraction of high-Pgen TCRs) across epitopes rather than true selection. Consequently, statements such as "We observed a significant enrichment of unique TRB CDR3aa sequences specific to self-antigens" should be removed.

      (10) The overrepresentation of self-specific TCRs in females is the manuscript's most interesting finding, yet it is not described in detail. The authors should list the corresponding self-antigens, indicate which autoimmune diseases they relate to, and show per-donor distributions of these matches.

      (11) The concept of polyspecificity is controversial. The authors should clearly explain how polyspecific TCRs were defined in this study and highlight that the experimental evidence supporting true polyspecificity is very limited (e.g., just a single TCR from Figure 5 from Quiniou et al.).
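      Major point (2) above asks for multiple-testing correction. As an illustration of what such a correction does, the following is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. The function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the manuscript; the reviewer does not specify which correction should be used, so this is only one common choice.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from six comparisons (illustration only)
p_values = [0.04, 0.03, 0.02, 0.045, 0.20, 0.60]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
```

      In this made-up example four raw p-values fall below 0.05, but none of the adjusted values does, which is the kind of outcome the reviewer anticipates for a family of borderline comparisons.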

      Minor:

      (1) Clarify why the Pgen model was used only for DP and CD8 subsets and not for others.

      (2) The Methods section should define what a "high sequence reliability score" is and describe precisely how the "harmonized" database was constructed.

      (3) The statement "we generated 20,000 permuted mixed-sex groups" is unclear. It is not evident how this permutation corrects for individual variation or sex bias. A more appropriate approach would be to train the Pgen model separately for each individual's nonproductive sequences (if the number of sequences is large enough).
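Point (8) above, on pooled versus donor-level values, can be illustrated with a small sketch. The donor names and counts below are invented for illustration and are not taken from the manuscript. The sketch only shows how a single donor can produce a pooled sex difference that disappears once each donor is summarized separately.

```python
from statistics import median

# Hypothetical data: donor -> (sex, matched sequences, total sequences).
# One female donor ("F1") carries most of the matches.
donors = {
    "F1": ("F", 90, 1000),
    "F2": ("F", 5, 1000),
    "F3": ("F", 6, 1000),
    "M1": ("M", 6, 1000),
    "M2": ("M", 5, 1000),
    "M3": ("M", 7, 1000),
}


def pooled_fraction(sex):
    """Fraction of matches after pooling all sequences from donors of one sex."""
    rows = [(m, t) for s, m, t in donors.values() if s == sex]
    return sum(m for m, _ in rows) / sum(t for _, t in rows)


def donor_fractions(sex):
    """Per-donor match fractions for one sex (one value per donor)."""
    return [m / t for s, m, t in donors.values() if s == sex]


if __name__ == "__main__":
    for sex in ("F", "M"):
        print(sex, "pooled:", pooled_fraction(sex),
              "donor-level median:", median(donor_fractions(sex)))
```

In this toy example the pooled female fraction is several times the male one, while the donor-level medians are identical, which is the kind of discrepancy that donor-level plots would expose.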

    1. Reviewer #1 (Public review):

      Summary:

      The study from Wu and Turrigiano investigates how disruption of taste coding in a mouse model of autism spectrum disorders (ASDs) affects aversive learning in the context of a conditioned taste aversion (CTA) paradigm. The experiments combine 2-photon calcium imaging of neurons in the gustatory portion of the anterior insular cortex (i.e., gustatory cortex) with behavioral training and testing. The authors rely on Shank3 knockout mice as a model for ASDs. The authors found that Shank3 mice learn CTA more slowly and extinguish the memory more rapidly than control subjects. Calcium imaging identified impairments in taste-evoked activity associated with memory encoding and extinction. During memory encoding, the authors found less suppressed neuronal activity and increased correlated variability in Shank3 mice compared to controls. During extinction, they observed a faster loss of taste selectivity and degradation of taste discriminability in mutants compared to controls.

      Strengths:

      This is a well-written manuscript that presents interesting findings. The results on the learning and extinction deficits in Shank3 mice are of particular interest. Analyses of neural activity are well conducted and provide important information on the type of impaired cortical activity that may correlate with behavioral deficits.

      Weaknesses:

      (1) The experiments rely on three groups: CS-only WT, CTA WT, and CTA KO. Can the authors provide a rationale for not having a CS-only KO group?

      (2) The authors design an effective behavioral paradigm comparing consumption of water and saccharin and tracking extinction (Figure 3). This paradigm shows differences in licking across distinct behavioral conditions. For instance, during T1, licking to water strongly differs from licking to saccharin for both WT and KO. During T2, licking to water strongly differs from licking to saccharin only for WT (much less for KO), and licking to saccharin in WT differs from that in KO. These differences in taste sampling across conditions could contribute to some of the effects on neural activity and discriminability reported in Figures 5 and 6. That is, saccharin and water trials may be highly discriminable because in one case the mouse licks and in the other it does not (or licks much less). The authors may want to address this issue.

      (3) Are there any omission trials following CTA? If so, they should be quantified and reported. How are the omission trials treated with regard to the analyses?

      (4) The authors describe the extinction paradigm as "alternative choice". In decision-making, alternative choice paradigms typically require 2 lateral spouts to report decisions following the sampling from a central spout. To avoid confusion, the authors may want to define their paradigm as alternative sampling.

      (5) Figure 4 reports that CTA increases the proportion of neurons that consistently respond to saccharin and water across days. While the saccharin result could be an effect of aversive learning, it is less clear why the phenomenon would generalize to water as well. Can the authors provide an explanation?

      (6) The recordings are performed in the part of the anterior insular cortex that is typically defined as "gustatory cortex" (GC). Given the functional heterogeneity of the anterior insular cortex (AIC) and given that the authors do not sample all of the anteroposterior extent of AIC, I would suggest being more explicit about their positioning in GC. Also, some citations (e.g., Gogolla et al, 2014) refer to the posterior insular cortex, which is considered more inherently multimodal than GC. GC multimodality is typically associative in nature, as only a few neurons respond to sound and light in naïve animals.

      (7) It would be useful to add summary figures showing the extent of viral spread as well as GRIN lens placement.

      (8) I encourage the authors to add Ns every time percentages are reported. How many neurons have been recorded in each condition? Can the authors provide the average number of neurons recorded per session and per animal?

      (9) It looks like some animals learned more than others (Figure 1E or Figure 3C). Is it possible to compare neural activity across animals that showed different degrees of learning?

    2. Reviewer #2 (Public review):

      Wu and Turrigiano investigated how cortical taste coding during conditioned taste aversion (CTA) learning is affected in Shank3 knockout (KO) mice, a model of monogenic ASD. Using longitudinal two-photon calcium imaging of AIC neurons, the authors show that Shank3 KO mice exhibit reduced suppression of activity in a subset of neurons and a higher correlated variability in neural activity. This is accompanied by slower learning and faster extinction of aversive taste memories. These results suggest that Shank3 loss compromises the flexibility and stability of cortical representations underlying adaptive behaviour.

      Major strengths:

      (1) Conceptual significance: The study connects a molecular ASD risk gene (Shank3) to flexible sensory encoding, bridging genetics, systems neuroscience, and behaviour.

      (2) Technical rigour: Longitudinal calcium imaging with cell-registration across learning and extinction sessions is technically demanding and well-executed.

      (3) Behavioural paradigm: The use of both acquisition and extinction paradigms provides a more nuanced picture of learning dynamics.

      (4) Analyses: Correlated variability, discriminability indices, and population decoding analyses are robust and appropriate for addressing behavioural and network-level coding changes.

      Major weaknesses:

      (1) Causality: The paper infers that increased correlated variability causes learning deficits, but no causal tests (e.g., optogenetic modulation of inhibition or interneuron rescue) are presented to confirm this.

      (2) Behavioural scope: The study focuses exclusively on taste aversion; generalisation to other flexible learning paradigms (e.g., reversal or probabilistic tasks) is not addressed.

      (3) Mechanistic insights: While the study provides interesting findings on altered sensory perception and extinction of learning-related signals in AIC, it offers almost no mechanistic insight. This makes interpretation difficult, especially regarding how generalisable these findings are. Also, the different reported findings are "potentially" connected, but the exact relation between increased correlated variability and faster loss of taste selectivity cannot be assessed.

    3. Reviewer #3 (Public review):

      In this study, Wu & Turrigiano investigate an ethologically relevant form of associative learning (conditioned taste aversion - CTA) and its extinction in the Shank3 KO mouse model of ASD. They also examine the underlying circuits in the anterior insular cortex (AIC) simultaneously, using two-photon calcium imaging through a GRIN lens. They report that Shank3 KO mice learn CTA slower and suggest that this is mediated by a reduction in tastant-stimulus activity suppression of AIC neurons and a reduced signal-to-noise ratio due to increased noise correlations in AIC neurons. Interestingly, once Shank3 KO mice acquire CTA, they extinguish the aversive memory more rapidly than wild-type mice. This accelerated extinction is accompanied by a faster loss of neuronal and population-level taste selectivity and coding in the AIC compared to WT mice.

      This is an important study that uses in vivo methods to assess circuit dysfunction in a mouse model of ASD, related to sensory perception valence (in this case, taste). The study is well executed, the data are of high quality, and the analytical procedures are detailed. Furthermore, the behavioural paradigm is well thought out, particularly the approach for assessing extinction through repeated retrieval sessions (T1-T5), which effectively tests discrimination between saccharin and water rather than relying solely on lick counts or total consumption as a measure of extinction. Finally, the statistical tests used are appropriate and justified.

      There is, however, a missing link between the behavioural findings and the underlying mechanisms. More specifically:

      (1) The authors don't make a causal link between the behaviour and AIC neurophysiology, both the percentage of suppressed cells and the coactivity measurements. For the % of suppressed cells, it seems that both WT and KO cells are suppressed in the transition between CST1 and CST2 (Figure 1L), yet only the WT mice exhibit CTA (at least by CST2). For the taste-elicited coactivity measure, it seems that there is an increase in coactivity from CST1 to CST2 in WT (Figure 2C - blue, although not statistically tested?), but persistently higher coactivity in KO. Is this change of coactivity in WT important for the expression of CTA? Plotting behavioral performance (from Figure 1G) against coactivity (from Figure 2C) for each animal would be informative.

      (2) Shank3 KO cells already show an increase in baseline coactivity (Figure 2 - figure supplement 1), and the authors never examine CS-only responses in the KO group. This makes it difficult to determine whether elevated coactivity and noise correlations reflect a generalized AIC abnormality in Shank3 KOs (perhaps through impaired PV-mediated inhibition in insular cortex; Gogolla et al, 2014) that is not directly responsible for, or related to, CTA.

      (3) How do the authors interpret the large range of lick ratios (Figure 1G) for WT (almost bi-modal distribution)? Is there a within-subject correlation with any of the neurophysiological measurements to suggest a relationship between AIC neurophysiology and behavioural expression of CTA?

      (4) Indeed, CTA appears to be successfully achieved for Shank3 KO mice delayed by 1 day, as the level of saccharin aversion during the first retrieval session (T1) is comparable between Shank3 KO and WTs. In this context, not extending the first part of the paradigm to include CST3 seems to be a missed opportunity. Doing so would have allowed for within-cell and within-subject comparison of taste-elicited pairwise correlation across the learning and to investigate the neural mechanism of delayed extinction in KOs more effectively.

      (5) How to interpret Figure 5F: Absolute discriminability is lower for T5 for CTA WT and CTA KO compared to CS-only? Why would AIC neurons have less information on taste identity by the end of extinction than during the unconditioned (CS-only) condition? And if that is the case, how is decoding accuracy in Figure 6C higher in T5 for CTA WT vs CS-only?
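The per-animal comparison requested in points (1) and (3) could be sketched as follows. This is a minimal illustration only: the coactivity and lick-ratio values are invented, and Pearson correlation is one assumed choice of statistic, not the authors' analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-animal values (one entry per mouse), for illustration only.
coactivity = [0.10, 0.12, 0.15, 0.20, 0.22]   # taste-elicited coactivity
lick_ratio = [0.20, 0.25, 0.40, 0.60, 0.70]   # behavioural lick ratio

if __name__ == "__main__":
    print("r =", pearson_r(coactivity, lick_ratio))
```

With only a handful of animals per group, such a correlation would be descriptive rather than conclusive, but a scatter plot of these paired values would show whether the two measures track each other within subjects.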

    1. Concerning elapid venoms, the low immunogenicity of 3FTXs makes generating homogeneous antivenoms difficult [2]. The elapid 3FTXs are peptides with associated non-enzymatic activity, ranging from 60 to 85 amino acids. They contain eight highly conserved cysteine residues that form 4 disulfide bridges that stabilize their hydrophobic core, from which emerge three loops that bear 3–5 antiparallel beta-strands. Besides, some 3FTXs also contain an extra pair of cysteine residues that forms one more disulfide bridge located at one of the loops. 3FTXs encompass many proteins with diverse functions like cytotoxicity (e.g. cardiotoxins) [3], [4] and neurotoxicity (e.g. α-neurotoxins, fasciculins, muscarinic toxins, L-type calcium channel blockers) [5], [6], [7]. Snake venom composition from elapids and from their related colubrids show that PLA2 and 3FTXs are not only the most abundant protein families [8], [9], but also, the most toxic ones [10].

      What is 3FTx? Three-finger toxins, one of the most abundant protein families in elapid snake venoms.

    1. In this current context of scientific explosion at all levels (although the exponential growth is not the same in all scientific disciplines), we find the advent of new disciplines and subdisciplines that help us to classify the areas of knowledge. Thus, to order this informative explosion, it was convenient to establish a classification system for the different areas of study. The UNESCO International Nomenclature for the fields of Science and Technology was proposed in 1973 and 1974 by the Science and Technology Policy Divisions of Science and Technology of UNESCO and adopted by the Scientific and Technical Research Advisory Commission. It is a knowledge classification system widely used in the management of research projects and doctoral theses. And, as a sign that science always brings new horizons to knowledge, new actors are always appearing in this classification system. In the field that occupies us, however, we find ourselves with a great absence. The "Astrobiology", does not appear in the listings of UNESCO. But yes, we find in them the term "Exobiology" [2, 3]. This "partial" absence denotes the novelty that is still today to scientifically consider the study of life outside Earth. Indeed, until very recently and by many scientists, it was considered "Exobiology" or "Astrobiology" (which we will consider synonyms), a science without an area of study. This was especially true until 1995, when Michel Mayor and Didier Queloz discovered the first extrasolar planet, 51 Pegasi b. Fortunately, today things are beginning to change and more and more scientists believe that life will be a ubiquitous phenomenon, which will occur anywhere in the universe where the conditions are right for it. Life will then be an epiphenomenon, an event that has no choice but to occur, as soon as the complexity of the chemical organization of matter reaches the critical point of interaction between the trace elements, the essential elements for life. At the base of it we will find carbon, hydrogen, oxygen, nitrogen, phosphorus and sulphur. As life will be a ubiquitous phenomenon, finally today we already intuit that not even a planet is necessary for life to prosper, and that life could be maintained in interstellar space, without planetary substratum. But before continuing, it is convenient to fix some definitions. The debate on what is life? has occupied all generations of thinkers. It is a very difficult concept to define. Currently there is consensus in affirming that life is a self-contained, autopoietic chemical system (self-sufficient exchanging energy with the environment in which it is located), capable of reproducing itself and experiencing evolution [4]. It is a broad definition. In it the minerals could fit, and even the stars themselves, as we will see later. So, in view of the complexity of the knowledge that we are slowly acquiring about the universe, and given the challenges posed by the possibility of assuming that life will be found virtually anywhere, it is convenient to establish a series of ethical values that allow a positive integration in the cultural baggage of society of the new limits of knowledge that science gives us. For this reason, a "Philosophy of Science" - code UNESCO 7205.01 - was established, under which since the 80s we can find the "Philosophy of Biology". Before delving into the Philosophy of Astrobiology, we will give its definition, based on the concepts of "Philosophy" and "Astrobiology".

      Authors argue that the growth of the sciences in human culture has driven the need to expand the ontology of scientific categories. As astrobiology matures, more complex studies across disciplines are needed to address evolving areas - e.g., exobiology, philosophy of astrobiology, or my own term exoastronomy which I coined in 2018. These are missing from the UNESCO International nomenclature as of 2025/2026.

    1. Webinar Summary: Using Canva for Association Activities

      Executive Summary

      This summary covers the key points and lessons from the webinar "Apprendre à utiliser Canva pour vos actions associatives" ("Learning to use Canva for your association's activities"), organized by Solidatech.

      The session, led by Canva experts, aimed to give associations the knowledge they need to use the Canva platform effectively in their communications, with a particular focus on creating posters for volunteer recruitment.

      The main takeaways are:

      1. Canva Solidaire: The most important information for associations is the existence of "Canva Solidaire", an offer that gives eligible associations (French "loi 1901" associations) free, full access to Canva Pro, with up to 10 team members.

      2. Graphic Design Principles: Good poster design rests on five fundamental pillars: information hierarchy, branding (visual identity), visibility (visual impact), readability (reading comfort), and composition (balance of elements).

      3. Key Features: The Canva platform is a powerful, intuitive all-in-one tool. The essential features presented include using templates, customization through the Brand Kit ("Kit d'Identité Visuelle", found under "Marque"), working with layers, and quickly adapting designs to different formats (social media, print).

      4. Artificial Intelligence (AI): Canva includes accessible AI tools ("Studio Magique") that make complex tasks simple, such as removing or generating backgrounds, capturing text from a flattened image, and even generating HTML code for forms.

      5. Resources and Training: Participants were encouraged to explore the Canva Design School, a section of the platform offering free courses and tutorials.

      In addition, to find templates created specifically by French designers, the recommended search keyword is "FR association".

      In conclusion, the webinar positioned Canva as a strategic ally for associations, enabling them to professionalize their visual communication with limited resources while fostering collaboration and efficiency.

      --------------------------------------------------------------------------------

      1. Introduction and Context of the Webinar

      The webinar was organized by Solidatech to support associations in their digital transformation. The event hosted two expert speakers from the Canva community, who presented the platform and its practical applications for the association sector.

      Organizer: Solidatech, represented by Camille.

      Canva speakers:

      ◦ Anne-Gaël: Community Manager for the "Créators" community (designers who create templates for the Canva library) and the "Édus Créateurs" community (teachers who create educational content).

      ◦ Alisée: Art director, brand consultant, and Canva ambassador, specializing in supporting project leaders and associations.

      Main theme: Using Canva to create communication materials, specifically volunteer recruitment posters, in connection with International Volunteer Day.

      2. Presentation of the Organizations

      Solidatech

      Solidatech is a cooperative with a social and environmental purpose whose mission is to help associations strengthen their impact through digital technology. The organization supports more than 45,000 associations. Its work rests on two pillars:

      1. Saving money:

      ◦ Software: Identifying free solutions or obtaining discounts on paid software.

      ◦ Hardware: Supplying refurbished equipment (through its work-integration cooperative, Les Ateliers du Bocage) and new equipment (in partnership with Dell).

      2. Building digital skills:

      ◦ Training: A certified training organization offering courses on digital issues and on specific tools.

      ◦ Assessment: A free digital diagnostic tool for evaluating an association's digital maturity.

      ◦ Resources: Free content (articles, newsletters, webinars).

      Canva

      Canva is an Australian company founded in 2013 by Melanie Perkins with the mission to "empower the world to design". The goal is to democratize design by making visual creation simple and accessible to everyone, notably through a drag-and-drop system.

      | Key indicator | Figure |
      | --- | --- |
      | Global presence | 190 countries |
      | Employees | More than 5,000 |
      | Monthly active users | 260 million |
      | Annualized revenue | 3.5 billion dollars |
      | Designs created since 2013 | 40 billion |
      | Designs created per second | More than 400 |
      | Users (students/teachers) | More than 100 million |
      | Nonprofit organizations | More than one million |

      Canva's values include being a "good person", making complex things simple, striving for excellence, and working for the common good.

      3. The Canva Solidaire Offer for Associations

      A significant part of the presentation was devoted to Canva Solidaire, the offer dedicated to the association sector.

      Principle: Canva Solidaire is the equivalent of Canva Pro, offered free of charge to eligible organizations.

      Benefits: Access to all Canva Pro features, including more templates, photos, and elements, the Brand Kit, content scheduling, and the option to add up to 10 people to the team for free.

      Eligibility: The offer is aimed mainly at French "loi 1901" associations. Public administrations, educational organizations (which have their own free program), and professional sports clubs, among others, are excluded.

      Registration procedure:

      1. Go to the dedicated Canva Solidaire page.

      2. Click "Demander un compte Canva Solidaire" (request a Canva Solidaire account).

      3. Sign up, or log in with an existing Canva account.

      4. Search for the name of your association. In most cases, Canva recognizes it through its prefecture registration number and validates the account automatically.

      5. If the association is not found, supporting documents must be attached (prefecture registration, association statutes).

      6. Canva support then confirms access by email.

      4. Getting Started with the Canva Platform

      Alisée presented a map of the main features of the Canva interface to familiarize users, including beginners.

      Home page: Shows shortcuts to different formats (presentations, social media, videos) and menus for accessing templates, existing projects, and scheduling.

      Templates: The recommended starting point for beginners. This is a large library of designs made by the "Créators".

      Tip: To find specifically French formats (e.g., a bookmark), it is recommended to add an asterisk (*) to the search.

      Left-hand menu (in the editor):

      Design/Templates: To search for and apply a new template.

      Elements: Contains shapes, illustrations, photos, videos, and audio.

      Brand ("Marque"): A crucial section where the association can set up its visual identity (logos, colors, fonts). Once configured, this kit can be applied in one click to any design to ensure consistency.

      Uploads ("Importer"): To add your own files (images, logos, videos).

      Text, Projects, Apps: Other creation and organization tools.

      Automatic saving: Canva saves designs in real time, which prevents loss of work if a technical problem occurs.

      5. Core Principles for Creating Effective Posters

      To create a striking poster, Alisée detailed five essential design principles:

      1. Hierarchy: Organize information from most to least important.

      The title should catch the eye first, followed by the key information (date, place), then the secondary details. The human eye "ranks before it understands".

      2. Branding: Use the elements of the association's visual identity consistently (colors, logo, fonts, illustration style).

      This allows immediate recognition and reinforces professionalism. For example, use green for an environmental association.

      3. Visibility: Make sure the poster is visible and draws attention.

      This depends on the choice of fonts, a clearly displayed logo, and a clear, engaging call to action (e.g., "Join us!", "Become a volunteer").

      4. Readability: Make sure the message is easy and pleasant to read. Pay attention to color contrast, font size (avoid decorative fonts for long paragraphs), line spacing (leading), and margins. The eye tends to scan a page in a "Z" pattern.

      5. Composition: The overall arrangement of elements on the page.

      Work with alignment, margins, and negative space ("empty" space) to create visual balance and guide the viewer's eye, so that the message is well understood.

      6. Canva's Artificial Intelligence (AI) Features

      The webinar presented a few AI tools built into Canva's Studio Magique, designed to simplify complex tasks.

      Background generation: Select a photo, remove the existing background, and generate a new one from a simple text description (prompt).

      For example, turn a photo of volunteers on a beach into a scene set in nature.

      Text capture: This tool "detects" the text in a flattened image (such as a PDF or JPEG) and makes it fully editable.

      This is very useful for updating an old poster whose source file is no longer available.

      Code generation: A more advanced feature was demonstrated, in which Canva's AI generated the HTML code for a contact form intended for volunteer recruitment.

      This code can then be embedded in a website or a document.

      7. Adapting Content for Different Media

      A major challenge for associations is adapting their visuals to different channels (flyer, Instagram post, web banner, etc.).

      Two methods were presented:

      1. Method 1 (multiple formats in a single document):

      ◦ In an existing design (e.g., an A4 poster), you can add a new "page" and assign it a different format type (e.g., Instagram post, video, presentation).

      ◦ This keeps all the base elements available so they can be rearranged manually for each format within a single project.

      2. Method 2 ("Redimensionner" (Resize) feature, Canva Pro):

      ◦ This feature automatically duplicates a design into one or more other formats.

      ◦ The user selects the desired new formats (e.g., Instagram Story, Facebook banner).

      ◦ Canva creates new versions of the design at the correct dimensions and tries to adapt the elements.

      Manual adjustments are often necessary.

      Expert tip: It is crucial to use the "Copier et redimensionner" (copy and resize) option rather than "Redimensionner ce design" (resize this design), so that the original file stays intact.

      8. Additional Resources and Continuing Training

      To help associations go further, the speakers shared two key resources:

      Finding French templates: By entering the search code FR association in the template search bar, users can access a selection of templates created specifically by the French "Créators" community for the needs of the association sector.

      Canva Design School: Accessible directly from the platform menu, this is a free, built-in "design school".

      It offers courses, video lessons in French, and practical activities for mastering specific tools (video, AI, etc.) and improving graphic design skills.

      9. Question-and-Answer Session: Key Points

      The end of the webinar clarified several important points:

      Image usage rights: All images in the Canva library are royalty-free for use in designs.

      Products (T-shirts, mugs) bearing a design created in Canva may be sold, provided the design is an original composition (with text or other elements added) and not simply a library image placed on the product.

      Number of fonts: For a poster, a maximum of two to three fonts (typefaces) is recommended to ensure clarity and visual harmony.

      Newsletters: Canva can be used to create the design of a newsletter, but it is not an email-sending tool.

      The design must be exported (for example as an HTML link) and then integrated into a dedicated mailing tool (e.g., Mailchimp).

      Privacy: Designs created in a Canva account are private and are not added to the public template library.

      AI language: Canva's AI tools understand and work perfectly well with instructions in French.

    1. Reviewer #2 (Public review):

      This study examined the effects of several cardenolides, including N,S-ring containing variants, on sequestration and performance metrics in monarch larvae. The authors confirm that some cardenolides, which are toxic to non-adapted herbivores, are sequestered by monarchs and enhance performance. Interestingly, N,S-ring-containing cardenolides did not have the same effects and were poorly sequestered, with minimal recovery in frass, suggesting an alternate detoxification or metabolic strategy. These N,S-containing compounds are also known to be less potent defences against non-adapted herbivores. The authors further report that mixtures of cardenolides reduce herbivore performance and sequestration compared to single compounds, highlighting the important role of phytochemical diversity in shaping plant-herbivore interactions.

      Overall, this study is clearly written, well-conducted and has the potential to make a valuable contribution to the field. However, I have one major concern regarding the interpretations of the mixture results. From what I understand of the methods, all tested mixtures contain all five compounds. As such, it is not possible to determine whether reduced performance and sequestration result from the complete mixture or from the presence of a single compound, such as voruscharin for performance and uscharin for sequestration. For instance, if all compounds except voruscharin (or uscharin) were combined, would the same pattern emerge? I suspect not, since the effects of the individual N,S-containing compounds alone are generally similar to those of the full mixture (Figure S3). By taking the average of all single compounds, the individual effects of the N,S-containing ones are being inflated by the non-N,S-containing ones (in the main text, Figure 4). In the mix, of course, they are not being 'diluted', as they are always present. This interpretation is further supported by the fact that in the equimolar mix, the relative proportion of voruscharin decreases (from 50% in the 'real mix'), and the target measurements of performance and sequestration tend to increase in the equimolar mix compared to the real mix.

      Despite this issue, the discussion of mixtures in the context of plant defence against both adapted and non-adapted herbivores is fascinating and convincing. The rationale that mixtures may serve as a chemical tool-kit that targets different sets of herbivores is compelling. The non-N,S cardenolides are effective against non-adapted herbivores and the N,S-containing cardenolides are effective against adapted herbivores. However, the current experiments focus exclusively on an adapted species. It would be especially interesting to test whether such mixtures reduce overall herbivory when both adapted and non-adapted species are present.

      It remains possible that mixtures, even in the absence of voruscharin or uscharin, genuinely reduce sequestration or performance; however, this would need to be tested directly to address the abovementioned concern.
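      The averaging concern raised above can be illustrated with a small numerical sketch. All values below are hypothetical relative performance scores chosen only to show the arithmetic; they are not data from the study, and the compound names other than voruscharin and uscharin are placeholders.

```python
# Hypothetical relative performance scores (1.0 = control level).
# These numbers are illustrative assumptions, NOT measurements from the study.
single_effects = {
    "compound_A": 1.0,   # placeholder non-N,S cardenolide
    "compound_B": 1.0,   # placeholder non-N,S cardenolide
    "compound_C": 1.0,   # placeholder non-N,S cardenolide
    "uscharin": 0.9,     # N,S-containing
    "voruscharin": 0.5,  # N,S-containing
}

# Hypothetical mixture outcome, assumed here to be driven by one potent compound.
mixture_effect = 0.5


def summarize(single_effects, mixture_effect):
    """Compare a mixture against the mean and the minimum of the single compounds."""
    values = list(single_effects.values())
    mean_of_singles = sum(values) / len(values)
    most_potent_single = min(values)
    return {
        "mean_of_singles": mean_of_singles,
        "most_potent_single": most_potent_single,
        "mixture_below_mean": mixture_effect < mean_of_singles,
        "mixture_below_most_potent": mixture_effect < most_potent_single,
    }


summary = summarize(single_effects, mixture_effect)
print(summary)
```

      In this toy case the mixture looks worse than the average single compound, yet it is no worse than the single most potent compound, which is the pattern a leave-one-out mixture would be needed to rule out.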

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Faiz et al. investigate small molecule-driven direct lineage reprogramming of postnatal mouse astrocytes to oligodendrocyte lineage cells (OLCs). They use a combination of in vitro, in vivo, and computational approaches to confirm lineage conversion and to examine the key underlying transcription factors and signaling pathways. Lentiviral delivery of transcription factors previously reported to be essential in OLC fate determination (Sox10, Olig2, and Nkx2.2) to astrocytes allows for lineage tracing. They found that these transcription factors are sufficient to reprogram astrocytes to iOLCs, but that the OLCs range in maturity level depending on which factor they are transfected with. They followed up with scRNA-seq analysis of transfected and control cultures at 14 DPT, confirming that TF-induced astrocytes take on canonical OLC gene signatures. By performing astrocyte lineage fate mapping, they further confirmed that TF-induced astrocytes give rise to iOLCs. Finally, they examined the distinct genetic drivers of this fate conversion using scRNA-seq and deep learning models of Sox10- astrocytes at multiple time points throughout the reprogramming. These findings are certainly relevant to diseases characterized by the perturbation of OLC maturation and/or myelination, such as Multiple Sclerosis and Alzheimer's Disease. Their application of such a wide array of experimental approaches gives more weight to their findings and allows for the identification of additional genetic drivers of astrocyte to iOLC conversion that could be explored in future studies. Overall, I find this manuscript thoughtfully constructed and only have a few questions to be addressed.

      (1) The authors suggest that Sox10- and Olig2- transduced astrocytes result in distinct subpopulations of iOLCs. Considering it was discussed in the introduction that these TFs cyclically regulate one another throughout differentiation, could they speculate as to why such varying iOLCs resulted from the induction of these two TFs?

      We thank the Reviewer for the opportunity to speculate. We hypothesize that Sox10 and Olig2 may induce different OLCs as a result of differential activation of downstream genes within the gene regulatory network, which are important for OPC, committed OLC and mature OL identity [1]. In support of this, further analysis of our RNA-seq data revealed different expression levels of genes involved in downstream OLC specification networks [1], including Sox6, Tcf7l2 and Myrf, at D14 (Author response image 1).

      Author response image 1.

      Expression of OLC regulatory network genes in Sox10- and Olig2- cultures. Violin plots show gene expression levels (log-normalized) of downstream OLC regulatory genes (Sox6, Zeb2, Tcf7l2, Myrf, Zfp488, Nfatc2, Hes5, Id2) between Sox10 and Olig2 treated OLCs at 14 days post transduction. Analysis was performed on oligodendrocyte progenitor and mature oligodendrocyte clusters (from Manuscript Figure 1D, clusters 3 and 8).

      (2) In Figure 1B it appears that the Sox10- MBP+ tdTomato+ cells decreases from D12 to D14. Does this make sense considering MBP is a marker of more mature OLCs? 

      Thank you for this comment. To address this, we compared the number of MBP+tdTomato+ Sox10 cells across reprogramming timepoints. We saw no difference between the number of MBP+tdTomato+ OLs at D12 and D14 (Author response image 2, p = 0.2314). However, we do see a nonsignificant decrease in MBP+tdTomato+ Sox10 cells from D12 to D22 (Manuscript Supplementary Figure 3B, Author response image 2, p = 0.0543), which suggests that culture conditions are not optimal for longer-term cell survival [2], [3], [4].

      Author response image 2.

      Comparison of Sox10- induced MBP+tdTomato+ iOLCs over time. Quantification of MBP<sup>+</sup>tdTomato<sup>+</sup> iOLs in Sox10 cultures at D8 (n=5), D10 (n=5), D12 (n=5), D14 (n=7) and D22 (n=3) post transduction. Data are presented as mean ± SEM, each data point represents one individual cell culture experiment, Brown-Forsythe and Welch ANOVA on transformed percentages with Dunnett’s T3 multiple comparisons test (*= p<0.05).  

      (3) Previous studies have shown that MBP expression and myelination in vitro occurs at the earliest around 4-6 weeks of culturing. When assessing whether further maturation would increase MBP positivity, authors only cultured cells up to 22 DPT and saw no significant increase. Has a lengthier culture timeline been attempted? 

      We agree with the Reviewer that previous studies of pluripotent stem cell (hESC or iPSC) derived cultures have shown MBP+ OLCs in vitro around 4-6 weeks [5], [6], [7]. However, studies of neural stem cell [8] or fibroblast [9] conversion show OLC appearance after 7 and 24 days, respectively, demonstrating that OLCs can be generated in vitro within 1-3 weeks of plating. Moreover, as noted above in response to #2, we see fewer MBP+ cells at 22 DPT, suggesting that extended time in culture may require additional factors for support. Therefore, we did not attempt longer timepoints.

      (4) Figure S4D is described as "examples of tdTomatonegzsGreen+OLCmarker+ cells that arose from a tdTomatoneg cell with an astrocyte morphology." The zsGreen+ tdTomato- cell is not convincingly of "astrocyte morphology"; it could be a bipolar OLC. To strengthen the conclusions and remove this subjectivity, more extensive characterizations of astrocyte versus OLC morphology in the introduction or results are warranted. This would make this observation more convincing since there is clearly an overlap in the characteristics of these cell types.  

      We thank the reviewer for this excellent suggestion. To assess astrocyte morphology, we measured the cell size, nucleus size, number of branches and branch thickness of 70 Aldh1l1+tdTomato+ astrocytes in tamoxifen-labelled Aldh1l1-CreERT2;Ai14 cultures (new Supplemental Table 1). To assess OPC morphology, we performed IHC for PDGFRα in iOLC cultures and measured the same parameters in 70 PDGFRα+ OPCs (new Supplemental Table 1). We found that astrocytes were characterized by larger branch thickness, cell length and nucleus size, while OPCs showed a larger number of branches (new Supplemental Figure 1, and Author response image 3 below). Based on this framework, the tracked AAV9-GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>neg</sup> and AAV9-GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>pos</sup> starting cells fall within the bounds of ‘astrocytes’. We have revised the manuscript to include this more rigorous characterization (Line 119-124, Page 4; Line 307-312, Page 9; Line 323-326, Page 9). We also demonstrate (below) that the GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>pos</sup> and GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>neg</sup> starting cells depicted in Figure 2G and Supplemental Figure 5D are consistent with astrocyte morphology (Author response image 3).

      Author response image 3.

      Morphological characterization of astrocytes, oligodendrocyte lineage cells, and starting cells. Quantification of the (A) cell length, (B) nucleus size, (C) number of branches, and (D) branch thickness in Aldh1l1+tdTomato+ astrocytes and PDGFRα+ OPCs (n = 70 per cell type, data are presented as mean ± SEM). Orange line indicates parameter value for GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>pos</sup> starting cell in Figure 2G. Green line indicates parameter value for GFAP::zsGreen<sup>pos</sup>Aldh1l1-tdTomato<sup>neg</sup> starting cell in Supplemental Figure 5D.

      Reviewer #2 (Public Review):             

      The study by Bajohr et al. investigates the important question of whether astrocytes can generate oligodendrocytes by direct lineage conversion (DLR). The authors ectopically express three transcription factors - Sox10, Olig2 and Nkx6.2 - in cultured postnatal mouse astrocytes and use a combination of Aldh1l1-astrocyte fate mapping and live cell imaging to demonstrate that Sox10 converts astrocytes to MBP+ oligodendrocytes, whereas Olig2 expression converts astrocytes to PDGFRalpha+ oligodendrocyte progenitor cells. Nkx6.2 does not induce lineage conversion. The authors use single-cell RNAseq over 14 days post-transduction to uncover molecular signatures of newly generated iOLs.

      The potential to convert astrocytes to oligodendrocytes has been previously analyzed and demonstrated. Despite the extensive molecular characterization of the direct astrocyte-oligodendrocyte lineage conversion, the paper by Bajohr et al. does not represent significant progress. The entire study is performed in cultured cells, and it is not demonstrated whether this lineage conversion can be induced in astrocytes in vivo, particularly at which developmental stage (postnatal, adult?) and in which brain region. The authors also state that generating oligodendrocytes from astrocytes could be relevant for oligodendrocyte regeneration and myelin repair, but they don't demonstrate that lineage conversion can be induced under pathological conditions, particularly after white matter demyelination. Specific issues are outlined below.

      We thank the reviewer for this summary. We agree that there are a handful of reports of astrocyte-like cell to OLC conversion [10], [11]. However, our study is the first to confirm bona fide astrocyte to OLC conversion, which is important given the recent controversy in the field of in vivo astrocyte to neuron reprogramming [12]. In addition, the extensive characterization of the molecular timeline of reprogramming highlights that although conversion of astrocytes is possible by ectopic expression of any of the three factors, the subtypes of astrocytes converted and maturity of OLCs produced may vary depending on the choice of TF delivered. Our findings will inform future in vivo studies of iOLC generation that aim to understand the impact of brain region, age, pathology, and sex, which are especially important given the diversity of astrocyte responses to disease [13], [14], [15].

      (1) The authors perform an extensive characterization of Sox10-mediated DLR by scRNAseq and demonstrate a clear trajectory of lineage conversion from astrocytes to terminally differentiated MBP+ iOLCs. A similar type of analysis should be performed after Olig2 transduction, to determine whether the transcriptomics of Olig2 conversion overlaps with any phase of Sox10 conversion.

      We thank the Reviewer for this excellent comment. We chose to include an in-depth analysis of Sox10 in the manuscript, as Sox10-transduced cultures showed a higher percentage of mature iOLCs compared to Olig2 in our studies. We have added this specific rationale to the manuscript (Line 329-330, Page 9).

      Nonetheless, we also agree that understanding the underpinnings of Olig2-mediated conversion is important. Therefore, we used Cell Oracle [16] to understand the regulation of cell identity by Olig2. In silico overexpression of Olig2 in our control time course dataset (D0, D3, D8 and D14) showed cell movement from cluster 1, characterized by astrocyte genes [Mmd2[17], Entpd2[18], H2-D1[19]], towards cluster 5, characterized by OPC genes [Pdgfra[20], Myt1[21]], validating astrocyte to OLC conversion by Olig2 (Author response image 4).

      We hypothesize that reprogramming via Sox10 and Olig2 takes different conversion paths to oligodendrocytes for the following reasons.

      (1) Differential astrocyte gene expression at D14 when cells are exposed to Sox10 and Olig2 (Manuscript Figure 1D-E [Sox10 characterized by Lcn2[19], C3[19]; Olig2 characterized by Slc6a11[22], Slc1a2[23]]).

      (2) Differential expression of key OLC gene regulatory network genes at D14 between cells treated with Sox10 and Olig2 (Author response image 1). 

      Author response image 4.

      In silico modeling of Olig2 reprogramming. (A) UMAP clustering of Cre control treated cells from 0, 3, 8, and 14 days post transduction (DPT). (B) UMAP clustering from (A) overlaid with timepoint and treatment group. (C) Cell Oracle modeling of predicted cell trajectories following Olig2 knock in (KI), overlaid onto UMAP plot. Arrows indicate cell movement prediction with Olig2 KI perturbation.

      (2) A complete immunohistochemical characterization of the cultures should be performed at different time points after Sox10 and Olig2 transduction to confirm OL lineage cell phenotypes. 

      We performed a complete immunohistochemical characterization of Ai14 cultures transduced with GFAP::Sox10-Cre and GFAP::Olig2-Cre. This system allows permanent labelling and therefore enabled the tracking of transduced cells through the process of DLR, which we believe is the most appropriate way to characterize iOLC conversion efficiencies. We then confirmed the conversion of Aldh1l1+ astrocytes in Aldh1l1-CreERT2;Ai14 cultures transduced with GFAP::Sox10-zsGreen and GFAP::Olig2-zsGreen. In this system, GFAP drives the expression of zsGreen, and therefore may not faithfully track all cells, leading to an underestimate of the number of converted cells. For example, it would miss iOLCs derived from Aldh1l1<sup>neg</sup> astrocytes and iOLCs that have lost zsGreen expression following conversion. Therefore, we use this system only to confirm astrocyte origin.

      Nonetheless, we appreciate this comment and recognize that there may be differences in conversion efficiencies when analyzing Aldh1l1+ astrocytes versus all transduced cells. Therefore, we have softened the language in the manuscript (see below) regarding Olig2 and Sox10 generating different OLC phenotypes and now claim iOLC generation from both Sox10 and Olig2. We thank the Reviewer for this comment, and believe it has strengthened the discussion. 

      Line 240, Page 7

      Line 261-263, Page 8

      Line 304-307, Page 8/9

      Line 413-414, Page 11

      References

      (1) E. Sock and M. Wegner, “Using the lineage determinants Olig2 and Sox10 to explore transcriptional regulation of oligodendrocyte development,” Dev Neurobiol, vol. 81, no. 7, pp. 892–901, Oct. 2021, doi: 10.1002/dneu.22849.

      (2) B. A. Barres, M. D. Jacobson, R. Schmid, M. Sendtner, and M. C. Raff, “Does oligodendrocyte survival depend on axons?,” Current Biology, vol. 3, no. 8, pp. 489–497, Aug. 1993, doi: 10.1016/0960-9822(93)90039-Q.

      (3) A.-N. Cho et al., “Aligned Brain Extracellular Matrix Promotes Differentiation and Myelination of Human-Induced Pluripotent Stem Cell-Derived Oligodendrocytes,” ACS Appl. Mater. Interfaces, vol. 11, no. 17, pp. 15344–15353, May 2019, doi: 10.1021/acsami.9b03242.

      (4) E. G. Hughes and M. E. Stockton, “Premyelinating Oligodendrocytes: Mechanisms Underlying Cell Survival and Integration,” Front. Cell Dev. Biol., vol. 9, Jul. 2021, doi: 10.3389/fcell.2021.714169.

      (5) M. Ehrlich et al., “Rapid and efficient generation of oligodendrocytes from human induced pluripotent stem cells using transcription factors,” Proc Natl Acad Sci U S A, vol. 114, no. 11, pp. E2243–E2252, Mar. 2017, doi: 10.1073/pnas.1614412114.

      (6) Y. Liu, P. Jiang, and W. Deng, “OLIG gene targeting in human pluripotent stem cells for motor neuron and oligodendrocyte differentiation,” Nat Protoc, vol. 6, no. 5, pp. 640–655, May 2011, doi: 10.1038/nprot.2011.310.

      (7) S. A. Goldman and N. J. Kuypers, “How to make an oligodendrocyte,” Development, vol. 142, no. 23, pp. 3983–3995, Dec. 2015, doi: 10.1242/dev.126409.

      (8) M. Faiz, N. Sachewsky, S. Gascón, K. W. A. Bang, C. M. Morshead, and A. Nagy, “Adult Neural Stem Cells from the Subventricular Zone Give Rise to Reactive Astrocytes in the Cortex after Stroke,” Cell Stem Cell, vol. 17, no. 5, pp. 624–634, Nov. 2015, doi: 10.1016/j.stem.2015.08.002.

      (9) F. J. Najm et al., “Transcription factor–mediated reprogramming of fibroblasts to expandable, myelinogenic oligodendrocyte progenitor cells,” Nat Biotechnol, vol. 31, no. 5, pp. 426–433, May 2013, doi: 10.1038/nbt.2561.

      (10) A. Mokhtarzadeh Khanghahi, L. Satarian, W. Deng, H. Baharvand, and M. Javan, “In vivo conversion of astrocytes into oligodendrocyte lineage cells with transcription factor Sox10; Promise for myelin repair in multiple sclerosis,” PLoS One, vol. 13, no. 9, p. e0203785, Sep. 2018, doi: 10.1371/journal.pone.0203785.

      (11) S. Farhangi, S. Dehghan, M. Totonchi, and M. Javan, “In vivo conversion of astrocytes to oligodendrocyte lineage cells in adult mice demyelinated brains by Sox2,” Mult Scler Relat Disord, vol. 28, pp. 263–272, Feb. 2019, doi: 10.1016/j.msard.2018.12.041.

      (12) L.-L. Wang, C. Serrano, X. Zhong, S. Ma, Y. Zou, and C.-L. Zhang, “Revisiting astrocyte to neuron conversion with lineage tracing in vivo,” Cell, vol. 184, no. 21, pp. 5465-5481.e16, Oct. 2021, doi: 10.1016/j.cell.2021.09.005.

      (13) I. Matias, J. Morgado, and F. C. A. Gomes, “Astrocyte Heterogeneity: Impact to Brain Aging and Disease,” Front. Aging Neurosci., vol. 11, Mar. 2019, doi: 10.3389/fnagi.2019.00059.

      (14) N. Habib et al., “Disease-associated astrocytes in Alzheimer’s disease and aging,” Nat Neurosci, vol. 23, no. 6, pp. 701–706, Jun. 2020, doi: 10.1038/s41593-020-0624-8.

      (15) M. A. Wheeler et al., “MAFG-driven astrocytes promote CNS inflammation,” Nature, vol. 578, no. 7796, pp. 593–599, Feb. 2020, doi: 10.1038/s41586-020-1999-0.

      (16) K. Kamimoto, B. Stringa, C. M. Hoffmann, K. Jindal, L. Solnica-Krezel, and S. A. Morris, “Dissecting cell identity via network inference and in silico gene perturbation,” Nature, vol. 614, no. 7949, pp. 742–751, Feb. 2023, doi: 10.1038/s41586-022-05688-9.

      (17) P. Kang et al., “Sox9 and NFIA coordinate a transcriptional regulatory cascade during the initiation of gliogenesis,” Neuron, vol. 74, no. 1, pp. 79–94, Apr. 2012, doi: 10.1016/j.neuron.2012.01.024.

      (18) K. Saito et al., “Microglia sense astrocyte dysfunction and prevent disease progression in an Alexander disease model,” Brain, vol. 147, no. 2, pp. 698–716, Nov. 2023, doi: 10.1093/brain/awad358.

      (19) S. A. Liddelow et al., “Neurotoxic reactive astrocytes are induced by activated microglia,” Nature, vol. 541, no. 7638, pp. 481–487, Jan. 2017, doi: 10.1038/nature21029.

      (20) Q. Zhu et al., “Genetic evidence that Nkx2.2 and Pdgfra are major determinants of the timing of oligodendrocyte differentiation in the developing CNS,” Development, vol. 141, no. 3, pp. 548–555, Feb. 2014, doi: 10.1242/dev.095323.

      (21) J. A. Nielsen, J. A. Berndt, L. D. Hudson, and R. C. Armstrong, “Myelin transcription factor 1 (Myt1) modulates the proliferation and differentiation of oligodendrocyte lineage cells,” Mol Cell Neurosci, vol. 25, no. 1, pp. 111–123, Jan. 2004, doi: 10.1016/j.mcn.2003.10.001.

      (22) J. Liu, X. Feng, Y. Wang, X. Xia, and J. C. Zheng, “Astrocytes: GABAceptive and GABAergic Cells in the Brain,” Front. Cell. Neurosci., vol. 16, Jun. 2022, doi: 10.3389/fncel.2022.892497.

      (23) A. Sharma et al., “Divergent roles of astrocytic versus neuronal EAAT2 deficiency on cognition and overlap with aging and Alzheimer’s molecular signatures,” Proceedings of the National Academy of Sciences, vol. 116, no. 43, pp. 21800–21811, Oct. 2019, doi: 10.1073/pnas.1903566116.

    1. Reviewer #1 (Public review):

      This study explores the connectivity patterns that could lead to fast and slow undulating swim patterns in larval zebrafish using a simplified theoretical framework. The authors show that a pattern of connectivity based only on inhibition is sufficient to produce realistic patterns with a single frequency. Two such networks coupled with inhibition but with distinct time constants can produce a range of frequencies. Adding excitatory connections further increases the range of obtainable frequencies, albeit at the expense of sudden transitions in the mid-frequency range.

      Strengths:

      (1) This is an eloquent approach to answering the question of how spinal locomotor circuits generate coordinated activity using a theoretical approach based on moving bump models of brain activity.

      (2) The models make specific predictions on patterns of connectivity while discounting the role of connectivity strength or neuronal intrinsic properties in shaping the pattern.

      (3) The models also propose that there is an important association between cell-type-specific intersegmental patterns and the recruitment of speed-selective subpopulations of interneurons.

      (4) Having a hierarchy of models creates a compelling argument for explaining rhythmicity at the network level. Each model builds on the last and reveals a new perspective on how network dynamics can control rhythmicity. I liked that each model can be used to probe questions in the next/previous model.

      Comments on revisions:

      I am very happy to see the simplified biophysical model supporting the original findings. The authors have done an excellent job addressing my comments.

      Just a small note, please change C. Elegans to C. elegans.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) How is this simplified model representative of what is observed biologically? A bump model does not naturally produce oscillations. How would the dynamics of a rhythm generator interact with this simplistic model?

      Bump models naturally produce sequential activity, and can be engineered to repeat this sequential activity periodically (Zhang, 1996; Samsonovich and McNaughton, 1997; Murray and Escola, 2017). This is the basis for the oscillatory behavior in the model presented here. As we describe in our paper, such a model is consistent with numerous neurobiological observations about cell-type-specific connectivity patterns. The reviewer is, however, correct to point out that our model does not incorporate other key neurobiological features--in particular, intracellular dynamical properties--that have been shown to play important roles in rhythm generation. Our aim in this work is to establish a circuit-level mechanism for rhythm generation, complementary to classical models that rely on intracellular dynamics for rhythm generation. Whether and how these mechanisms work together is something that we plan to explore in future work, and we have added a sentence to the Discussion to this effect.

      (2) Would this theoretical construct survive being expressed in a biophysical model? It seems that it should, but even a simple biological model with the basic patterns of connectivity shown here would greatly increase confidence in the biological plausibility of the theory.

      We thank the reviewer for pointing out this way to strengthen our paper. We implemented the connectivity developed in the rate models in a spiking neuron model which used EI-balanced Poisson noise as input drive. We found that we could reproduce all the main results of our analysis. In particular, with a realistic number of neurons, we observed swimming activity characterized by (i) left-right alternation, (ii) rostral-caudal propagation, and (iii) variable speed control with constant phase lag. The spiking model demonstrates that the connectivity-motif-based mechanisms for rhythmogenesis that we propose are robust in a biophysical setting.

      We included these results in the updated manuscript in a new Results subsection titled “Robustness in a biophysical model.”

      (3) How stable is this model in its output patterns? Is it robust to noise? Does noise, in fact, smooth out the abrupt transitions in frequency in the middle range?

      The newly added spiking model implementation of the network demonstrates that the core mechanisms of our models are robust to noise, since the connectivity is randomly chosen and the input drive is Poisson noise.

      To test the effect of noise as it is parametrically varied, we also added noise directly to the rate models in the form of white noise input to each unit. Namely, the rate model was adapted to obey the stochastic differential equation

      \[

      \tau_i \frac{dr_i(t)}{dt} = -r_i(t) + \left[ \sum_j W_{ij} r_j(t - \Delta_{ij}) + D_i + \sigma\xi_t \right]_+

      \]

      Here $\xi_t$ is a standard Gaussian white noise and $\sigma$ sets the strength of the noise. We found that the swimming patterns were robust at all frequencies up to $\sigma =  0.05$. Above this level, coherent oscillations started to break down for some swim frequencies. To investigate whether the noise smoothed out abrupt transitions, we swept through different values of noise and modularity of excitatory connections. The results showed very minor improvement in controllability (see figure below), but this was not significant enough to include in the manuscript.
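      As a rough illustration of how the stochastic rate equation above could be integrated numerically, the following is a minimal sketch and not the authors' implementation. It assumes an Euler-type step in which the white noise is approximated by independent Gaussian samples scaled by $1/\sqrt{dt}$, delays $\Delta_{ij}$ that are whole multiples of the time step, and zero initial rates. The function name `simulate_noisy_rate_model` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_noisy_rate_model(W, D, tau, delay_steps, sigma,
                              dt=1e-3, n_steps=2000, seed=0):
    """Integrate tau_i dr_i/dt = -r_i + [sum_j W_ij r_j(t - Delta_ij) + D_i + sigma*xi]_+.

    Assumptions (not taken from the source): Euler stepping, delays given as
    integer numbers of time steps, white noise sampled as N(0, 1) / sqrt(dt),
    and rates initialised to zero for all past times.
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    D = np.asarray(D, dtype=float)
    tau = np.asarray(tau, dtype=float)
    delay_steps = np.asarray(delay_steps, dtype=int)

    n = W.shape[0]
    max_delay = int(delay_steps.max())
    cols = np.arange(n)

    # The first max_delay + 1 rows hold the zero history needed for delayed lookups.
    r = np.zeros((max_delay + n_steps + 1, n))

    for t in range(max_delay, max_delay + n_steps):
        # delayed[i, j] = r_j(t - Delta_ij)
        delayed = r[t - delay_steps, cols]
        recurrent = (W * delayed).sum(axis=1)
        noise = sigma * rng.standard_normal(n) / np.sqrt(dt)
        # Rectification [.]_+ is applied to the total input, noise included.
        drive = np.maximum(recurrent + D + noise, 0.0)
        r[t + 1] = r[t] + (dt / tau) * (drive - r[t])

    return r[max_delay:]


# Toy usage: four mutually inhibiting units with a one-step delay (hypothetical values).
n_units = 4
rates = simulate_noisy_rate_model(
    W=-0.5 * np.ones((n_units, n_units)),
    D=np.ones(n_units),
    tau=np.full(n_units, 0.01),
    delay_steps=np.ones((n_units, n_units), dtype=int),
    sigma=0.05,
)
print(rates.shape)
```

      Because the noise sits inside the rectification, the effective noise level in such a discretization depends on the step size, so a value such as $\sigma = 0.05$ is only comparable across simulations that use the same scheme.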

      Author response image 1.

      (4) All figure captions are inadequate. They should have enough information for the reader to understand the figure and the point that was meant to be conveyed. For example, Figure 1 does not explain what the red dot is, what is black, what is white, or what the gradations of gray are. Or even if this is a representative connectivity of one node, or if this shows all the connections? The authors should not leave the reader guessing.

      All figure captions have been updated to enhance clarity and address these concerns.

      Reviewer #2 (Public review):

      (1) Figure 1A, if I interpret Figure 1B correctly, should there not be long descending projections as well that don't seem to be illustrated?

      Thank you for highlighting this potential point of confusion. The diagram in question was only intended to be a rough schematic of the types of connections present in the model. We have added additional descending connections as requested.

      (2) Page 5: It would be good to define what is meant by slow and fast here, as this definition changes with age in zebrafish (what developmental age)?

      We have updated the manuscript to include the sentence: “These values were chosen to coincide with observed ranges from larval zebrafish.” with appropriate citation.

      Reviewer #3 (Public review):

      (1) The authors describe a single unit as a neuron, be it excitatory or inhibitory, and the output of the simulation is the firing rate of these neurons. Experimentally and in other modeling studies, motor neurons are incorporated in the model, and the output of the network is based on motor neuron firing rate, not the interneurons themselves. Why did the authors choose to build the model this way?

      We chose to leave out the motor neurons from our models for a few reasons. While motor neurons read out the rhythmic activity generated by the interneurons and may provide some feedback, they are not required for rhythmogenesis. In fact, interneuron activity (especially in the excitatory V2a neurons (Agha et al., 2024)) is highly correlated with the ventral root bursts within the same segment. This suggests that motor neurons are primarily a local readout of the rhythmic activity of interneurons; therefore, the rhythmic swimming activity can be deduced directly from the interneurons themselves.

      Moreover, there is a lack of experimental observation of the connectivity between all the cell types considered in our model and motor neurons. Hence, it was unclear how we should include them in the model. To address this, we are currently developing a data-driven approach that will determine the proper connectivity between the motor neurons and the interneurons, including intrasegmental connections.

      (2) In the single population model (Figure 1), the authors use ipsilateral inhibitory connections that are long-range in an ascending direction. Experimentally, these connections have been shown to be local, while long-range ipsilateral connections have been shown to be descending. What were the reasons the authors chose this connectivity? Do the authors think local ascending inhibitions contribute to rostrocaudal propagation, and how?

      The long-range ascending ipsilateral inhibitory connections arise from a limitation of our modeling framework. The V1 neurons that provide these connections have been shown experimentally to fire later than other neurons (especially descending V2a neurons) within the same hemisegment (Jay et al., J Neurosci, 2023); however, our model can only produce synchronized local activity. Hence, we replace local phase offsets with spatial offsets to produce correctly structured recurrent phasic inputs. We are currently investigating a data-driven method for determining intrasegmental connectivity which should be able to produce the local phase offset and address this concern; however, this is beyond the scope of the current paper.

      (3) In the two-population model, the authors show independent control of frequency and rhythm, as has been reported experimentally. However, in these previous experimental studies, frequency and amplitude are regulated by different neurons, suggesting different networks dedicated to frequency and amplitude control. However, in the current model, the same population with the same connections can contribute to frequency or amplitude depending on relative tonic drive. Can the authors please address these differences either by changes in the model or by adding to the Discussion?

      Our prior experimental results that suggested a separation of frequency and amplitude control circuits focused on motor neuron recruitment rather than interneuron activity (Jay et al., J Neurosci 2023; Menelaou and McLean, Nat Commun 2019). To avoid potential confusion about amplitudes of interneurons vs. of motor neurons, we have removed the results from Figure 3 about control of amplitude in the 2-population model, instead focusing this figure on the control of frequency via speed-module recruitment. For the same reason, we have removed the panel showing the effects of targeted ablations on interneuron amplitudes in Figure 7. We have kept the result about amplitude control in our Supplemental Figure S2 for the 8-population model, but we try to make it clear in the text that any relationship between interneuron amplitude and motor neuron amplitude would depend on how motor neurons are modeled, which we do not pursue in this work.

      (4) It would be helpful to add a paragraph in the Discussion on how these results could be applicable to other model systems beyond zebrafish. Cell intrinsic rhythmogenesis is a popular concept in the field, and these results show an interesting and novel alternative. It would help to know if there is any experimental evidence suggesting such network-based propagation in other systems, invertebrates, or vertebrates.

      We have expanded a paragraph in the Discussion to address these questions. In particular, we highlight how a recent study of mouse locomotor circuits produced a model with similar key features (Komi et al., 2024). These authors made direct use of experimentally determined connectivity structure and cell-type distributions, which informed a model that produced purely network-based rhythmogenesis. We also point out that inhibition-dominated connectivity has been used for understanding oscillatory behavior in neural circuits outside the context of motor control (Zhang, 1996; Samsonovich and McNaughton, 1997; Murray and Escola, 2017). Finally, we address a study that used the cell-type-specific connectivity within the C. elegans locomotor circuit as the architecture for an artificial motor control system and found that the resulting system could learn motor control tasks more efficiently than general machine learning architectures (Bhattasali et al., 2022). Like our model, the Komi et al. and Bhattasali et al. models generate rhythm via structured connectivity motifs rather than via intracellular dynamical properties, suggesting that these may be a key mechanism underlying locomotion across species.

      Reviewer #1 (Recommendations for the authors):

      (1) Express this modeling construct in a simple biophysical model.

      See the new Results subsection titled “Robustness in a biophysical model.”

      (2) Please cite the classic models of Kopell, Ermentrout, Williams, Sigvardt etc., especially where you say "classic models".

      We have added relevant citations including the mentioned authors.

      (3) "Rhythmogenesis remain incompletely understood" changed to "Rhythmogenesis remains incompletely understood".

      We chose not to make this change since the ‘remain’ refers to the plural ‘core mechanisms’ not the singular ‘rhythmogenesis’.

      Reviewer #3 (Recommendations for the authors):

      (1) The figures are well made; however, it would help to add more details to the figure legends. For example, what neuron's firing rate is shown in Figure 1C? What is the red dot in 1B? Figures 3E,F,G: what is being plotted? Mean and SD? Blue dot in Figure 5C?

      All figure captions have been updated to enhance clarity and address these concerns.

      (2) A, B text missing in Figure 7.

      We have revised this figure and its caption; please see our response to Comment 3 above.

      (3) It would be nice to see the tonic drive pattern that is fed to the model for each case, along with the different firing rates in the figures. It would help understand how the tonic drive is changed to rhythmic activity.

      The tonic drive in the rate models is implemented as a constant excitatory input that is uniform across all units within the same speed-population. There is no patterning in time or location to this drive.
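To make the description of the drive concrete, the following is a minimal sketch of an input that is constant in time and uniform within each speed-population. The function name, the population sizes, and the drive levels are hypothetical and chosen only for illustration; they are not taken from the model in the paper.

```python
def tonic_drive(pop_sizes, pop_drives, n_steps):
    """Build drive[t][i]: constant over time, uniform within each population.

    pop_sizes  -- number of units in each speed-population (hypothetical values)
    pop_drives -- excitatory drive level assigned to each population
    n_steps    -- number of simulation time steps
    """
    per_unit = []
    for size, level in zip(pop_sizes, pop_drives):
        # Every unit in a population receives the same drive level.
        per_unit.extend([level] * size)
    # The same per-unit vector is repeated at every time step (no temporal pattern).
    return [list(per_unit) for _ in range(n_steps)]


# Illustrative example: two populations with 3 and 2 units.
drive = tonic_drive(pop_sizes=[3, 2], pop_drives=[1.0, 0.5], n_steps=4)
```

Under this sketch, any rhythm in the simulated firing rates would have to arise from the network connectivity, because the input itself carries no temporal or spatial structure beyond the per-population level.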

      References

      (1) Moneeza A Agha, Sandeep Kishore, and David L McLean. Cell-type-specific origins of locomotor rhythmicity at different speeds in larval zebrafish. eLife, July 2024

      (2) Nikhil Bhattasali, Anthony M Zador, and Tatiana Engel. Neural circuit architectural priors for embodied control. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 12744–12759. Curran Associates, Inc., 2022.

      (3) Salif Komi, August Winther, Grace A. Houser, Roar Jakob Sørensen, Silas Dalum Larsen, Madelaine C. Adamssom Bonfils, Guanghui Li, and Rune W. Berg. Spatial and network principles behind neural generation of locomotion. bioRxiv, 2024

      (4) James M Murray and G Sean Escola. Learning multiple variable-speed sequences in striatum via cortical tutoring. eLife, 6:e26084, May 2017.

      (5) Alexei Samsonovich and Bruce L McNaughton. Path integration and cognitive mapping in a continuous attractor neural network model. Journal of Neuroscience, 17(15):5900–5920, 1997.

      (6) K Zhang. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. Journal of Neuroscience, 16(6):2112–2126, 1996.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      We thank the Reviewers for their thorough attention to our paper and the interesting discussion about the findings. Before responding to more specific comments, here are some general points we would like to clarify:

      (1) Ecological niche models are indeed correlative models, and we used them to highlight environmental factors associated with HPAI outbreaks within two host groups. We will further revise the terminology that could still unintentionally suggest causal inference. The few remaining ambiguities were mainly in the Discussion section, where our intent was to interpret the results in light of the broader scientific literature. Particularly, we will change the following expressions:

      -  “Which factors can explain…” to  “Which factors are associated with…” (line 75);

      -  “the environmental and anthropogenic factors influencing” to “the environmental and anthropogenic factors that are correlated with” (line 273);

      -  “underscoring the influence” to “underscoring the strong association” (line 282).

      (2) We respectfully disagree with the suggestion that an ecological niche modelling (ENM) approach is not appropriate for this work and the research question addressed therein. Ecological niche models are specifically designed to estimate the spatial distribution of the environmental suitability of species and pathogens, making them well suited to our research questions. In our study, we have also explicitly detailed the known limitations of ecological niche models in the Discussion section, in line with prior literature, to ensure their appropriate interpretation in the context of HPAI.

      (3) The environmental layers used in our models were restricted to those available at a global scale, as listed in Supplementary Information Resources S1 (https://github.com/sdellicour/h5nx_risk_mapping/blob/master/Scripts_%26_data/SI_Resource_S1.xlsx). Naturally, not all potentially relevant environmental factors could be included, but the selected layers are explicitly documented and only these were assessed for their importance. Despite this limitation, the performance metrics indicate that the models performed well, suggesting that the chosen covariates capture meaningful associations with HPAI occurrence at a global scale.

      Reviewer #1 (Public review):

      The authors aim to predict ecological suitability for transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identify correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data. I thank the authors for taking the time to respond to my initial review and I provide some follow-up below.

      Detailed comments:

      In my review, I asked the authors to clarify the meaning of "spillover" within the HPAI transmission cycle. This term is still not entirely clear: at lines 409-410, the authors use the term with reference to transmission between wild birds and farmed birds, as distinct to transmission between farmed birds. It is implied but not explicitly stated that "spillover" is relevant to the transmission cycle in farmed birds only. The sentence, "we developed separate ecological niche models for wild and domestic bird HPAI occurrences ..." could have been supported by a clear sentence describing the transmission cycle, to prime the reader for why two separate models were necessary.

      We respectfully disagree that the term “spillover” is unclear in the manuscript. In both the Methods and Discussion sections (lines 387-391 and 409-414), we explicitly define “spillover” as the introduction of HPAI viruses from wild birds into domestic poultry, and we distinguish this from secondary farm-to-farm transmission. Our use of separate ecological niche models for wild and domestic outbreaks reflects not only the distinction between primary spillover and secondary transmission, but also the fundamentally different ecological processes, surveillance systems, and management implications that shape outbreaks in these two groups. We will clarify this choice in the revised manuscript when introducing the separate models. Furthermore, on line 83, we will add “as these two groups are influenced by different ecological processes, surveillance biases, and management contexts”.

      I also queried the importance of (dead-end) mammalian infections to a model of the HPAI transmission risk, to which the authors responded: "While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds." I would argue that any infections, whether they are in dead-end or competent hosts, represent the presence of environmental conditions to support transmission so are certainly relevant to a niche model and therefore within scope. It is certainly understandable if the authors have not been able to access data of mammalian infections, but it is an oversight to dismiss these infections as irrelevant.

      We understand the Reviewer’s point, but our study was designed to model HPAI occurrence in avian hosts only. We therefore restricted our analysis to wild birds and domestic poultry, which represent the primary hosts for HPAI circulation and the focus of surveillance and control measures. While mammalian detections have been reported, they are outside the scope of this work.

      Correlative ecological niche models, including BRTs, learn relationships between occurrence data and covariate data to make predictions, irrespective of correlations between covariates. I am not convinced that the authors can make any "interpretation" (line 298) that the covariates that are most informative to their models have any "influence" (line 282) on their response variable. Indeed, the observation that "land-use and climatic predictors do not play an important role in the niche ecological models" (line 286), while "intensive chicken population density emerges as a significant predictor" (line 282) begs the question: from an operational perspective, is the best (e.g., most interpretable and quickest to generate) model of HPAI risk a map of poultry farming intensity?

      We agree that poultry density may partly reflect reporting bias, but we also consider it a meaningful predictor of HPAI risk. Its importance in our models is therefore expected. Importantly, our BRT framework does more than reproduce poultry distribution: it captures non-linear relationships and interactions with other covariates, allowing a more nuanced characterisation of risk than a simple poultry density map. Note also that our models distinguish intensive chicken density, extensive chicken density, and duck density. The output is therefore not simply a “map of poultry farming intensity”.

      At line 282, we used the word “influence” while fully recognising that correlative models cannot establish causality. Indeed, in our analyses, “relative influence” refers to the importance metric produced by the BRT algorithm (Ridgeway, 2020), which measures correlative associations between environmental factors and outbreak occurrences. These scores are interpreted in light of the broader scientific literature, therefore our interpretations build on both our results and existing evidence, rather than on our models alone. However, in the next version of the paper, we will revise the sentence as: “underscoring the strong association of poultry farming practices with HPAI spread (Dhingra et al., 2016)”. 

      I have more significant concerns about the authors' treatment of sampling bias: "We agree with the Reviewer's comment that poultry density could have potentially been considered to guide the sampling effort of the pseudo-absences to consider when training domestic bird models. We however prefer to keep using a human population density layer as a proxy for surveillance bias to define the relative probability to sample pseudo-absence points in the different pixels of the background area considered when training our ecological niche models. Indeed, given that poultry density is precisely one of the predictors that we aim to test, considering this environmental layer for defining the relative probability to sample pseudo-absences would introduce a certain level of circularity in our analytical procedure, e.g. by artificially increasing the influence of that particular variable in our models." The authors have elected to ignore a fundamental feature of distribution modelling with occurrence-only data: if we include a source of sampling bias as a covariate and do not include it when we sample background data, then that covariate would appear to be correlated with presence. They acknowledge this later in their response to my review: "...assuming a sampling bias correlated with poultry density would result in reducing its effect as a risk factor." In other words, the apparent predictive capacity of poultry density is a function of how the authors have constructed the sampling bias for their models. A reader of the manuscript can reasonably ask the question: to what degree is the model a model of HPAI transmission risk, and to what degree is the model a model of the observation process?

      The sentence at lines 474-477 is a helpful addition; however, the preceding sentence, "Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry," (line 474) is included without acknowledgement of the flow-on consequence to one of the key findings of the manuscript, that "...intensive chicken population density emerges as a significant predictor..." (line 282). The additional context on the EMPRES-i dataset at lines 475-476 ("the locations of outbreaks ... are often georeferenced using place name nomenclatures") is in conflict with the description of the dataset at line 407 ("precise location coordinates"). Ultimately, the choices that the authors have made are entirely defensible through a clear, concise description of model features and assumptions, and precise language to guide the reader through interpretation of results. I am not satisfied that this is provided in the revised manuscript.

      We thank the Reviewer for this important point. To address it, we compared model predictive performance and covariate relative influences obtained when pseudo-absences were weighted by poultry density versus human population density (Author response table 1). The results show that differences between the two approaches are marginal, both in predictive performance (ΔAUC ranging from -0.013 to +0.002) and in the ranking of key predictors (see below Author response images 1 and 2). For instance, intensive chicken density consistently emerged as an important predictor regardless of the bias layer used.

      Note: the comparison was conducted using a simplified BRT configuration for computational efficiency (fewer trees, fixed 5-fold random cross-validation, and standardised parameters). Therefore, absolute values of AUC and variable importance may differ slightly from those in the manuscript, but the relative ranking of predictors and the overall conclusions remain consistent.

      Given these small differences, we retained the approach using human population density. We agree that poultry density partly reflects surveillance bias as well as true epidemiological risk, and we will clarify this in the revised manuscript by noting that the predictive role of poultry density reflects both biological processes and surveillance systems. Furthermore, on line 289, we will add “We note, however, that intensive poultry density may reflect both surveillance intensity and epidemiological risk, and its predictive role in our models should be interpreted in light of both processes”.
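The two pseudo-absence schemes compared above differ only in the layer used to weight the sampling of background pixels. A minimal sketch of such bias-weighted sampling is shown below; the pixel identifiers, the weights, and the function name are hypothetical, and the sketch samples with replacement, which may differ from the procedure used in the manuscript.

```python
import random


def sample_pseudo_absences(bias_layer, n_points, seed=0):
    """Draw pseudo-absence pixels with probability proportional to a bias layer.

    bias_layer -- dict mapping a pixel id to a non-negative weight, for example
                  human population density or poultry density (hypothetical data)
    n_points   -- number of pseudo-absence points to draw (with replacement)
    """
    rng = random.Random(seed)
    # Pixels with zero weight can never be selected, so drop them up front.
    pixels = [p for p, w in bias_layer.items() if w > 0]
    weights = [bias_layer[p] for p in pixels]
    return rng.choices(pixels, weights=weights, k=n_points)


# Illustrative example: px_a carries nine times the weight of px_b.
bias = {"px_a": 9.0, "px_b": 1.0, "px_c": 0.0}
points = sample_pseudo_absences(bias, n_points=1000, seed=0)
```

Swapping the dictionary passed as `bias_layer` is the only change needed to move from one weighting scheme to the other, which is why the same covariate can look more or less influential depending on the layer chosen.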

      Author response table 1.

      Comparison of model predictive performance (AUC) between models whose pseudo-absences were weighted by poultry density and those weighted by human population density, across host groups, virus types, and time periods. Differences in AUC are reported as the poultry-weighted value minus the human-weighted value.

      Author response image 1.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for domestic bird outbreaks. Results are shown for four datasets: H5N1 (<2020), H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).

      Author response image 2.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for wild bird outbreaks. Results are shown for three datasets: H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).

      The authors have slightly misunderstood my comment on "extrapolation": I referred to "environmental extrapolation" in my review without being particularly explicit about my meaning. By "environmental extrapolation", I meant to ask whether the models were predicting to environments that are outside the extent of environments included in the occurrence data used in the manuscript. The authors appear to have understood this to be a comment on geographic extrapolation, or predicting to areas outside the geographic extent included in occurrence data, e.g.: "For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data" (lines 195-197). Is the model extrapolating in environmental space in these regions? This is unclear. I do not suggest that the authors should carry out further analysis, but the multivariate environmental similarity surface (MESS; see Elith et al., 2010) is a useful tool to visualise environmental extrapolation and aid model interpretation.
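For readers unfamiliar with the MESS, the score at a prediction location can be sketched as follows, following the definition in Elith et al. (2010): each covariate is scored against the distribution of reference values, and the MESS is the minimum score across covariates, with negative values indicating environmental extrapolation. The reference table and the function name below are hypothetical, and the sketch assumes every covariate varies in the reference set (so that its range is non-zero).

```python
def mess(reference, point):
    """Multivariate environmental similarity of one point to a reference set.

    reference -- list of rows, one per reference location, each a list of
                 covariate values (hypothetical data)
    point     -- list of covariate values at the prediction location
    """
    scores = []
    for i, p in enumerate(point):
        values = [row[i] for row in reference]
        lo, hi = min(values), max(values)
        # f: percentage of reference values strictly smaller than p.
        f = 100.0 * sum(v < p for v in values) / len(values)
        if f == 0:
            s = 100.0 * (p - lo) / (hi - lo)   # at or below the reference minimum
        elif f <= 50:
            s = 2.0 * f
        elif f < 100:
            s = 2.0 * (100.0 - f)
        else:
            s = 100.0 * (hi - p) / (hi - lo)   # above the reference maximum
        scores.append(s)
    # The least similar covariate determines the overall score.
    return min(scores)


reference = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]]
inside = mess(reference, [1.5, 25.0])    # within the sampled environmental range
outside = mess(reference, [-1.5, 25.0])  # first covariate below the reference minimum
```

Mapping this score over the prediction grid would show whether regions with high predicted suitability but few reported outbreaks lie inside or outside the sampled environmental space.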

      On the subject of "extrapolation", I am also concerned by the additions at lines 362-370: "...our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions." The "discrepancy" cited here is a feature of the input dataset, a function of the observation distribution that should be captured in pseudo-absence data. The authors state that Kazakhstan and Central Asia are areas of interest, and that the environments in this region are outside the extent of environments captured in the occurrence dataset, although it is unclear whether "extrapolation" is informed by a quantitative tool like a MESS or judged by some other qualitative test. The authors then cite Australia as an example of a region with some predicted suitability but no HPAI outbreaks to date, however this discussion point is not linked to the idea that the presence of environmental conditions to support transmission need not imply the occurrence of transmission (as in the addition, "...spatial isolation may imply a lower risk of actual occurrences..." at line 214). Ultimately, the authors have not added any clear comment on model uncertainty (e.g., variation between replicated BRTs) as I suggested might be helpful to support their description of model predictions.

      Many thanks for the clarification. Indeed, we interpreted your previous comments in terms of geographic extrapolations. We thank the Reviewer for these observations. We will adjust the wording to further clarify that predictions of ecological suitability in areas with few or no reported outbreaks (e.g., Central Asia, Australia) are not model errors but expected extrapolations, since ecological suitability does not imply confirmed transmission (for instance, on Line 362: “our models extrapolate environmental suitability” will be changed to “Interestingly, our models extrapolate geographical”). These predictions indicate potential environments favorable to circulation if the virus were introduced.

      In our study, model uncertainty is formally assessed when comparing the predictive performances of our models (Fig. S3, Table S1), the relative influence (Table S3) and response curves (Fig. 2) associated with each environmental factor (Table S2). All of these results confirm good convergence between the replicates. Finally, we did not use a quantitative tool such as a MESS to assess extrapolation but instead relied on qualitative interpretation of model outputs.

      All of my criticisms are, of course, applied with the understanding that niche modelling is imperfect for a disease like HPAI, and that data may be biased/incomplete, etc.: these caveats are common across the niche modelling literature. However, if language around the transmission cycle, the niche, and the interpretation of any of the models is imprecise, which I find it to be in the revised manuscript, it undermines all of the science that is presented in this work.

      We respectfully disagree with this comment. The scope of our study and the methods employed are clearly defined in the manuscript, and the limitations of ecological niche modelling in this context are explicitly acknowledged in the Discussion section. While we appreciate the Reviewer’s concern, the comment does not provide specific examples of unclear or imprecise language regarding the transmission cycle, niche, or interpretation of the models. Without such examples, it is difficult to identify further revisions that would improve clarity.

      Reviewer #2 (Public review):

      The geographic range of highly pathogenic avian influenza cases changed substantially around the period 2020, and there is much interest in understanding why. Since 2020 the pathogen irrupted in the Americas and the distribution in Asia changed dramatically. This study aimed to determine which spatial factors (environmental, agronomic and socio-economic) explain the change in numbers and locations of cases reported since 2020 (2020--2023). That's a causal question which they address by applying correlative environmental niche modelling (ENM) approach to the avian influenza case data before (2015--2020) and after 2020 (2020--2023) and separately for confirmed cases in wild and domestic birds. To address their questions they compare the outputs of the respective models, and those of the first global model of the HPAI niche published by Dhingra et al 2016.

      We do not agree with this comment. The manuscript makes clear that we quantitatively assess factors that are associated with occurrence data before and after 2020. We do not claim to determine causality. One sentence of the Introduction section (lines 75-76) could be confusing, so we intend to modify it in the final revision of our manuscript.

      ENM is a correlative approach useful for extrapolating understandings based on sparse geographically referenced observational data over un- or under-sampled areas with similar environmental characteristics in the form of a continuous map. In this case, because the selected covariates about land cover, use, population and environment are broadly available over the entire world, modelled associations between the response and those covariates can be projected (predicted) back to space in the form of a continuous map of the HPAI niche for the entire world.

      We fully agree with this assessment of ENM approaches.

      Strengths:

      The authors are clear about expected bias in the detection of cases, such as geographic variation in surveillance effort (testing of symptomatic or dead wildlife, testing domestic flocks) and in general more detections near areas of higher human population density (because if a tree falls in a forest and there is no-one there, etc), and take steps to ameliorate those. The authors use boosted regression trees to implement the ENM, which typically feature among the best performing models for this application (also known as habitat suitability models). They ran replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. Their code and data are provided, though I did not verify that the work was reproducible.

      The paper can be read as a partial update to the first global model of H5Nx transmission by Dhingra and others published in 2016 and explicitly follows many methodological elements. Because they use the same covariate sets as used by Dhingra et al 2016 (including the comparisons of the performance of the sets in spatial cross-validation) and for both time periods of interest in the current work, comparison of model outputs is possible. The authors further facilitate those comparisons with clear graphics and supplementary analyses and presentation. The models can also be explored interactively at a weblink provided in text, though it would be good to see the model training data there too.

      The authors' comparison of ENM model outputs generated from the distinct HPAI case datasets is interesting and worthwhile, though for me, only as a response to differently framed research questions.

      Weaknesses:

      This well-presented and technically well-executed paper has one major weakness to my mind. I don't believe that ENM models were an appropriate tool to address their stated goal, which was to identify the factors that "explain" changing HPAI epidemiology.

      Here is how I understand and unpack that weakness:

      (1) Because of their fundamentally correlative nature, ENMs are not a strong candidate for exploring or inferring causal relationships.

      (2) Generating ENMs for a species whose distribution is undergoing broad scale range change is complicated and requires particular caution and nuance in interpretation (e.g., Elith et al., 2010). An important general assumption of environmental niche models is that the target species is at some kind of distributional equilibrium (at time scales relevant to the model application). In practice that means the species has had an opportunity to reach all suitable habitats and therefore its absence from some can be interpreted as either unfavourable environment or interactions with other species. Here data sets for the response (H5N1 or H5Nx case data in domestic or wild birds) were divided into two periods, 2015--2020 and 2020--2023, based on the rationale that the geographic locations and host-species profile of cases detected in the latter period were suggestive of changed epidemiology. In comparing outputs from multiple ENMs for the same target from distinct time periods the authors are expertly working in, or even dancing around, what is a known grey area, and they need to make the necessary assumptions and caveats obvious to readers.

      We thank the Reviewer for this observation. First, we constrained pseudo-absence sampling to countries and regions where outbreaks had been reported, reducing the risk of interpreting non-affected areas as environmentally unsuitable. Second, we deliberately split the outbreak data into two periods (2015-2020 and 2020-2023) because we do not assume a single stable equilibrium across the full study timeframe. This division reflects known epidemiological changes around 2020 and allows each period to be modeled independently. Within each period, ENM outputs are interpreted as associations between outbreaks and covariates, not as equilibrium distributions. Finally, by testing prediction across periods, we assessed both niche stability and potential niche shifts. These clarifications will be added to the manuscript to make our assumptions and limitations explicit.

      Line 66, we will add: “Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models  remain useful for identifying broad-scale associations and potential shifts in distribution. To account for this, we analysed two distinct time periods (2015-2020 and 2020-2023).”

      Line 123, we will revise “These findings underscore the ability of pre-2020 models in forecasting the recent geographic distribution of ecological suitability for H5Nx and H5N1 occurrences” to “These results suggest that pre-2020 models captured broad patterns of suitability for H5Nx and H5N1 outbreaks, while post-2020 models provided a closer fit to the more recent epidemiological situation”.

      (3) To generate global prediction maps via ENM, only variables that exist at appropriate resolution over the desired area can be supplied as covariates. What processes could influence changing epidemiology of a pathogen and are there covariates that represent them? Introduction to a new geographic area (continent) with naive population, immunity in previously exposed populations, control measures to limit spread such as vaccination or destruction of vulnerable populations or flocks? Might those control measures be more or less likely depending on the country as a function of its resources and governance? There aren't globally available datasets that speak to those factors, so the question is not why were they omitted but rather was the authors' decision to choose ENMs given their question justified? How valuable are insights based on patterns of correlation change when considering different temporal sets of HPAI cases in relation to a common and somewhat anachronistic set of covariates?

      We agree that the ecological niche models trained in our study are limited to environmental and host factors, as described in the Methods section with the selection of predictors. While such models cannot capture causality or represent processes such as immunity, control measures, or governance, they remain a useful tool for identifying broad associations between outbreak occurrence and environmental context. Our study cannot infer the full mechanisms driving changes in HPAI epidemiology, but it does provide a globally consistent framework to examine how associations with available covariates vary across time periods.

      (4) In general the study is somewhat incoherent with respect to time. Though the case data come from different time periods, each response dataset was modelled separately using exactly the same covariate dataset that predated both sets. That decision should be understood as a strong assumption on the part of the authors that conditions the interpretation: the world (as represented by the covariate set) is immutable, so the model has to return different correlative associations between the case data and the covariates to explain the new data. While the world represented by the selected covariates *may* be relatively stable (could be statistically confirmed), what about the world not represented by the covariates (see point 3)?

      We used the same covariate layers for both periods, which indeed assumes that these environmental and host factors are relatively stable at the global scale over the short timeframe considered. We believe this assumption is reasonable, as poultry density, land cover, and climate baselines do not change drastically between 2015 and 2023 at the resolution of our analysis. We agree, however, that unmeasured processes such as control measures, immunity, or governance may have changed during this time and are not captured by our covariates.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      - Line 400-401: "over the 2003-2016 periods" has an extra "s"; "two host species" (with reference to wild and domestic birds) would be more precise as "two host groups".

      - Remove comma line 404

      Many thanks for these comments; we have modified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Most of my work this round is encapsulated in the public part of the review.

      The authors responded positively to the review efforts from the previous round, but I was underwhelmed with the changes to the text that resulted. Particularly in regard to limiting assumptions - the way that they augmented the text to refer to limitations raised in review downplayed the importance of the assumptions they've made. So they acknowledge the significance of the limitation in their rejoinder, but in the amended text merely note the limitation without giving any sense of what it means for their interpretation of the findings of this study.

      The abstract and findings are essentially unchanged from the previous draft.

      I still feel the near causal statements of interpretation about the covariates are concerning. These models really are not a good candidate for supporting the inference that they are making and there seem to be very strong arguments in favour of adding covariates that are not globally available.

      We never claimed causal interpretation, and we have consistently framed our analyses in terms of associations rather than mechanisms. We acknowledge that one phrasing in the research questions (“Which factors can explain…”) could be misinterpreted, and we are correcting this in the revised version to read “Which factors are associated with…”. Our approach follows standard ecological niche modelling practice, which identifies statistical associations between occurrence data and covariates. As noted in the Discussion section, these associations should not be interpreted as direct causal mechanisms. Finally, all interpretive points in the manuscript are supported by published literature, and we consider this framing both appropriate and consistent with best practice in ecological niche modelling (ENM) studies.

      We assessed predictor contributions using the “relative influence” metric, the terminology reported by the R package “gbm” (Ridgeway, 2020). This metric quantifies the contribution of each variable to model fit across all trees, rescaled to sum to 100%, and should be interpreted as an association rather than a causal effect.
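
      The rescaling step described above can be sketched as follows. This is a minimal illustration and not the `gbm` implementation: the covariate names and raw contribution values are hypothetical, and only the normalisation to 100% is shown.

```python
def rescale_to_percent(raw_contributions):
    """Rescale non-negative raw contributions so that they sum to 100."""
    total = sum(raw_contributions.values())
    if total <= 0:
        raise ValueError("raw contributions must have a positive sum")
    return {name: 100.0 * value / total
            for name, value in raw_contributions.items()}


# Hypothetical raw contributions (e.g. summed improvement in model fit
# attributed to each covariate across all trees); not values from the study.
raw = {"poultry_density": 12.0, "land_cover": 6.0, "temperature": 2.0}
relative_influence = rescale_to_percent(raw)

if __name__ == "__main__":
    for name, pct in relative_influence.items():
        print(f"{name}: {pct:.1f}%")
```

      Because the values are forced to sum to 100%, each one describes a covariate's share of the fitted model relative to the other covariates included, which is why it reads as an association and not an effect size.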

      L65-66 The general difficulty of interpreting ENM output with range-shifting species should be cited here to alert readers that they should not blithely attempt what follows at home.

      I believe that their analysis is interesting and technically very well executed, so it has been a disappointment and hard work to write this assessment. My rough-cut last paragraph of a reframed intro would go something like - there are many reasons in the literature not to do what we are about to do, but here's why we think it can be instructive and informative, within certain guardrails.

      To acknowledge this comment and the previous one, we revised lines 65-66 to: “However, recent outbreaks raise questions about whether earlier ecological niche models still accurately predict the current distribution of areas ecologically suitable for the local circulation of HPAI H5 viruses. Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models remain useful for identifying broad-scale associations and potential shifts in distribution.”

      We respectfully disagree with the Reviewer’s statement that “there are many reasons in the literature not to do what we are about to do”. All modeling approaches, including mechanistic ones, have limitations, and the literature is clear on both the strengths and constraints of ecological niche models. Our manuscript openly acknowledges these limits and frames our findings accordingly. We therefore believe that our use of an ENM approach is justified and contributes valuable insights within these well-defined boundaries.

      Reference: Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1), 2007.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the mechanisms underlying the virulence of OMVs using a Drosophila model. They reveal a complex interplay between host defenses and OMV pathogenicity. Although the study enhances our understanding of Drosophila innate immunity, additional evidence is needed to strengthen the conclusions.

      Strengths:

      (1) In Figure 1, Toll pathway mutants infected with OMVs displayed three distinct phenotypic outcomes: mildly enhanced resistance to OMV infection, a response similar to that of the control, or increased susceptibility. Therefore, in addition to Imd and Kenny mutants from the Imd pathway, further mutants, such as Relish and PGRP-LC, should be examined to assess whether the Imd pathway is involved in host defense against OMVs.

      (2) Plasmatocytes clear particles via phagocytosis or endocytosis. However, flies lacking all hemocytes showed increased resistance to OMV challenge, raising the question of whether hemocytes actually aid the pathogen. To explore this hypothesis, the uptake of fluorescently tagged OMVs should be examined.

      (3) Hayan cleaves PPO into active PO. However, Hayan and PPO mutants exhibit opposite phenotypes upon OMV injection, raising the question of whether OMV-induced pathogenesis is linked to melanization.

      (4) Puckered mRNA levels were used as a read-out for JNK pathway activity. A transient induction of the JNK pathway was observed in head and thorax tissues. It would be beneficial if the authors could directly examine JNK activation in neuronal cells using immunostaining for pJNK.

      (5) In Figure 4B, the kayak was knocked down using the pan-neuronal driver elav-Gal4. To confirm the specificity and validity of this observation, the experiment should be repeated using another neural-specific driver.

      Weaknesses:

      It is unclear how many Serratia marcescens cells a 69 nL injection of 0.1 ng/nL OMVs corresponds to.

    2. Author response:

      We thank the reviewers and editors for the careful evaluation of our manuscript. Below, we provide a first refutation of some of the concerns expressed by reviewers.

      Both reviewers 1 and 3 underscore the importance of controlling for genetic background. This is actually an issue for only a limited part of the study, and the criticism should not apply to its major findings, with some exceptions, as detailed below.

      It is important to note that we ourselves identified several of the mutant lines we have been using. For instance, the key and MyD88 mutant alleles were identified in the Exelixis transposon insertion collection, which we screened in collaboration with this firm (e.g., [3, 4, 5]). This resource was generated in an isogenized w[A5001] strain [6], which we use as the matched control for these mutants (Figs 1B,D). Of note, although they share a common genetic background, key and MyD88 mutants have opposite phenotypes in terms of sensitivity to OMV challenge. The imd<sup>shadok</sup> null allele was identified during our EMS chemical mutagenesis screen in a yw cn bw background [5, 7, 8, 9], which was used as a control (Fig. S1A).

      With respect to Hayan (Fig. 2C, Fig. S2C) and eater (Fig. S2A-B) mutants [10, 11, 12], we find a similarly strong phenotype with two independent mutants in distinct genetic backgrounds (actually three for Hayan: our original manuscript did not include the Hayan<sup>SK3</sup> allele generated in the Lemaitre laboratory, in which OMVs also displayed impaired virulence). We have shown that the Hayan mutants do display the expected phenotype in terms of PPO cleavage (Fig. S2D). Please also note that in Fig. S2C the two mutant alleles are tested in the same experiment: even though there is some variation between the w<sup>1118</sup> and the w[A5001] strains, the two mutants behave in a remarkably similar manner. As regards the role of the cellular response, we note that we obtained results similar to those obtained with eater mutants by genetically ablating hemocytes (Fig. 2A) or by saturating the phagocytosis apparatus (Fig. 2B), a confirmation by two fully independent approaches.

      Of note, the observed eater and Hayan phenotypes are strong rather than marginal and are thus unlikely to be due to the genetic background.

      The PPO mutants have been isogenized in the w<sup>1118</sup> background by the lab of Bruno Lemaitre [13, 14] and are also validated biochemically in Fig. S2D. These mutants have been extensively tested in the Lemaitre laboratory [13, 14, 15].

      With respect to RNAi silencing driven ubiquitously or in specific tissues using the UAS-Gal4 system, we have mostly used transgenes from the Trip collection and have used as a control the mCherry RNAi provided by this resource[16]. As the RNAi transgenes have been generated in the same genetic background, it follows that independently of the driver used, the genetic background used in mCherry and genes-of-interest (Duox, Nox, Jafrac2) silenced flies is controlled for (Fig. 3D,E).

      For UAS-Gal4-mediated overexpression of fly superoxide dismutase genes, we have used SOD1 and SOD2 transgenes that have both been generated by the same laboratory (Phillips laboratory, University of Guelph) presumably in the same genetic background. Using two distinct drivers we find a strongly enhanced susceptibility phenotype when using UAS-SOD2 but not UAS-SOD1 transgenes (Fig. 3F, Fig. 4E). Importantly, the former is associated with mitochondria whereas the other is expressed in the endoplasmic reticulum: we independently confirm this phenotype using the mitoTempo mitochondrial ROS inhibitor.

      We shall thus address this criticism for the NOS mutants, where genetic background control is indeed critical, and for the UAS-kay RNAi line, by using a Trip line and its associated mCherry RNAi control transgene.

      With respect to the Toll pathway mutants, we agree that some of the variability of the phenotypes may be due to the genetic background, especially as regards tube and pelle. The SPE and grass mutants were retrieved in a screen performed by the group of Jean-Marc Reichhart in our Research Unit. They were thus generated in the same genetic background, yet grass mutants display a mildly decreased virulence of injected OMVs whereas SPE mutants display the opposite phenotype (compare Fig. S1E to S1I; the survival experiments were performed in the same set of experiments and have been separated for clarity). We do not intend to analyze the Toll pathway mutants further, as our data suggest that the canonical Toll pathway, likely activated through psh (Fig. S1F), is activated to detectable levels too late relative to the time course of OMV pathogenicity. In our opinion, the contribution of the Toll pathway to the host defense against OMV pathogenicity is minor, although we acknowledge that some of the findings, especially with SPE, are puzzling.

      With respect to the IMD pathway, we shall also test PGRP-LC and Relish mutants, as suggested by reviewers 2 and 3.

      Reviewer 2 query: “It is unclear how many Serratia marcescens cells a 69 nL injection of 0.1 ng/nL OMVs corresponds to.”

      OMVs were purified from 600 mL of SmDb11 cultures grown to an average OD<sub>600</sub> of 2.0. Based on a cell density of 0.8 × 10<sup>8</sup> cells/mL per OD unit, this corresponds to approximately 9.6 × 10<sup>10</sup> total bacterial cells.

      Each OMV preparation was concentrated into a final volume of 400 µL, resulting in a concentration factor of ~1500× relative to the original culture. Therefore, an injection dose of 69 nL of OMVs is equivalent to 0.1 mL of the starting bacterial culture, which corresponds to:

      0.2 OD units

      Approximately 1.6 × 10<sup>7</sup> bacterial cells

      It is likely that such high concentrations occur only toward the end of the infection, if OMVs are produced at the same rate in the host and in vitro.
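
      The dose conversion above can be reproduced with a short calculation. This is a sketch that uses only the figures quoted in the response and in the reviewer's query; the function and key names are illustrative, and it assumes OMV yield scales linearly with culture volume and optical density.

```python
# Figures quoted in the response and in the reviewer's query.
CULTURE_VOLUME_ML = 600.0      # SmDb11 culture volume
OD600 = 2.0                    # average optical density at harvest
CELLS_PER_ML_PER_OD = 0.8e8    # assumed cell density per OD unit
FINAL_VOLUME_UL = 400.0        # volume of the concentrated OMV preparation
INJECTION_NL = 69.0            # injected volume per fly
OMV_CONC_NG_PER_NL = 0.1       # OMV concentration of the injected preparation


def omv_dose_equivalents():
    """Convert one injected dose into culture, OD and cell equivalents."""
    total_cells = CULTURE_VOLUME_ML * OD600 * CELLS_PER_ML_PER_OD
    # 600 mL concentrated into 0.4 mL gives the concentration factor.
    concentration_factor = CULTURE_VOLUME_ML * 1000.0 / FINAL_VOLUME_UL
    # nL -> mL is a factor of 1e-6.
    culture_equiv_ml = INJECTION_NL * 1e-6 * concentration_factor
    od_units = culture_equiv_ml * OD600
    cell_equivalents = od_units * CELLS_PER_ML_PER_OD
    injected_mass_ng = INJECTION_NL * OMV_CONC_NG_PER_NL
    return {
        "total_cells": total_cells,
        "concentration_factor": concentration_factor,
        "culture_equiv_ml": culture_equiv_ml,
        "od_units": od_units,
        "cell_equivalents": cell_equivalents,
        "injected_mass_ng": injected_mass_ng,
    }


if __name__ == "__main__":
    for key, value in omv_dose_equivalents().items():
        print(f"{key}: {value:.4g}")
```

      Without intermediate rounding, the dose corresponds to 0.1035 mL of culture, 0.207 OD units and about 1.66 × 10^7 cells. The figures given in the response (0.1 mL, 0.2 OD units, 1.6 × 10^7 cells) follow from rounding the culture equivalent to 0.1 mL first.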

      With respect to the other queries of Reviewer 2, we shall attempt to label OMVs with the lipophilic dye FM4-64 and examine whether they are taken up by hemocytes. However, an issue may arise with potentially high background, which has been encountered in cell culture. Of note, OMVs are known to attack cultured human THP1 cells, a monocyte cell line [17]. Determining whether OMVs are taken up by hemocytes may only be a starting point for understanding how hemocytes promote the pathogenicity of OMVs. This question constitutes the topic of a full study that we are currently unable to undertake.

      We shall also test whether we can document phospho-JNK expression in neural tissues.

      Finally, we shall also confirm the data obtained with two elav-Gal4 drivers (including an inducible one) with the nsyb-Gal4 driver line.

      References

      (1) Xu R, et al. The Toll pathway mediates Drosophila resilience to Aspergillus mycotoxins through specific Bomanins. EMBO Rep 24, e56036 (2023).

      (2) Huang J, et al. A Toll pathway effector protects Drosophila specifically from distinct toxins secreted by a fungus or a bacterium. Proc Natl Acad Sci U S A 120, e2205140120 (2023).

      (3) Gobert V, et al. Dual Activation of the Drosophila Toll Pathway by Two Pattern Recognition Receptors. Science 302, 2126-2130 (2003).

      (4) Gottar M, et al. Dual Detection of Fungal Infections in Drosophila via Recognition of Glucans and Sensing of Virulence Factors. Cell 127, 1425-1437 (2006).

      (5) Gottar M, et al. The Drosophila immune response against Gram-negative bacteria is mediated by a peptidoglycan recognition protein. Nature 416, 640-644 (2002).

      (6) Thibault ST, et al. A complementary transposon tool kit for Drosophila melanogaster using P and piggyBac. Nat Genet 36, 283-287 (2004).

      (7) Rutschmann S, Jung AC, Hetru C, Reichhart J-M, Hoffmann JA, Ferrandon D. The Rel protein DIF mediates the antifungal, but not the antibacterial, response in Drosophila. Immunity 12, 569-580 (2000).

      (8) Rutschmann S, Jung AC, Rui Z, Silverman N, Hoffmann JA, Ferrandon D. Role of Drosophila IKKg in a Toll-independent antibacterial immune response. Nat Immunology 1, 342-347 (2000).

      (9) Jung A, Criqui M-C, Rutschmann S, Hoffmann J-A, Ferrandon D. A microfluorometer assay to measure the expression of ß-galactosidase and GFP reporter genes in single Drosophila flies. Biotechniques 30, 594-601 (2001).

      (10) Nam HJ, Jang IH, You H, Lee KA, Lee WJ. Genetic evidence of a redox-dependent systemic wound response via Hayan protease-phenoloxidase system in Drosophila. Embo J 31, 1253-1265 (2012).

      (11) Kocks C, et al. Eater, a transmembrane protein mediating phagocytosis of bacterial pathogens in Drosophila. Cell 123, 335-346 (2005).

      (12) Bretscher AJ, et al. The Nimrod transmembrane receptor Eater is required for hemocyte attachment to the sessile compartment in Drosophila melanogaster. Biology open 4, 355-363 (2015).

      (13) Binggeli O, Neyen C, Poidevin M, Lemaitre B. Prophenoloxidase activation is required for survival to microbial infections in Drosophila. PLoS Pathog 10, e1004067 (2014).

      (14) Dudzic JP, Kondo S, Ueda R, Bergman CM, Lemaitre B. Drosophila innate immunity: regional and functional specialization of prophenoloxidases. BMC Biol 13, 81 (2015).

      (15) Dudzic JP, Hanson MA, Iatsenko I, Kondo S, Lemaitre B. More Than Black or White: Melanization and Toll Share Regulatory Serine Proteases in Drosophila. Cell reports 27, 1050-1061 e1053 (2019).

      (16) Perkins LA, et al. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics 201, 843-852 (2015).

      (17) Goman A, et al. Uncovering a new family of conserved virulence factors that promote the production of host-damaging outer membrane vesicles in gram-negative bacteria. J Extracell Vesicles 14, e270032 (2025).

    1. LED lighting. Attaches directly to the tent structure. The lighting is available in versions with 1, 2, and 4 lamps.

      use previous correction

    2. LED lighting. Attaches directly to the tent structure. The lighting is available in versions with 1, 2, and 4 lamps.

      LED lighting. Attaches directly to the structure; available in versions with 1, 2, and 4 LED light sources.

    1. LED lighting for Jehlan. Attaches directly to the structure. Available in 1-, 2-, and 4-halogen versions. White light. 5 m power cable.

      use previous correction

    2. Star tent from MITKO: complete safety   What truly sets the MITKO star tent apart is the safety of its structure. Jehlan tents are designed for use in demanding outdoor conditions. Their aluminium masts, up to 76 mm in diameter, provide firm support which, combined with large steel bases, ensures the stability of the whole structure. When properly anchored, the tent withstands wind gusts of up to 100 km/h, making it the safest choice for outdoor events regardless of the weather. It is not just an eye-catching part of the programme but also a well-considered investment in the comfort and peace of mind of organisers. Quality is confirmed by a 2-year warranty and 10 years of post-warranty service, so you can be sure that even years later you can count on technical support and the availability of spare parts. In addition, at MITKO you can count on a free graphic design and full support from a sales representative at every stage of the process, from the first enquiry to delivery. This guarantees that everything goes smoothly and that the finished tent is tailored both to your needs and to the brand's visual identity.   Star tent: a striking space visible from afar   If you want to be clearly visible and have a solid working space inside, the choice is simple: the star tent from MITKO. The unique structure with a central mast and spreading arms attracts attention, but just as importantly it provides up to 227 m² of cover without side supports. In practice this means room for deckchairs, a stage, benches, and still plenty of free space. It is ideal for large areas, squares, and anywhere a first impression matters.   A star tent that works for your branding   With the MITKO star tent you easily stand out. You can print a large logo or graphic on it; thanks to a height of over 4 metres, they will be visible from a distance. Place advertising flags alongside to draw even more attention and help visitors find your stand. Want them to stay longer? Add deckchairs with custom printing and advertising parasols; the whole set-up will look coherent and professional, without the need to source items from different suppliers.   Flexible star tent (Jehlan) structure: you choose the version that suits the event   You don't have to guess whether the star tent will work at your event. At MITKO we offer three configurations of the Jehlan Base: with 1, 2, or 3 masts. This lets you choose a structure exactly according to the needs of the event, from smaller set-ups to large outdoor events. The most frequently chosen single-mast version is a compromise between strong visual impact and efficient operation. Assembly takes 30 to 45 minutes and requires only 2–3 people, depending on the size of the tent. If needed, you can extend the structure with side walls, an entrance vestibule, or a safety kit (pegs, ropes, hammer); all components are ready for immediate use.   Star tent without intermediaries   Instead of a chain of subcontractors: one place, full control. Every MITKO star tent is made in Poland. We sew it, test it, and ship it directly to you ourselves. Have a specific deadline? We meet it without problems; nothing has to travel across half of Europe. Unusual requirements, e.g. walls with a window? For us that is standard, not an "optional version in 6 weeks". If a modification is needed, you won't reach a call centre; you talk to the people who actually make this tent.

      It's already written several times above; is it necessary?

    3. LED lighting for Jehlan. Attaches directly to the structure. Available in 1-, 2-, and 4-halogen versions. White light. 5 m power cable.

      LED lighting. Attaches directly to the mast; available in versions with 1, 2, and 4 LED light sources. 5 m power cable.

    1. TABLE 5 PROFICIENCY LEVEL AND TRANSLANGUAGING

       My high level of English proficiency and competence in English is a result of my instructor's use of Arabic in my English lessons.

       | S/N | Option | Frequency | Percent |
       |-----|-------------------|-----------|---------|
       | 1 | Strongly Agree | 77 | 48.4 |
       | 2 | Agree | 31 | 19.5 |
       | 3 | Neutral | 27 | 17 |
       | 4 | Disagree | 15 | 9.4 |
       | 5 | Strongly Disagree | 9 | 5.7 |

      This is the proficiency for English with translanguage

    1. 10.5: Stress

      Exercise regularly, practice relaxation techniques, and get enough sleep.

    1. Reviewer #2 (Public review):

      Summary:

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      Strengths:

      (1) The rationale and hypothesis are well-motivated and clearly presented.

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function.

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.

      Weaknesses:

      (1) I have concerns that the gerbils may not have been performing the behavioral task using temporal fine structure information.

      Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. However, gerbil auditory filters are thought to be broader than those in human. In the revised version of the manuscript, the authors provide modelling results suggesting that the excitation patterns were discriminable for the 4F0 conditions, but may not have been for the 8F0 conditions. These results provide some reassurance that the 8F0 discriminations were dependent on temporal cues, but the description of the model lacks detail. Also, the authors state that "thus, for these two conditions with harmonic number N of 8 the gerbils cannot rely on differences in the excitation patterns but must solve the task by comparing the temporal fine structure." This is too strong. Pulsed tone intensity difference limens (the reference used for establishing whether or not the excitation pattern cues were usable) may not be directly comparable to profile-analysis-like conditions, and it has been argued that frequency discrimination may be more sensitive to excitation pattern cues than predicted from a simple comparison to intensity difference limens (Micheyl et al. 2013, https://doi.org/10.1371/journal.pcbi.1003336).

      I'm also somewhat concerned that the masking noise used in the present study was too low in level to mask cochlear distortion products. Based on their excitation pattern modelling, the authors state (without citation) that "since the level of excitation produced by the pink noise is less than 30 dB below that produced by the complex tones, distortion products will be masked." The basis for this claim is not clear. In human, distortion products may be only ~20 dB below the levels of the primaries (referenced to an external sound masker / canceller, which is appropriate, assuming that the modelling reported in the present paper did not include middle-ear effects; see Norman-Haignere and McDermott, 2016, doi: 10.1016/j.neuroimage.2016.01.050). Oxenham et al. (2009, doi: 10.1121/1.3089220) provide further cautionary evidence on the potential use of distortion product cues when the background noise level is too low (in their case the relative level of the noise in the compromised condition was only a little below that used in the present study). The masking level used in the present study may have been sufficient, but it would be useful to have some further reassurance on this point.

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human).
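
      For scale, the relative synapse loss implied by the quoted means can be computed directly. This is a minimal sketch using only the two means cited above (23 and 19 synapses per hair cell) and the ~50% figure reported for other models; the function name is illustrative.

```python
def percent_reduction(baseline, observed):
    """Relative reduction of `observed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - observed) / baseline


# Mean synapses per hair cell quoted in the review.
YOUNG_UNTREATED = 23.0
HIGH_OUABAIN_OR_OLD = 19.0
REFERENCE_LOSS_PERCENT = 50.0  # approximate loss cited for mouse and human data

loss = percent_reduction(YOUNG_UNTREATED, HIGH_OUABAIN_OR_OLD)

if __name__ == "__main__":
    print(f"Synapse loss in this study: {loss:.1f}%")
    print(f"Fraction of the ~50% reference loss: {loss / REFERENCE_LOSS_PERCENT:.2f}")
```

      The loss works out to roughly 17%, about a third of the ~50% reduction cited for other models, which is the basis for the concern that the manipulation may have been too mild to produce behavioral effects.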

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group. Statistical analyses on very small samples can be unreliable due to problems of power, generalisability, and susceptibility to outliers.

    2. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Public review):

      Summary:

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      Strengths:

      (1) The rationale and hypothesis are well-motivated and clearly presented.

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function.

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.

      Weaknesses:

      (1) I have concerns that the gerbils may not have been performing the behavioral task using temporal fine structure information.

      Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. However, gerbil auditory filters are thought to be broader than those in human. In the revised version of the manuscript, the authors provide modelling results suggesting that the excitation patterns were discriminable for the 4F0 conditions, but may not have been for the 8F0 conditions. These results provide some reassurance that the 8F0 discriminations were dependent on temporal cues, but the description of the model lacks detail. Also, the authors state that "thus, for these two conditions with harmonic number N of 8 the gerbils cannot rely on differences in the excitation patterns but must solve the task by comparing the temporal fine structure." This is too strong. Pulsed tone intensity difference limens (the reference used for establishing whether or not the excitation pattern cues were usable) may not be directly comparable to profile-analysis-like conditions, and it has been argued that frequency discrimination may be more sensitive to excitation pattern cues than predicted from a simple comparison to intensity difference limens (Micheyl et al. 2013, https://doi.org/10.1371/journal.pcbi.1003336).

      We consider our conclusions based on the excitation patterns adequate when gerbil auditory-filter data, frequency-difference limens and intensity-difference limens are put into perspective together. Kittel et al. (2002) observed an auditory-filter bandwidth in the gerbil that is larger than in humans by about a factor of 2, which reduces the number of independent frequency channels available for analyzing excitation patterns. The gerbil frequency-difference limen for pure tones, an indicator of the sensitivity for making use of excitation patterns, is more than an order of magnitude larger than the corresponding human frequency-difference limen (Klinge and Klump 2009, https://doi.org/10.1121/1.3021315). Finally, the gerbil intensity-difference limen of 2.8 dB observed for 1-kHz pure tones is considerably larger than the 0.75 dB observed for humans in the same study (Sinnott et al. 1992). Thus, taken together, these lines of evidence indicate that our conclusions regarding the potential use of excitation patterns are not too strong.

      I'm also somewhat concerned that the masking noise used in the present study was too low in level to mask cochlear distortion products. Based on their excitation pattern modelling, the authors state (without citation) that "since the level of excitation produced by the pink noise is less than 30 dB below that produced by the complex tones, distortion products will be masked." The basis for this claim is not clear. In human, distortion products may be only ~20 dB below the levels of the primaries (referenced to an external sound masker / canceller, which is appropriate, assuming that the modelling reported in the present paper did not include middle-ear effects; see Norman-Haignere and McDermott, 2016, doi: 10.1016/j.neuroimage.2016.01.050). Oxenham et al. (2009, doi: 10.1121/1.3089220) provide further cautionary evidence on the potential use of distortion product cues when the background noise level is too low (in their case the relative level of the noise in the compromised condition was only a little below that used in the present study). The masking level used in the present study may have been sufficient, but it would be useful to have some further reassurance on this point.

      In the method section, we provide the citation for estimating the size of the distortion products and the estimated signal-to-noise ratio making the basis for our estimates clear.

      We consulted Oxenham et al. (2009, doi: 10.1121/1.3089220), who suggested that distortion products may have been used by human subjects. However, in Fig. 1 of their paper, they convincingly demonstrate that even for humans, who have narrower auditory filters than gerbils, spectral cues cannot be used to evaluate the frequency shift in harmonic complex tones. We are confident that the same limitation applies to gerbils, which have wider auditory filters than humans and a lower ability to use spectral cues, as indicated by their higher frequency-difference limens and intensity-difference limens compared to humans.
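
      The level comparison at issue in the distortion-product exchange above can be illustrated with a minimal sketch. It assumes that masking can be judged by a plain dB difference between a distortion product and the noise excitation, which ignores auditory-filter integration and is not the authors' model. The function name and the noise levels tried are hypothetical; only the ~20 dB figure (quoted by the reviewer for humans) and the 30 dB bound (quoted from the manuscript) come from the text.

```python
def margin_above_noise_db(dp_re_tone_db: float, noise_re_tone_db: float) -> float:
    """Level of a distortion product relative to the noise excitation, in dB.

    Both arguments are levels relative to the complex-tone excitation
    (negative values mean "below the tones"). A positive result means the
    distortion product sits above the noise in this simplified comparison.
    """
    return dp_re_tone_db - noise_re_tone_db


# Figure quoted by the reviewer for humans: distortion products ~20 dB below the primaries.
DP_RE_TONE_DB = -20.0

# Hypothetical noise levels within the "less than 30 dB below" range stated in the manuscript.
for noise_re_tone_db in (-30.0, -25.0, -15.0):
    margin = margin_above_noise_db(DP_RE_TONE_DB, noise_re_tone_db)
    status = "above noise" if margin > 0 else "at or below noise"
    print(f"noise {noise_re_tone_db:+.0f} dB re tones: margin {margin:+.1f} dB ({status})")
```

      Under these assumptions, whether a distortion product is masked depends on where the noise excitation falls within the stated 30 dB range, which is why the reviewer asks for the basis of the estimate.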

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human).

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group.

      Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age.

      In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups. However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and rather the behavioral deficits with age are likely having to do with the misrepresented envelope cues instead.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript.

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and at a higher degree than median or high spont rates. It seems to be the case (qualitatively) in figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-z-ratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.

      [Update: The revised manuscript has addressed these issues]

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.

      [Update: The issue of threshold shifts with aging gerbils is still unresolved in my opinion. From the revised manuscript, it appears that aged gerbils have a 36 dB shift in thresholds. While the revised manuscript provides convincing evidence that these threshold shifts do not affect the auditory nerve tuning properties, the behavioral paradigm was still presented at the same sound level for young and aged animals. But a potential 36 dB change in sensation level may affect behavioral results. The authors may consider adding thresholds as covariates in analyses or presenting any evidence that behavioral thresholds are plateaued along that 30 dB range].

      Since we do not have behavioural detection thresholds from our individual animals, only CAP thresholds that represent the auditory-nerve data and cannot be translated to behavioural thresholds directly, we want to refrain from using these indirect measures as covariates in the present analysis. In addition, the study by Hamann et al. (2002, https://doi.org/10.1016/S0378-5955(02)00454-9) indicates that age-related behavioural threshold increases are smaller than threshold increases obtained from auditory brainstem response measurements. Finally, statistical analyses on very small samples can be unreliable due to problems of power, generalisability, and susceptibility to outliers.

      Moore and Sek (2009) in their paper on the TFS1 test pointed out that the effect of signal level on the TFS1 threshold in normal hearing human subjects was small when the signal-to-noise ratio between the broadband masking noise and the complex tone was kept constant. Furthermore, the masking noise will raise the thresholds of normal hearing gerbils and old gerbils with an audibility threshold increase to about the same signal-to-noise ratio. Thus, as long as the signal remains audible to the behaviourally tested gerbil which can be expected at an overall signal level of 68 dB SPL, we expect little effect of raised audibility thresholds on the TFS1 threshold. The lack of temporal processing deficits in the auditory-nerve fibers of old, mildly hearing impaired gerbils compared to those in normal hearing young adult gerbils further strengthens this argument.

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils, even for the easiest stimuli, in all but one condition (Fig. 5H), and in that condition there do not seem to be any age-related deficits in behavioral performance (Fig. 6B). Hence, dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.
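
      For readers unfamiliar with the d' criterion mentioned here, the following is a minimal sketch of the standard signal-detection formula, d' = z(hit rate) - z(false-alarm rate). The review does not state how d' was computed in the manuscript, so the function name and the example rates are purely illustrative assumptions.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard sensitivity index: z(hit rate) minus z(false-alarm rate).

    Rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises an
    error at exactly 0 or 1, so extreme rates need correcting beforehand.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, not data from the study.
CRITERION = 1.0  # the d' value the reviewer describes as the usual learning criterion
for hits, false_alarms in ((0.60, 0.40), (0.80, 0.30)):
    value = d_prime(hits, false_alarms)
    print(f"hits={hits:.2f}, false alarms={false_alarms:.2f}: "
          f"d'={value:.2f}, reaches criterion: {value >= CRITERION}")
```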

      [Update: The revised manuscript sufficiently addresses these issues, with the caveat of hearing threshold changes affecting behavioral thresholds mentioned above].

      As we argued above, an audibility threshold increase in the old gerbils is unlikely to explain the raised TFS1 thresholds in the old gerbils.

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues are unclear. The authors point to some potential central mechanisms, but given that these are recordings from the auditory nerve, what central mechanisms these may be is unclear. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be: TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain-induced synapse loss, but behavioral performance is affected by outer hair cell dysfunction with age.

      [Update: The revised manuscript has addressed these issues]

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.

      [Update: The revised manuscript has addressed these issues]

      Reviewer #3 (Recommendations for the authors):

      Thank you for your revisions. They largely address most of my initial concerns. The issue of threshold shifts potentially affecting behavioral thresholds still remains unresolved in my opinion. The new data about unaltered tuning curves is convincing that the auditory nerve fiber recordings are unaffected by threshold shifts. But am I correct in my understanding that the threshold shift with age was 36 dB relative to the young (L168)? If so, wouldn't the fact that behavior was performed at 68 dB SPL regardless of group affect the behavioral thresholds with age? Is there any additional evidence that suggests that behavioral performance plateaus along that ~30 dB range that the authors could include to strengthen this claim?

      In our response above to reviewer #3 and to reviewer #2 we provided additional arguments why we think that an audibility threshold increase in old gerbils cannot explain their compromised TFS1 thresholds.
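
      The sensation-level concern in this exchange comes down to simple arithmetic, sketched below. The 68 dB SPL stimulus level and the 36 dB age-related shift are taken from the review; the young-animal threshold is a hypothetical placeholder, since no absolute threshold is quoted. The sketch considers thresholds in quiet only and ignores the pink-noise masker, which the authors argue equalizes masked thresholds across groups.

```python
STIMULUS_LEVEL_DB_SPL = 68.0    # overall stimulus level quoted in the reviews
AGE_THRESHOLD_SHIFT_DB = 36.0   # threshold shift with age quoted by Reviewer #3


def sensation_level(stimulus_db_spl: float, threshold_db_spl: float) -> float:
    """Sensation level: stimulus level relative to the listener's threshold (dB SL)."""
    return stimulus_db_spl - threshold_db_spl


# HYPOTHETICAL threshold in quiet for a young animal; not a value from the manuscript.
young_threshold_db_spl = 10.0
old_threshold_db_spl = young_threshold_db_spl + AGE_THRESHOLD_SHIFT_DB

young_sl = sensation_level(STIMULUS_LEVEL_DB_SPL, young_threshold_db_spl)
old_sl = sensation_level(STIMULUS_LEVEL_DB_SPL, old_threshold_db_spl)

print(f"young: {young_sl:.0f} dB SL, old: {old_sl:.0f} dB SL, "
      f"difference: {young_sl - old_sl:.0f} dB")
```

      Whatever young threshold is assumed, the difference in sensation level equals the 36 dB shift, provided both groups are tested at the same fixed level and quiet thresholds are what limit audibility.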


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:  

      The authors investigate the effects of aging on auditory system performance in understanding temporal fine structure (TFS), using both behavioral assessments and physiological recordings from the auditory periphery, specifically at the level of the auditory nerve. This dual approach aims to enhance understanding of the mechanisms underlying observed behavioral outcomes. The results indicate that aged animals exhibit deficits in behavioral tasks for distinguishing between harmonic and inharmonic sounds, which is a standard test for TFS coding. However, neural responses at the auditory nerve level do not show significant differences when compared to those in young, normal-hearing animals. The authors suggest that these behavioral deficits in aged animals are likely attributable to dysfunctions in the central auditory system, potentially as a consequence of aging. To further investigate this hypothesis, the study includes an animal group with selective synaptic loss between inner hair cells and auditory nerve fibers, a condition known as cochlear synaptopathy (CS). CS is a pathology associated with aging and is thought to be an early indicator of hearing impairment. Interestingly, animals with selective CS showed physiological and behavioral TFS coding similar to that of the young normal-hearing group, contrasting with the aged group's deficits. Despite histological evidence of significant synaptic loss in the CS group, the study concludes that CS does not appear to affect TFS coding, either behaviorally or physiologically.

      We agree with the reviewer’s summary.

      Strengths:  

      This study addresses a critical health concern, enhancing our understanding of mechanisms underlying age-related difficulties in speech intelligibility, even when audiometric thresholds are within normal limits. A major strength of this work is the comprehensive approach, integrating behavioral assessments, auditory nerve (AN) physiology, and histology within the same animal subjects. This approach enhances understanding of the mechanisms underlying the behavioral outcomes and provides confidence in the actual occurrence of synapse loss and its effects. The study carefully manages controlled conditions by including five distinct groups: young normal-hearing animals, aged animals, animals with CS induced through low and high doses, and a sham surgery group. This careful setup strengthens the study's reliability and allows for meaningful comparisons across conditions. Overall, the manuscript is well-structured, with clear and accessible writing that facilitates comprehension of complex concepts.

      Weaknesses:

      The stimulus and task employed in this study are very helpful for behavioral research, and using the same stimulus setup for physiology is advantageous for mechanistic comparisons. However, I have some concerns about the limitations in auditory nerve (AN) physiology. Due to practical constraints, it is not feasible to record from a large enough population of fibers that covers a full range of best frequencies (BFs) and spontaneous rates (SRs) within each animal. This raises questions about how representative the physiological data are for understanding the mechanism in behavioral data. I am curious about the authors' interpretation of how this stimulus setup might influence results compared to methods used by Kale and Heinz (2010), who adjusted harmonic frequencies based on the characteristic frequency (CF) of recorded units. In contrast, the harmonic frequencies in this study are fixed across all CFs, meaning that many AN fibers may not be tuned closely to the stimulus frequencies, which adds sampling variability to the results. If units are not responsive to the stimulus, further clarification on detecting mistuning and phase locking to TFS effects within this setup would be valuable.

      We chose the stimuli for the AN recordings to be identical to the stimuli used in the behavioral evaluation of the perceptual sensitivity. Only with this approach can we directly compare the response of the population of AN fibers with perception measured in behavior.

      The stimuli are complex, i.e., they comprise many frequency components, and were presented at 68 dB SPL. Thus, the stimuli excite a given fiber within a large portion of the fiber’s receptive field. Furthermore, during recordings, we assured ourselves by audiovisual control that fibers responded to the stimuli. Otherwise, it would have cost valuable recording time to record from a nonresponsive AN fiber.

      Given the limited number of units per condition (sometimes as few as three for certain conditions), I wonder whether CF-dependent variability might impact the results of the AN data in this study; discussing this factor would help with understanding the results. While the use of the same stimuli for both behavioral and physiological recordings is understandable, a discussion on how this choice affects interpretation would be beneficial. In addition, a 60 dB stimulus could saturate high spontaneous rate (HSR) AN fibers, influencing neural coding and phase-locking to TFS. Separating SR groups could potentially help address these issues and improve interpretive clarity.

      A deeper discussion on the role of fiber spontaneous rate could also enhance the study. How might considering SR groups affect AN results related to TFS coding? While some statistical measures are included in the supplement, a more detailed discussion in the main text could help in interpretation.

      We do not think that it will be necessary to conduct any statistical analysis in addition to that already reported in the supplement.

      We considered moving some supplementary information back into the main manuscript but decided against it. Our single-unit sample was not sufficient, i.e. not all subpopulations of auditory-nerve fibers were sufficiently sampled for all animal treatment groups, to conclusively resolve every aspect that may be interesting to explore. The power of our approach lies in the direct linkage of several levels of investigation – cochlear synaptic morphology, single-unit representation and behavioral performance – and, in the main manuscript, we focus on the core question of synaptopathy and its relation to temporal fine structure perception. This is now spelled out clearly in lines 197 - 203 of the main manuscript.  

      Although Figure S2 indicates no change in median SR, the high-dose treatment group lacks LSR fibers, suggesting a different distribution based on SR for different animal groups, as seen in similar studies on other species. A histogram of these results would be informative, as LSR fiber loss with CS-whether induced by ouabain in gerbils or noise in other animals-is well documented (e.g., Furman et al., 2013).  

      Figure S2 was revised to avoid overlap of data points and show the distributions more clearly. Furthermore, the sample sizes for LSR and HSR fibers are now provided separately.

      Although ouabain effects on gerbils have been explored in previous studies, since these data already seem to have been recorded for the animals in this study, a brief description of changes in auditory brainstem response (ABR) thresholds, wave 1 amplitudes, and tuning curves for animals with cochlear synaptopathy (CS) in this study would be beneficial. This would confirm that ouabain selectively affects synapses without impacting outer hair cells (OHCs). For aged animals, since ABR measurements were taken, comparing hearing differences between normal and aged groups could provide insights into the pathologies besides CS in aged animals. Additionally, examining subject variability in treatment effects on hearing and how this correlates with behavior and physiology would yield valuable insights. If space is limited, a brief clarification or inclusion in the supplementary material may be sufficient.

      We thank the reviewer for this constructive suggestion. The requested data were added in a new section of the Results, entitled “Threshold sensitivity and frequency tuning were not affected by the synapse loss.” (lines 150 – 174). Our young-adult, ouabain-treated gerbils showed no significant elevations of CAP thresholds and their neural tuning was normal. Old gerbils showed the typical threshold losses for individuals of comparable age, and normal neural tuning, confirming previous reports. Thus, there was no evidence for relevant OHC impairments in any of our animal groups.   

      Another suggestion is to discuss the potential role of MOC efferent system and effect of anesthesia in reducing efferent effects in AN recordings. This is particularly relevant for aged animals, as CS might affect LSR fibers, potentially disrupting the medial olivocochlear (MOC) efferent pathway. Anesthesia could lessen MOC activity in both young and aged animals, potentially masking efferent effects that might be present in behavioral tasks. Young gerbils with functional efferent systems might perform better behaviorally, while aged gerbils with impaired MOC function due to CS might lack this advantage. A brief discussion on this aspect could potentially enhance mechanistic insights.  

      Thank you for this suggestion. The potential role of olivocochlear efferents is now discussed in lines 597 - 613.

      Lastly, although synapse counts did not differ between the low-dose treatment and NH / sham groups, separating these groups rather than combining them with the sham might reveal differences in behavior or AN results, particularly regarding the significance of differences between aged/treatment groups and the young normal-hearing group.

      For maximizing statistical power, we combined those groups in the statistical analysis. These two groups did not differ in synapse number, threshold sensitivity or neural tuning bandwidths.

      Reviewer #2 (Public review):

      Summary:  

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      We agree with the reviewer’s summary.

      Strengths: 

      (1) The rationale and hypothesis are well-motivated and clearly presented. 

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function. 

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.  

      Weaknesses: 

      (1) My main concern is that the stimuli may not have been appropriate for assessing neural temporal coding behaviorally. Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. By my calculations, the masking noise used in the present study was also considerably lower in level relative to the harmonic complex than that used in the human studies. These factors may have allowed the animals to perform the task using cues based on the pattern of activity across the neural array (excitation pattern cues), rather than cues related to temporal neural coding. The authors show that mean neural driven rate did not change with frequency shift, but I don't understand the relevance of this. It is the change in response of individual fibers with characteristic frequencies near the lowest audible harmonic that is important here.  

      The auditory filter bandwidth of the gerbil is about double that of human subjects. Because of this, the masking noise has a larger overall level within the filter than in the human studies, preventing the use of distortion products. The larger auditory filter bandwidth also prevents the gerbils from using excitation patterns, especially in the condition with a center frequency of 1600 Hz and a fundamental of 200 Hz and in the condition with a center frequency of 3200 Hz and a fundamental of 400 Hz. In the condition with a center frequency of 1600 Hz and a fundamental of 400 Hz, it is possible that excitation patterns are exploited. We have now added modeling of the excitation patterns, and a new figure showing their change at the gerbils’ perception threshold, in the discussion of the revised version (lines 440 - 446 and Fig. 8).
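
      The harmonic ranks behind the "4F0" and "8F0" labels follow directly from the three stimulus conditions named in this response. The short sketch below computes them and compares each with the 11F0 default that the reviewer quotes from Moore and Sek; the function and variable names are illustrative only.

```python
RECOMMENDED_RANK = 11  # "the default (recommended) value of the centre frequency is 11F0"


def harmonic_rank(center_frequency_hz: float, f0_hz: float) -> float:
    """Filter center frequency expressed as a multiple of the fundamental (N in N*F0)."""
    return center_frequency_hz / f0_hz


# (center frequency in Hz, fundamental frequency in Hz) for the three conditions named above.
conditions = [(1600, 200), (3200, 400), (1600, 400)]

for center_hz, f0_hz in conditions:
    rank = harmonic_rank(center_hz, f0_hz)
    print(f"fc={center_hz} Hz, F0={f0_hz} Hz: {rank:.0f}F0 "
          f"(human default is {RECOMMENDED_RANK}F0)")
```

      The first two conditions correspond to 8F0 and the third to 4F0, which is the condition where the authors concede that excitation patterns might be exploited.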

      The case against excitation pattern cues needs to be better made in the Discussion. It could be that gerbil frequency selectivity is broad enough for this not to be an issue, but more detail needs to be provided to make this argument. The authors should consider what is the lowest audible harmonic in each case for their stimuli, given the level of each harmonic and the level of the pink noise. Even for the 8F0 center frequency, the lowest audible harmonic may be as low as the 4th (possibly even the 3rd). In human, harmonics are thought to be resolvable by the cochlea up to at least the 8th.  

      This issue is now covered in the discussion, see response to the previous point.

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human). This should be discussed in the manuscript. 

      We agree that our results apply to moderate synaptopathy, which predominantly characterizes early stages of hearing loss or aged individuals without confounding noise-induced cochlear damage. This is now discussed in lines 486 – 498.
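
      To put the synapse counts in weakness (2) on the same scale as the ~50% losses cited from other studies, the mean counts can be converted to a percentage reduction. This is a minimal sketch using only the two group means quoted by the reviewer (23 and 19 synapses per hair cell); the function name is illustrative, and the calculation ignores variation across animals and cochlear locations.

```python
def percent_loss(control_mean: float, treated_mean: float) -> float:
    """Reduction of treated_mean relative to control_mean, in percent."""
    return 100.0 * (control_mean - treated_mean) / control_mean


YOUNG_UNTREATED_MEAN = 23.0   # synapses per hair cell, young untreated group
HIGH_OUABAIN_OLD_MEAN = 19.0  # synapses per hair cell, high ouabain and old groups

loss = percent_loss(YOUNG_UNTREATED_MEAN, HIGH_OUABAIN_OLD_MEAN)
print(f"synapse loss: {loss:.1f}% (versus ~50% in the studies cited by the reviewer)")
```

      The result is roughly 17%, about a third of the loss reported in the comparison studies, which is consistent with the authors' description of the synaptopathy here as moderate.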

      It would be informative to provide synapse counts separately for the animals who were tested behaviorally, to confirm that the pattern of loss across the group was the same as for the larger sample.  

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We have added this information to the revised version of the manuscript (lines 137 – 141).

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group.  

      The results for the three old subjects differed significantly from those of young subjects and young ouabain-treated subjects. This indicates a sufficient statistical power, since otherwise no significant differences would be observed.

      Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age. 

      In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups. However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and rather the behavioral deficits with age are likely having to do with the misrepresented envelope cues instead.  

      We agree with the reviewer’s summary.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript. 

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and at a higher degree than median or high spont rates. It seems to be the case (qualitatively) in Figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-zratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.  

      As the reviewer points out, our sample from the group treated with a high concentration of ouabain showed very few low-spontaneous-rate auditory-nerve fibers, as expected from previous work. However, this was also true, e.g., for our sample from sham-operated animals, and may thus well reflect a sampling bias. We are therefore reluctant to attach much significance to these data distributions. We now point out more clearly the limitations of our auditory-nerve sample for the exploration of interesting questions beyond our core research aim (see also response to Reviewer 1 above).  

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.  

      Unfortunately, we did not obtain behavioral thresholds that could be used here. We want to point out that the TFS 1 stimuli had an overall level of 68 dB SPL, and the pink noise masker would have increased the threshold more than expected from the moderate, age-related hearing loss in quiet. Thus, the masked thresholds for all gerbil groups are likely similar and should have no effect on the behavioral results.

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils, even for the easiest stimuli, in all but one condition (Fig. 5H), and in that condition there do not seem to be any age-related deficits in behavioral performance (Fig. 6B). Hence, dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.  

      Even in the group of gerbils with the lowest sensitivity, for the condition 400/1600 the animals achieved a d’ of on average above 1. Furthermore, stimuli were well above threshold and audible, even when no discrimination could be observed. Finally, as explained in the methods, different stimulus conditions were interleaved in each session, providing stimuli that were easy to discriminate together with those being difficult to discriminate. This approach ensures that the gerbils were under stimulus control, meaning properly trained to perform the task. Thus, an inability to discriminate does not indicate a lack of proper training.  
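      For reference, the sensitivity index d' discussed here is conventionally computed from hit and false-alarm rates as d' = z(H) - z(FA). The sketch below uses that textbook definition only; the function name `d_prime`, the clipping constant `eps`, and the example rates are illustrative assumptions, not the authors' analysis code.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float, eps: float = 1e-3) -> float:
    """Textbook sensitivity index: z(hit rate) - z(false-alarm rate).

    Rates are clipped to [eps, 1 - eps] so that perfect scores do not
    produce infinite z-values (the clipping rule is an assumption here).
    """
    z = NormalDist().inv_cdf
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(false_alarm_rate, eps), 1 - eps)
    return z(h) - z(f)


# Hypothetical rates: 69% hits against 31% false alarms sits close to
# the d' = 1 criterion mentioned in the review.
example = d_prime(0.69, 0.31)
```

      Under this definition, chance performance (equal hit and false-alarm rates) gives d' = 0 regardless of response bias, which is why a fixed d' criterion is commonly used to judge whether a task has been learned.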

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues is unclear. The authors point to some potential central mechanisms but given that these are recordings from the auditory nerve what central mechanisms these may be is unclear. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be -TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain induced synapse loss, but behavioral performance is affected by altered outer hair cell dysfunction with age. 

      A similar point was made by Reviewer #1. As indicated above, new data on threshold sensitivity and neural tuning were added in a new section of the Results, which indirectly suggest that significant OHC pathologies were not a concern, either in our young-adult, synaptopathic gerbils or in the old gerbils.  

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.  

      This is an interesting suggestion that we now explore in the revision of the manuscript. Reaction times can be used as a proxy for listening effort and were recorded for all responses. The new analysis, now reported in lines 378 - 396, compared young-adult control gerbils with young-adult gerbils that had been treated with the high concentration of ouabain. No differences in response latencies were found, indicating that listening effort did not change with synapse loss.  

      Reviewer #1 (Recommendations for the authors): 

      Figure 2: The y-axis labeled as "Frequency" is potentially misleading since there are additional frequency values on the right side of the panels. It would be helpful to clarify more in the caption what these right-side frequency values represent. Additionally, the legend could be positioned more effectively for clarity.

      Thank you for your suggestion. The axis label was rephrased.

      Figure 7: This figure is a bit unclear, as it appears to show two sets of gerbil data at 1500 Hz, yet the difference between them is not explained.  

      We added the following text to the figure legend: "The higher and lower thresholds shown for the gerbil data reflect thresholds at fc of 1600 Hz for fundamentals f0 of 200 Hz and 400 Hz, respectively."

      Maybe a short description of fmax that is used in Figure 4 could help or at least point to supplementary for finding the definition.  

      We thank the reviewer for pointing out this typo/inaccuracy. The correct terminology in line with the remainder of the manuscript is “fmaxpeak”. We corrected the caption of figure 5 (previously figure 4) and added the reference pointing to figure 11 (previously figure 9), which explains the terms.

      I couldn't find information about the possible availability of data. 

      The auditory-nerve recordings reported in this paper are part of a larger study of single-unit auditory-nerve responses in gerbils, formally described and published by Heeringa (2024) Single-unit data for sensory neuroscience: Responses from the auditory nerve of young-adult and aging gerbils. Scientific Data 11:411, https://doi.org/10.1038/s41597-024-03259-3. As soon as the Version of Record is submitted, the raw single-unit data can be accessed directly through the following link: https://doi.org/10.5061/dryad.qv9s4mwn4. The data that are presented in the figures of the present manuscript and were statistically analyzed have been uploaded to the Zenodo repository (https://doi.org/10.5281/zenodo.15546625).  

      Reviewer #2 (Recommendations for the authors): 

      L22. The term "hidden hearing loss" is used in many different ways in the literature, from being synonymous with cochlear synaptopathy, to being a description of any listening difficulties that are not accounted for by the audiogram (for which there are many other / older terms). The original usage was much more narrow than your definition here. It is not correct that Schaette and McAlpine defined HHL in the broad sense, as you imply. I suggest you avoid the term to prevent further confusion.  

      We eliminated the term hidden hearing loss.

      L43. SNHL is undefined.

      Thank you for catching that. The term is now spelled out.

      L64. "whether" -> "that"  

      We corrected this issue.

      L102. It would be informative to see the synapse counts (across groups) for the animals tested in the behavioral part of the study. Did these vary between groups in the same way?  

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We have added this information to the revised version of the manuscript (lines 137 – 141).

      L108. How many tests were considered in the Bonferroni correction? Did this cover all reported tests in the paper?  

      The comparisons of synapse numbers between treatment groups were done with full Bonferroni correction, as in the other tests involving posthoc pair-wise comparisons after an ANOVA.

      Figure 1 and 6 captions. Explain meaning of * and ** (criteria values).  

      The information was added to the figure legends of now Figs. 1 and 7. 

      L139. I don't follow the argument - the mean driven rate is not important. It is the rate at individual CFs and how that changes with frequency shift that provides the cue.

      L142. I don't follow - individual driven rates might have been a cue (some going up, some down, as frequency was shifted).  

      Yes, theoretically it is possible that the spectral pattern of driven rates (i.e., the excitation pattern) can be used for profile analysis and subsequently as a strong cue for discriminating the TFS1 stimuli. To shed some light on this question with regard to the actual stimuli used in this study, we added a comprehensive figure showing simulated excitation patterns (figure 8). The excitation patterns were generated with a gammatone filter bank and auditory filter bandwidths appropriate for gerbils (Kittel et al. 2002). The simulated excitation patterns allow at least semi-quantitative conclusions to be drawn about the possibility of profile analysis: 1. In the 200/1600 Hz and 400/3200 Hz conditions (i.e., harmonic number of fc is 8), the difference between all inharmonic excitation patterns and the harmonic reference excitation pattern is far below the threshold for intensity discrimination (Sinnott et al. 1992). 2. In the same conditions, the statistics of the pink noise make excitation-pattern differences at or beyond the filter slopes (on both high and low frequency limits) useless for frequency shift discrimination. 3. In the 400/1600 Hz condition (i.e., harmonic number of fc is 4), there is a non-negligible possibility that excitation-pattern differences were a main cue for discrimination. All of these conclusions are compatible with the results of our study.

      L193. Is this p-value Bonferroni corrected across the whole study? If not, the finding could well be spurious given the number of tests reported.  

      Yes, it is Bonferroni corrected.

      L330. TFS is already defined.  

      L346. AN is already defined.  

      L408. "temporal fine structure" -> "TFS"  

      It was a deliberate decision to define these terms again in the Discussion, for readers who prefer to skip most of the detailed Results. 

      L364-366. This argument is somewhat misleading. Cochlear resolvability largely depends on the harmonic spacing (i.e., F0) relative to harmonic frequency (in other words, on harmonic rank). Marmel et al. (2015) and Moore and Sek (2009) used a center frequency (at least) 11 times F0. Here, the center frequency was only 4 or 8 times F0. In human, this would not be sufficient to eliminate excitation pattern cues.  

      We have now included results from modeling the excitation patterns in the discussion with a new figure demonstrating that at a center frequency of 8 times F0, excitation patterns provide no useful cue while this is a possibility at  a center frequency of 4 times F0 (Fig. 8, lines 440 - 446).

      L541. Was that a spectrum level of 20 dB SPL (level per 1-Hz wide band) at 1 kHz? Need to clarify.  

      The power spectral density of the pink noise at 1 kHz (i.e., the level in a 1 Hz wide band centered at 1 kHz) was 13.3 dB SPL. The total level of the pink noise (including edge filters at 100 Hz and 11 kHz) was 50 dB SPL.
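      The two levels quoted in this reply can be cross-checked with a short calculation. The sketch below assumes ideal pink noise (power spectral density proportional to 1/f) with brick-wall band edges at 100 Hz and 11 kHz; the function name and the idealized edges are assumptions for illustration, and the real edge filters will differ slightly.

```python
import math


def pink_noise_total_level(spectrum_level_db: float, f_ref: float,
                           f_low: float, f_high: float) -> float:
    """Total level (dB) of ideal pink noise band-limited to [f_low, f_high].

    For a power spectral density S(f) = S(f_ref) * f_ref / f, the band power
    is S(f_ref) * f_ref * ln(f_high / f_low).
    """
    band_factor = f_ref * math.log(f_high / f_low)
    return spectrum_level_db + 10 * math.log10(band_factor)


# Values from the reply: 13.3 dB SPL per 1-Hz band at 1 kHz, 100 Hz to 11 kHz.
total = pink_noise_total_level(13.3, 1000.0, 100.0, 11000.0)
print(f"{total:.1f} dB SPL")
```

      Under these idealized assumptions the total comes out at about 50 dB SPL, consistent with the overall level stated in the reply.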

      L919. So was the correction applied across only the tests within each ANOVA? Don't you need to control the study-wise error rate (across all primary tests) to avoid spurious findings?  

      We added information about the family-wise error rate (line 1077 - 1078). Since the ANOVAs tested different specific research questions, we do not think that we need to control the study-wise error rate.
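      To make the distinction in this exchange concrete, the following minimal sketch applies a Bonferroni correction within a single family of post hoc comparisons. The p-values are hypothetical, and the helper `bonferroni` is an illustration, not the authors' statistics code.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three post hoc pairwise comparisons
# following a single ANOVA (one "family").
raw = [0.004, 0.030, 0.200]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

      Controlling the study-wise error rate, as the reviewer suggests, would amount to passing every primary test in the paper to the same call, so that `m` grows and each adjusted p-value becomes larger.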

      Reviewer #3 (Recommendations for the authors): 

      There was no difference in TFS sensitivity in the AN fiber activity across all the groups. Potential deficits with age were only found in the behavioral paradigm. Given that, it might be clearer to specify that the deficits, or lack thereof, are in behavior in the multiple instances in the manuscript where it says synaptopathy showed no decline in TFS sensitivity (for example, lines 342-344).  

      We carefully went through the entire text and clarified a couple more instances.

      L353 - this statement is a bit too strong. It implies causality when there is only a co-occurrence of increased f0 representation and age-related behavioral deficits in TFS1 task.  

      The statement was rephrased as “Thus, cue representation may be associated with the perceptual deficits, but not reduced synapse numbers, as originally proposed.”

      L465-467 - while this may be true, I think it is hard to say this with the current dataset where only AN fibers are being recorded from. I don't think we can say anything about afferent central mechanisms with this data set.  

      We agree. However, we refer here to published data on central inhibition to provide a possible explanation. 

      Hearing thresholds with ABRs are mentioned in the methods, but those data are not presented anywhere. It would be nice to see hearing thresholds across the various groups to account for or discount outer hair cell dysfunction. 

      This important point was made repeatedly and we thank the Reviewers for it. As indicated above, new data on threshold sensitivity and neural tuning were added in a new section of the Results, which indirectly suggest that significant OHC pathologies were not a concern, either in our young-adult, synaptopathic gerbils or in the old gerbils.

    1. Reliability of TCP-IP

      HOW A WEB REQUEST ACTUALLY FLOWS

      1. Find the remote computer → IP

      The browser looks up the server's IP address (for example, that of google.com). The IP address identifies which machine to contact.

      2. Create a reliable connection → TCP

      Your computer opens a TCP connection to that IP address.

      TCP does the following:

      establishes the connection,

      splits the data into packets,

      guarantees that they arrive in order,

      has anything that gets lost retransmitted.

      TCP is therefore the reliable carrier of the data.

      3. Choose which application to talk to → Port

      To speak HTTP, the browser contacts port 80 (or 443).

      IP = where the computer is

      Port = which application inside that computer

      Your computer also uses a port of its own, but a high, temporary one (e.g. 51234). It serves to distinguish this connection from others.

      4. Send the HTTP request

      At this point TCP is just the pipe that carries the data. Into that pipe you put an HTTP message, such as:

      GET /index.html HTTP/1.1
      Host: www.google.com

      HTTP is the language of the request.

      5. The server reads the request and replies over TCP

      The server runs a program (Apache, Nginx, etc.) that:

      listens on port 80,

      receives the HTTP request,

      interprets it,

      sends an HTTP response back over the same TCP connection.

      The whole thing travels back to your browser.

      SUMMARY IN ONE SENTENCE

      IP gets you to the right computer, TCP gives you a reliable channel, the port connects you to the right application, and HTTP is the language of the request and the response.
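      The flow described in this note can be sketched with Python's standard `socket` module. To stay self-contained, the example talks to a tiny one-shot server on 127.0.0.1 instead of a real website; the helper names (`serve_once`, `fetch`, `demo`) and the fixed "hello" response are assumptions made for illustration, not part of any real server.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and answer with a fixed HTTP response."""
    conn, _client_addr = listener.accept()
    with conn:
        conn.recv(4096)  # read the request; a real server would parse it
        body = b"hello"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"Connection: close\r\n\r\n" + body
        )


def fetch(host: str, port: int, path: str = "/index.html") -> tuple[bytes, int]:
    """Open a TCP connection, send one HTTP GET, return (response, local port)."""
    # IP + port select the machine and the application; TCP is the channel.
    with socket.create_connection((host, port), timeout=5) as sock:
        local_port = sock.getsockname()[1]  # high, temporary client port
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks), local_port


def demo() -> tuple[bytes, int, int]:
    """Run the server and the client once on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server_port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    response, client_port = fetch("127.0.0.1", server_port)
    worker.join(timeout=5)
    listener.close()
    return response, server_port, client_port
```

      Calling `demo()` returns the raw HTTP response together with the server port and the client's temporary port, which makes the "IP, TCP, port, HTTP" layering visible in a few lines. Against a real site the DNS lookup of step 1 would happen inside `socket.create_connection`.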

    1. Reviewer #1 (Public review):

      Summary:

      Grasper et al. present a combined analysis of the role of temporal mutagenesis in cancer, which includes both theoretical investigation and empirical analysis of point mutations in TCGA cancer patient cohorts. They find that temporally elevated mutation rates contribute to cancer fitness by allowing fast adaptation when the fitness drops (due to previous deleterious mutations). This may be relevant in the case of tumor suppressor genes (TSG), which follow the 2-hit hypothesis (i.e., biallelic 2 mutations are necessary to deactivate TS), and in cases where temporal mutagenesis occurs (e.g. high APOBEC, ROS). They provide evidence that this scenario is likely to occur in patients, in some cancer types. This is an interesting and potentially important result that merits the attention of the target audience. Nonetheless, I have some questions (detailed below) regarding the design of the study, the tools and parametrization of the theoretical analysis and the empirical analysis - that I think if addressed would make the paper more solid and the conclusion more substantiated.

      Strengths:

      Combined theoretical investigation with empirical analysis of cancer patients

      Weaknesses:

      Parametrization and systematic investigation of theoretical tools and their relevance to tumor evolution

      Comments on revisions:

      The authors have adequately addressed my suggestions. I think some of the details provided in some of the replies to my comments (specifically with regard to my points 1, 4, 6ii; minor point 6) could be integrated into relevant text in the introduction, discussion and methods, to help readers better follow the model and its interpretation - but it is up to the authors to decide what to emphasize.

    2. Reviewer #2 (Public review):

      This work presents theoretical results concerning the effect of punctuated mutation on multistep adaptation along with empirical analysis of multistep adaptation in cancer. The empirical results are claimed to demonstrate the acceleration of multistep adaptation predicted theoretically. However, there is an important disconnect between the theoretical results and the empirical observations, such that it is not clear that punctuated mutation can produce the phenomena observed empirically. Furthermore, there are other plausible explanations for the empirical observations.

      The theoretical work emphasizes the positive effect of punctuated mutation on the rate of crossing a "fitness valley", i.e., multistep adaptation where the first mutation is deleterious. The empirical work, however, focuses on inactivation of both alleles of a tumor suppressor gene (TSG), for which the first mutation--inactivation of one gene copy--is expected to be neutral or slightly advantageous, not maladaptive as suggested by the authors. Pairs of genes with putative synergistic effects were also analyzed, but there is no indication that these generally involve fitness valleys either.

      This disconnect is most glaring in Figure 4, in which the simulations are supposed to confirm that punctuated mutation can produce the empirical phenomena reported for TSG inactivation. If this is the case, it should be possible to produce such results in simulations in which inactivation of just one allele is neutral. Instead, simulations assuming a substantial fitness penalty (0.05) for the first mutation are presented. Contrary to what is claimed in the text (line 212), this is not a "biologically realistic" parameter value for TSG inactivation. The insensitivity of results to the size of the fitness penalty is irrelevant: a substantial fitness penalty is qualitatively different from no penalty at all.

      The paper does report a small (15%) effect of punctuation on the rate of multistep adaptation in the absence of a fitness valley. This effect is much smaller than the fourfold increase in the presence of a fitness valley. The results presented--a single stochastic run for each condition--are insufficient to establish that there is any effect at all: if we assume that the number of pairs of fixations (about 150-180 in each simulation) is Poisson distributed, the 15% difference is not statistically significant.
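      The significance argument in this paragraph can be checked with a normal approximation for the difference of two Poisson counts. The counts below are hypothetical values chosen inside the quoted range of about 150-180 and roughly 15% apart; the function name is likewise an assumption, and this is a sketch of the reviewer's reasoning, not a reanalysis of the paper's simulations.

```python
import math


def poisson_two_sample_z(count_a: int, count_b: int) -> tuple[float, float]:
    """Normal-approximation test that two Poisson counts share one mean.

    Under the null hypothesis the difference has variance count_a + count_b,
    so z = (count_b - count_a) / sqrt(count_a + count_b).
    Returns (z, two-sided p-value).
    """
    z = (count_b - count_a) / math.sqrt(count_a + count_b)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts inside the quoted 150-180 range, about 15% apart.
z, p = poisson_two_sample_z(156, 180)
```

      With these counts z is about 1.3, short of the 1.96 needed for significance at the 5% level, which is in line with the reviewer's statement that a single stochastic run per condition cannot establish the effect.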

      Assuming that this effect is genuine, it is likely due to a mutation rate that is unrealistically high (considering that "rescue" requires inactivation of a particular gene). Theoretical considerations suggest that punctuated mutation has little or no effect in the absence of a fitness valley in the limit of low mutation rate:

      (A1) The authors' theoretical results for a Galton-Watson process (SI2) imply that there is no effect without a fitness valley in that limit. This is so because there is no effect in the "supercritical" regime. Cancer cells must be supercritical (otherwise there would be no net growth), and a neutral or advantageous mutant would remain in the supercritical regime.

      (A2) Fig. S2D indicates, as far as I can tell from the colors, that punctuation makes little or no difference to the rate of adaptation in the absence of a fitness valley, i.e., for vertical axis values of 1 or more. I am not sure why the authors (line 129) point to this figure as evidence that punctuation speeds two-step adaptation when the first mutation is not maladaptive; the figure appears to say that it does not. The fraction of events due to "stochastic tunneling" of course increases with punctuation, but that does not change the fact that adaptation is no faster.

      (A3) The authors' verbal argument to the contrary (line 124ff) is flawed. Despite the fact that even a mildly advantageous mutant is likely to go extinct, its expected frequency only increases with time, and that of a neutral allele remains constant over time. Thus, the average number of opportunities for a second mutation does not decrease with time since the first mutation, as it does when the first mutation is deleterious.
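      The point about expected lineage size can be illustrated with the standard result for a Galton-Watson branching process, in which a lineage founded by one cell has expected size m**t after t generations, where m is the mean number of offspring per cell. This is a deliberate simplification and not the authors' model; the values of m are hypothetical.

```python
def expected_lineage_size(mean_offspring: float, generations: int) -> float:
    """Expected size of a lineage founded by one cell: E[Z_t] = m**t."""
    return mean_offspring ** generations


# Hypothetical mean offspring numbers per generation.
for label, m in [("deleterious", 0.95), ("neutral", 1.0), ("advantageous", 1.05)]:
    sizes = [round(expected_lineage_size(m, t), 3) for t in (0, 10, 50)]
    print(label, sizes)
```

      Only the deleterious case shows an expected size that shrinks with time, even though individual neutral or mildly advantageous lineages still go extinct with high probability.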

      (A4) I ran some simulations for a Wright-Fisher population, and they seem to confirm the lack of an effect in the low mutation rate limit.

      Thus, it is unclear whether punctuated mutation can explain the reported phenomena or should be expected to have major effects on the rate or nature of cancer cell adaptation.

      I would also note that routes to inactivation of both copies of a TSG that are not accelerated by punctuation will dilute any effects of punctuation. An example is a single somatic mutation followed by loss of heterozygosity. Such mechanisms are not included in the theoretical analysis nor assessed empirically. If, for example, 90% of double inactivations were the result of such mechanisms with a constant mutation rate, a factor of two effect of punctuated mutagenesis would increase the overall rate by only 10%. Consideration of the rate of apparent inactivation of just one TSG copy and of deletion of both copies would shed some light on the importance of this consideration.
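      The dilution arithmetic in this paragraph can be written out explicitly. The sketch below takes the reviewer's example numbers (90% of double inactivations via routes unaffected by punctuation, and a factor-of-two effect on the rest); the function name is an assumption for illustration.

```python
def overall_speedup(fraction_unaffected: float, punctuation_factor: float) -> float:
    """Relative overall rate when only part of the routes is accelerated."""
    affected = 1.0 - fraction_unaffected
    return fraction_unaffected + affected * punctuation_factor


speedup = overall_speedup(0.90, 2.0)  # the reviewer's example numbers
```

      Here 0.9 + 0.1 * 2 = 1.1, i.e. an overall increase of only 10%, as stated in the review.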

      Several factors besides the effects of punctuated mutation might explain or contribute to the empirical observations. Though these are now mentioned in the paper, I will list them here for clarity:

      (B1) High APOBEC3 activity can select for inactivation of TSGs (references in Butler and Banday 2023, PMID 36978147). This could explain the empirical correlations.

      (B2) Without punctuation, the rate of multistep adaptation is expected to rise more than linearly with mutation rate. Thus, if APOBEC signatures are correlated with a high mutation rate due to the action of APOBEC, this alone could explain the correlation with TSG inactivation.

      (B3) The nature of mutations caused by APOBEC might explain the results. Notably, one of the two APOBEC mutation signatures, SBS13, is particularly likely to produce nonsense mutations. The authors count both nonsense and missense mutations, but nonsense mutations are more likely to inactivate the gene, and hence to be selected.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study presents a theoretical model of how punctuated mutations influence multistep adaptation, supported by empirical evidence from some TCGA cancer cohorts. This solid model is noteworthy for cancer researchers as it points to the case for possible punctuated evolution rather than gradual genomic change. However, the parametrization and systematic evaluation of the theoretical framework in the context of tumor evolution remain incomplete, and alternative explanations for the empirical observations are still plausible.

      We thank the editor and the reviewers for their thorough engagement with our work. The reviewers’ comments have drawn our attention to several important points that we have addressed in the updated version. We believe that these modifications have substantially improved our paper.

      There were two major themes in the reviewers’ suggestions for improvement. The first was that we should demonstrate more concretely how the results in the theoretical/stylized modelling parts of our paper quantitatively relate to dynamics in cancer.

      To this end, we have now included a comprehensive quantification of the effect sizes of our results across large and biologically-relevant parameter ranges. Specifically, following reviewer 1’s suggestion to give more prominence to the branching process, we have added two figures (Fig S3-S4) quantifying the likelihood of multi-step adaptation in a branching process for a large range of mutation rates and birth-death ratios. Formulating our results in terms of birth-death ratios also allowed us to provide better intuition regarding how our results manifest in models with constant population size vs models of growing populations. In particular, the added figure (Fig S3) highlights that the effect size of temporal clustering on the probability of successful 2-step adaptation is very sensitive to the probability that the lineage of the first mutant would go extinct if it did not acquire a second mutation. As a result, the phenomenon we describe is biologically likely to be most effective in those phases during tumor evolution in which tumor growth is constrained. This important pattern had not been described sufficiently clearly in the initial version of our manuscript, and we thank both reviewers for their suggestions to make these improvements.

      The second major theme in the reviewers’ suggestions was focused on how we relate our theoretical findings to readouts in genomic data, with both reviewers pointing to potential alternative explanations for the empirical patterns we describe.

      We have now extended our empirical analyses following some of the reviewers’ suggestions. Specifically, we have included analyses investigating how the contribution of reactive oxygen species (ROS)-related mutation signatures correlates with our proxies for multi-step adaptation; and we have included robustness checks in which we use Spearman instead of Pearson correlations. Moreover, we have included more discussion on potential confounds and the assumptions going into our empirical analyses as well as the challenges in empirically identifying the phenomena we describe.

      Below, we respond in detail to the individual comments made by each reviewer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Grasper et al. present a combined analysis of the role of temporal mutagenesis in cancer, which includes both theoretical investigation and empirical analysis of point mutations in TCGA cancer patient cohorts. They find that temporally elevated mutation rates contribute to cancer fitness by allowing fast adaptation when the fitness drops (due to previous deleterious mutations). This may be relevant in the case of tumor suppressor genes (TSG), which follow the 2-hit hypothesis (i.e., biallelic 2 mutations are necessary to deactivate TS), and in cases where temporal mutagenesis occurs (e.g., high APOBEC, ROS). They provide evidence that this scenario is likely to occur in patients with some cancer types. This is an interesting and potentially important result that merits the attention of the target audience. Nonetheless, I have some questions (detailed below) regarding the design of the study, the tools and parametrization of the theoretical analysis, and the empirical analysis, which I think, if addressed, would make the paper more solid and the conclusion more substantiated.

      Strengths:

      Combined theoretical investigation with empirical analysis of cancer patients.

      Weaknesses:

      Parametrization and systematic investigation of theoretical tools and their relevance to tumor evolution.

      We sincerely thank Reviewer 1 for their comments. As communicated in more detail in the point-by-point replies to the “Recommendations for the authors”, we have revised the paper to address these comments in various ways. To summarize, Reviewer 1 asked for (1) more comprehensive analyses of the parameter space, especially in ranges of small fitness effects and low mutation rates; (2) additional clarifications on details of mechanisms described in the manuscript; and (3) suggested further robustness checks to our empirical analyses. We have addressed these points as follows: we have added detailed analyses of dynamics and effect sizes for branching processes (see Sections SI2 and SI3 in the Supplementary Information, as well as Figures S3 and S4). As suggested, these additions provide characterizations of effect sizes in biologically relevant parameter ranges (low mutation rates and smaller fitness effect sizes), and extend our descriptions to processes with dynamically changing population sizes. Moreover, we have added further clarifications at suggested points in the manuscript, e.g. to elaborate on the non-monotonicities in Fig 3. Lastly, we have undertaken robustness checks using Spearman rather than Pearson correlation coefficients to quantify relations between TSG deactivation and APOBEC signature contribution, and have performed analyses investigating dynamics of reactive oxygen species-associated mutagenesis instead of APOBEC.

      Reviewer #2 (Public review):

      This work presents theoretical results concerning the effect of punctuated mutation on multistep adaptation and empirical evidence for that effect in cancer. The empirical results seem to agree with the theoretical predictions. However, it is not clear how strong the effect should be on theoretical grounds, and there are other plausible explanations for the empirical observations.

      Thank you very much for these comments. We have now substantially expanded our investigations of the parameter space as outlined in the response to the “eLife Assessment” above and in the detailed comments below (A(1)-A(3)) to convey more quantitative intuition for the magnitude of the effects we describe for different phases of tumor evolution. We agree that there could be potential additional confounders to our empirical investigations besides the challenges regarding quantification that we already described in our initial version of the manuscript. We have thus included further discussion of these in our manuscript (see replies to B(1)-B(3)), and we have expanded our empirical analyses as outlined in the response to the “eLife Assessment”.

      For various reasons, the effect of punctuated mutation may be weaker than suggested by the theoretical and empirical analyses:

      (A1) The effect of punctuated mutation is much stronger when the first mutation of a two-step adaptation is deleterious (Figure 2). For double inactivation of a TSG, the first mutation--inactivation of one copy--would be expected to be neutral or slightly advantageous. The simulations depicted in Figure 4, which are supposed to demonstrate the expected effect for TSGs, assume that the first mutation is quite deleterious. This assumption seems inappropriate for TSGs, and perhaps the other synergistic pairs considered, and exaggerates the expected effects.

      Thank you for highlighting this discrepancy between Figure 2 and Figure 4. For computational efficiency and for illustration purposes, we had opted for high mutation rates and large fitness effects in Figure 2; however, our results are valid even in the setting of lower mutation rates and fitness effects. To improve the connection to Figure 4, and to address other related comments regarding parameter dependencies, we have now added more detailed quantification of the effects we describe (Figures SF3 and SF4) to the revised manuscript. These additions show that the effects illustrated in Figure 2 retain large effect sizes when going to much lower mutation rates and much smaller fitness effects. Indeed, while under high mutation rates we only see the large relative effects if the first mutation is highly deleterious, these large effects become more universal when going to low mutation rates.

      In general, it is correct that the selective disadvantage (or advantage) conveyed by the first mutation affects the likelihood of successful 2-step adaptations. It is also correct that the magnitude of the ‘relative effect’ of temporal clustering on valley-crossing is highest if the lineage with only the first of the two mutations is vanishingly unlikely to produce a second mutant before going extinct. If the first mutation is strongly deleterious, the lineage of such a first mutant is likely to quickly go extinct – and therefore also more likely to do so before producing a second mutant.

      However, this likelihood of producing the second mutant is also low if the mutation rate is low. As our added figure (Figure SF3) illustrates, at low mutation rates appropriate for cancer cells, the relative effect is insensitive to the magnitude of the fitness disadvantage for large parts of the parameter space. Especially in populations of constant size (approximated by a birth/death ratio of 1), the relative effects for first mutations that reduce the birth rate by 0.5 or by 0.05 are indistinguishable (Figure SF3f).
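
      The interplay between the fitness cost of the first mutation and the mutation rate can be illustrated with a minimal sketch. It assumes a linear birth-death process in which each division of a first-mutant cell yields a second mutant with probability `mu`. The function name, parameter names and numerical values are hypothetical and are not taken from the manuscript or its SI.

```python
import math


def second_mutant_probability(birth, death, mu):
    """Probability that the lineage of one first-mutant cell produces a
    second mutant before going extinct (illustrative model, see lead-in).

    birth, death: per-cell birth and death rates of the first mutant.
    mu: probability (0 < mu <= 1) that a division yields a second mutant.
    """
    if birth < 0 or death < 0 or birth + death <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    if not 0 < mu <= 1:
        raise ValueError("mu must lie in (0, 1]")
    p_birth = birth / (birth + death)  # next event of a cell is a division
    p_death = death / (birth + death)  # next event of a cell is its death
    # q = probability of extinction without any second mutant. It is the
    # smaller root of q = p_death + p_birth * (1 - mu) * q**2.
    a = p_birth * (1.0 - mu)
    disc = 1.0 - 4.0 * a * p_death
    q = 2.0 * p_death / (1.0 + math.sqrt(disc))  # stable form of the root
    return 1.0 - q


if __name__ == "__main__":
    for birth in (0.5, 0.95, 1.0):  # death rate fixed at 1.0
        for mu in (1e-3, 1e-6):
            p = second_mutant_probability(birth, 1.0, mu)
            print(f"birth={birth:<4} mu={mu:<7} P(second mutant)={p:.3e}")
```

      In this sketch, the probability for a deleterious first mutant at small `mu` is roughly `mu` times the expected number of divisions in the lineage, whereas for a neutral first mutant (birth equal to death) it scales roughly with the square root of `mu`.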

      Moreover, the absolute effect, as we discuss in the paper (Figures SF2 and SF3), is largest not in regions of the parameter space in which the first mutant is infinitesimally unlikely to produce a second mutant (where 𝑓<sub>𝑘</sub> and 𝑓<sub>1</sub> would both be infinitesimally small), but rather in parameter regions in which this first mutant has a non-negligible chance to produce a second mutant. The absolute effect therefore peaks around fitness-neutral first mutations. While the next comment (below) says that our empirical investigations more closely resemble comparisons of relative effects and not absolute effects, we would expect that the observations in our data come preferentially from multi-step adaptations with large absolute effect, since the absolute effect is maximal when both 𝑓<sub>𝑘</sub> and 𝑓<sub>1</sub> are relatively high.

      In summary, we believe Figure 2, while using exaggerated parameters for defensible reasons, is not a misleading illustration of the general phenomenon or of its applicability in biological settings, as effect sizes remain large when moving to biologically realistic parameter ranges. To clarify this issue, we have largely rewritten the relevant paragraphs in the results section and have added two additional figures (Figures SF3 and SF4) as well as a section in the SI with detailed discussion (SI2).

      (A2) More generally, parameter values affect the magnitude of the effect. The authors note, for example, that the relative effect decreases with mutation rate. They suggest that the absolute effect, which increases, is more important, but the relative effect seems more relevant and is what is assessed empirically.

      Thank you for this comment. As noted in the replies to the above comments, we have now included extensive investigations of how sensitive effect sizes are to different parameter choices. We also apologize for insufficiently clearly communicating how the quantities in Figure 4 relate to the findings of our theoretical models.

      The challenge in relating our results to single-timepoint sequencing data is that we only observe the mutations that a tumor has acquired, but we do not directly observe the mutation rate histories that brought about these mutations. As an alternative readout, we therefore consider (through rough proxies: TSGs and APOBEC signatures) the amount of 2-step adaptations per acquired/retained mutation. While we unfortunately cannot control for the average mutation rate in a sample, we motivate using this “TSG-deactivation score” by the hypothesis that for any given mutation rate, we expect a positive relationship between the amount of temporal clustering and the amount of 2-step adaptations per acquired/retained mutation. This hypothesis follows directly from our theoretical model, where it formally translates to the statement that, for a fixed average mutation rate μ, the rate of 2-step adaptations is increasing in 𝑘.

      However, while both quantities from our theoretical model, the relative effect 𝑓<sub>𝑘</sub>/𝑓<sub>1</sub> and the absolute effect, relate to this hypothesis (both are increasing in 𝑘), neither of them maps directly onto the formulation of our empirical hypothesis.

      We have now rewritten the relevant passages of the manuscript to more clearly convey our motivation for constructing our TSG deactivation score in this form (P. 4-6).

      (A3) Routes to inactivation of both copies of a TSG that are not accelerated by punctuation will dilute any effects of punctuation. An example is a single somatic mutation followed by loss of heterozygosity. Such mechanisms are not included in the theoretical analysis nor assessed empirically. If, for example, 90% of double inactivations were the result of such mechanisms with a constant mutation rate, a factor of two effect of punctuated mutagenesis would increase the overall rate by only 10%. Consideration of the rate of apparent inactivation of just one TSG copy and of deletion of both copies would shed some light on the importance of this consideration.

      This is a very good point, thank you. In our empirical analyses, the main motivation was to investigate whether we would observe patterns that are qualitatively consistent with our theoretical predictions, i.e. whether we would find positive associations between valley-crossing and temporal clustering. Our aim in the empirical analyses was not to provide a quantitative estimate of how strongly temporally clustered mutation processes affect mutation accumulation in human cancers. We hence restricted attention to only one mutation process which is well characterized to be temporally clustered (APOBEC mutagenesis) and to only one category of (epi)genomic changes (SNPs, in which APOBEC signatures are well characterized). Of course, such an analysis ignores that other mutation processes (e.g. LOH, copy number changes, methylation in promoter regions, etc.) may interact with the mechanisms that we consider in deactivating Tumor suppressor genes.

      We have now updated the text to include further discussion of this limitation and to convey that our empirical analyses are not intended as a complete quantification of the effect of temporal clustering on mutagenesis in vivo (P. 10, 11).
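
      The dilution argument in comment (A3) can be written as a short calculation. The sketch below assumes that a fraction of all double inactivations arises through routes that punctuation accelerates, while the remainder is unaffected. The function and argument names are hypothetical.

```python
def overall_fold_change(accelerated_fraction, fold_acceleration):
    """Fold change in the total double-inactivation rate when only a
    fraction of inactivation routes is sped up by punctuated mutagenesis.

    accelerated_fraction: share of inactivations (0..1) arising via routes
        that punctuation accelerates.
    fold_acceleration: factor by which those routes are accelerated.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    unaffected = 1.0 - accelerated_fraction
    return unaffected + accelerated_fraction * fold_acceleration


if __name__ == "__main__":
    # Reviewer's example: 90% of double inactivations via unaffected routes,
    # factor-of-two acceleration for the remaining 10%.
    change = overall_fold_change(0.10, 2.0)
    print(f"overall rate changes by a factor of {change:.2f}")
```

      With the reviewer's numbers, the overall rate rises by a factor of about 1.1, i.e. by 10%.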

      Several factors besides the effects of punctuated mutation might explain or contribute to the empirical observations:

      (B1) High APOBEC3 activity can select for inactivation of TSGs (references in Butler and Banday 2023, PMID 36978147). This selective force is another plausible explanation for the empirical observations.

      Thank you for making this point. We agree that increased APOBEC3 activity, or any other similar perturbation, can change the fitness effect that any further changes/perturbations to the cell would bring about. Our empirical analyses therefore rely on the assumption that there are no major confounding structural differences in selection pressures between tumors with different levels of APOBEC signature contributions. We have expanded our discussion section to elaborate on this potential limitation (P. 10-11).

      While the hypothesis that APOBEC3 activity selects for inactivation of TSGs has been suggested, there remain other explanations. Either way, the ways in which selective pressures have been suggested to change would not interfere relevantly with the effects we describe. The paper cited in the comment argues that “high APOBEC3 activity may generate a selective pressure favoring” TSG mutations as “APOBEC creates a high [mutation] burden, so cells with impaired DNA damage response (DDR) due to tumor suppressor mutations are more likely to avert apoptosis and continue proliferating”. To motivate this reasoning, in the same passage, the authors cite a high prevalence of TP53 mutations across several cancer types with “high burden of APOBEC3-induced mutations”, but also note that “this trend could arise from higher APOBEC3 expression in p53-mutated tumors since p53 may suppress APOBEC3B transcription via p21 and DREAM proteins”.

      Translated to our theoretical framework, this reasoning builds on the idea that APOBEC3 activity increases the selective advantage of mutants with inactivation of both copies of a TSG. In contrast, the mechanism we describe acts by altering the chances of mutants with only one TSG allele inactivated to inactivate the second allele before going extinct. If homozygous inactivation of TSGs generally conveys relatively strong fitness advantages, lineages with homozygous inactivation would already be unlikely to go extinct. Further increasing the fitness advantage of such lineages would thus manifest mostly in a quicker spread of these lineages, rather than in changes in the chance that these lineages survive. In turn, such a change would have limited effect on the “rate” at which such 2-step adaptations occur, but would mostly affect the speed at which they fixate. It would be interesting to investigate these effects empirically by quantifying the speed of proliferation and chance of going extinct for lineages that newly acquired inactivating mutations in TSGs.

      Beyond this explicit mention of selection pressures, the cited paper also discusses high occurrences of mutations in TSGs in relation to APOBEC. These enrichments, however, are not uniquely explained by an APOBEC-driven change in selection pressures. Indeed, our analyses would also predict such enrichments.

      (B2) Without punctuation, the rate of multistep adaptation is expected to rise more than linearly with mutation rate. Thus, if APOBEC signatures are correlated with a high mutation rate due to the action of APOBEC, this alone could explain the correlation with TSG inactivation.

      Thank you for making this point. Indeed, an identifying assumption that we make is that average mutation rates are balanced between samples with a higher vs lower APOBEC signature contribution. We cannot cleanly test this assumption, as we only observe aggregate mutation counts but not mutation rates. However, the fact that we observe an enrichment for APOBEC-associated mutations among the set of TSG-inactivating mutations (see Figure 4F) would be consistent with APOBEC-mutations driving the correlations in Fig 4D, rather than just average mutation rates. We have now added a paragraph to our manuscript to discuss these points (P. 10-11).

      (B3) The nature of mutations caused by APOBEC might explain the results. Notably, one of the two APOBEC mutation signatures, SBS13, is particularly likely to produce nonsense mutations. The authors count both nonsense and missense mutations, but nonsense mutations are more likely to inactivate the gene, and hence to be selected.

      Thank you for making this point.  We have included it in our discussion of potential confounders/limitations in the revised manuscript (P. 10-11).  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific questions/comments/suggestions:

      (1) For the theoretical investigation, the authors use the Wright-Fisher model with specific parameters for the decrease/increase in the fitness (0.5,1.5). This model is not so relevant to cancer, because it assumes a constant population size, while in cancer, the population is dynamic (increasing, if the tumor grows). Although I see they mention relevance to the branching process (in SI), I think the branching process should be bold in the main text and the Wright-Fisher in SI (or even dropped).

      Thank you for this comment. We agree that too little attention had been given to the branching process in the original version of our manuscript. While the Wright-Fisher process is computationally efficient to simulate and thus lends itself to clean simulations for illustrative examples, it did lead us to put undue emphasis on populations of constant size.

      The added Figures SF2 and SF3 now focus on branching processes, and we have substantially expanded our discussion of how dynamics differ as a function of the population-size trajectory (constant vs growing; SI2, P. 4,9,10). Generally, we do believe that it is appropriate to consider both regimes. If tumors evolve from being confined within their site of origin to progressively invading adjacent tissues and organ compartments, they traverse different regions of the birth-death ratio parameter space. Moreover, the timing of transitions between phases of more or less constrained growth is likely closely tied to adaptation dynamics, since breaching barriers to expansion requires adapting to novel environments and selection pressures.

      We hope that the revised version of the manuscript conveys these points more clearly, and thank you for alerting us to this imbalance in the original version of our manuscript.

      (2) The parameters 0.5 (decrease in fitness) and 1.5 (increase in fitness) seem exaggerated (the typical values for the selective advantage are usually much lower, by an order of magnitude). The same goes for the mutation rate. The authors chose values of the order 0.001, while in cancer (and generally) it is much lower than that (10<sup>-5</sup> to 10<sup>-6</sup>). I think that generally, the authors should present a more systematic analysis of the sensitivity of the results to these parameters.

      Thank you very much for this very important comment. We have made this a major focus in our revisions (see our reply to the editor’s comments). As suggested, we have now added further analyses to explore more biologically relevant parameter regimes. Reviewer 2 has made a similar remark; to avoid redundancy, we refer to our reply to that comment (A1) for a more detailed response.

      (3) In Figure 3, the authors explore the sensitivity to mu (mutation rate) and k (temporal clustering) and find a non-monotonic behavior (Figure 3C). However, this behavior is not well explained. I think some more explanations are required here.

      Thank you for pointing this out. We had initially relegated the more detailed explanations to the SI2 (which in the revised manuscript became SI4), but are happy to provide more elaboration in the main text, and have done so now (P. 5).

      For μ, the non-monotonicity reflects the exploration-exploitation tradeoff that this section is dedicated to: very small μ values (little exploration) prevent the population from finding fitness peaks. In contrast, once a fitness peak is reached, excessively large μ values (little exploitation) scatter the population away from this peak to points of lower fitness.

      For 𝑘, the most relevant dynamic is that at high 𝑘, the population becomes unable to find close-by fitness improvements (1-step adaptations) if it is not in a burst. As 𝑘 increases, this delay in adaptation (until a burst occurs) eventually comes to outweigh the benefits of high 𝑘 (better ability to undergo multi-step adaptations). Additionally, if 𝑘 ∙ μ becomes very large, clonal interference eventually leads to diminishing exploration-returns when 𝑘 is increased further (Fig 5C), as the per-cell likelihood of finding a specific fitness peak eventually saturates and increasing 𝑘 only causes multiple cells to find the same peak, rather than one cell finding this peak and its lineage fixating in the population.

      (4) In Figure 5, where the authors show the accumulation of the first (red; deleterious mutation) and second (blue; advantageous mutation), it seems that the fraction of deleterious mutations is much lower than that of advantageous mutations. This is opposite to the case of cancer, where most of the mutations are 'passengers', (slightly) deleterious or neutral mutations. Can the author explain this discrepancy and generally the relation of their parametrization to deleterious vs. advantageous mutations?

      Thank you for this comment. In general, we have focused attention in our paper on sequences of mutations that bring about a fitness increase. We call those sequences ‘adaptations’ and categorize these as one-step or multi-step, depending on whether or not they contain intermediate states with a fitness disadvantage.

      In our modelling, we do not consider mutations that are simply deleterious and are not a necessary part of a multi-step adaptation sequence. The motivation for this abstraction is, firstly, to focus on adaptation dynamics, and secondly, that in certain limits (small μ and large constant population sizes), lineages with only deleterious mutations have a probability close to one of going extinct, so that any emerging deleterious mutant would likely be ‘washed out’ of the population before a new mutation emerges.

      However, whether the dynamics of how neutral or deleterious passenger mutations are acquired also vary relevantly with the extent of temporal clustering is a valid and interesting question that would warrant its own study. The types of theoretical arguments for such an investigation would be very similar to the ones we use in our paper.

      (5) The theoretical investigation assumes a multi/2-step adaptation scenario where the first mutation is deleterious and the second is advantageous. I think this should be generalized and further explored. For example, what happens when there are multiple mutations that are slightly deleterious (as probably is the case in cancer) and only much later mutations confer a selective advantage? How stable is the "valley crossing" if more deleterious mutations occur after the 2 steps?

      This is also an important point and relates in part to the previous comment (4).  For discussion of interactions with deleterious mutations, please see the reply to comment (4).  

      Regarding generalizations of this valley-crossing scenario, note that any sequence of mutations that increases fitness can be decomposed into sequences of either one-step or multi-step adaptations, as defined in the paper. Therefore, if all intermediate states before the final selectively advantageous state have a selective disadvantage that makes the lineages of such cells likely to go extinct, then our derivations in SI1 apply, and the relative effect of temporal clustering follows the expression derived there, where n is the number of intermediate states. If, conversely, any of the intermediate states already had a selective advantage, then our model would consider the subsequence up to this first mutation with a selective advantage as its own (one-step or multi-step) “adaptation”.
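
      How the number of intermediate states can enter such a relative effect may be illustrated with a heuristic sketch. It rests on strong assumptions that are not taken from the manuscript: mutations occur only during bursts that occupy a fraction 1/k of the time at rate k·μ (so the average rate stays μ), and intermediate lineages go extinct so quickly that all n+1 mutations must arise at essentially the same instantaneous rate. The function names are hypothetical, and the resulting scaling is not necessarily the expression derived in SI1.

```python
def mean_multistep_rate(mu, k, n):
    """Time-averaged rate of acquiring n + 1 near-simultaneous mutations
    under the on/off burst assumption described in the lead-in.

    mu: average per-division mutation rate.
    k: temporal clustering (k = 1 means a constant mutation rate).
    n: number of disadvantageous intermediate states.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    burst_fraction = 1.0 / k  # share of time spent inside a burst
    burst_rate = k * mu       # in-burst rate; the time average stays mu
    return burst_fraction * burst_rate ** (n + 1)


def relative_effect(mu, k, n):
    """Ratio of the clustered rate to the constant-rate (k = 1) baseline."""
    return mean_multistep_rate(mu, k, n) / mean_multistep_rate(mu, 1.0, n)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"n={n}: relative effect at k=10 is {relative_effect(1e-6, 10.0, n):.1f}")
```

      Under these assumptions the ratio does not depend on μ and grows as a power of 𝑘 set by the number of intermediate states, while both absolute rates scale with µ<sup>n+1</sup>.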

      The second question, “How stable is the "valley crossing" if more deleterious mutations occur after the 2 steps?”, touches on a different property of the population dynamics, namely on how the fate of a mutant lineage depends on how this lineage emerged. In our paper, we compare different levels of temporal clustering for a fixed average mutation rate. This choice implies that, if we assume that the mutant that emerges from a valley-crossing does not go extinct, then the number of deleterious mutations expected to occur in this lineage, once emerged, will not depend on the extent of temporal clustering. However, if in-burst mutation rates increased the expected burden of early acquired deleterious mutations enough to affect the probability that the lineage with a multi-step adaptation goes extinct before the burst ends, then there may indeed be an interaction between effects of deleterious passengers and temporal clustering. We would, however, expect effects on this probability of early extinction to be relatively minor, since such a lineage with a selective advantage would quickly grow to large cell numbers, implying that it would require a large number of co-occurring and sufficiently deleterious mutations across these cells for the lineage to go extinct.

      (6) For the empirical analysis of TCGA cohorts, the authors focus on the contribution of APOBEC mutations (via signature analysis) to temporal mutagenesis. They find only a few cancer types (Figure 4D) that follow their prediction (in Figure 4C) of a correlation between TSG deactivation and temporal mutations in bursts. I think two main points should be addressed:

      Thank you for this comment. We will respond in detail to the corresponding points below, but would like to note here that while we find this correlation “in only a few cancer types”, we also show that only a few cancer types have relevant proportions of mutations caused by APOBEC, and it is precisely in these cancer types that we find a correlation. We have clarified this aspect in the revised version of the manuscript (P. 7).

      (i) APOBEC is not the only cause for temporal mutagenesis. For example, elevated ROS and hypoxia are also potential contributors - it might therefore be important to extend the signature analysis (to include more possible sources for temporal mutagenesis). Potentially, such an extension may show that more cancer types follow the author's prediction.

      Thank you for this interesting suggestion. We have now included analogous analyses for contributions of signature SBS18 which is associated with ROS mutagenesis, and for the joint contribution of signatures SBS17a, SBS17b, SBS18 and SBS36, which all have been shown (some in a more context-dependent manner) to be associated with ROS mutagenesis. When doing so, we do not find a clear trend. However, we also do not find these signatures to account for substantial proportions of the acquired mutations, meaning that ROS mutagenesis likely also does not account for much of the variation in how temporally clustered the mutation rate trajectories of different tumors are. We have incorporated these results and their discussion in the manuscript (SI5 and Fig S8).

      (ii) The TSG deactivation score used by the authors only counts the number of mutations and does not consider if the 2 mutations are biallelic, which is highly important in this case. There are ways to investigate the specific allele of mutations in TCGA data (for example, see Ciani et al. Cell Sys 2022 PMID: 34731645). Given the focus on TSG of this study, I think it is important to account for this in the analysis.

      Thank you for making this point. We did initially consider inferring allele-specific mutation status, but decided against it as this would have shrunk our dataset substantially, thus potentially introducing unwanted biases. Determining whether two mutations lie on the same or on different alleles requires either (1) observing sequencing reads that cover the loci of both mutations, or (2) tracing whether (sets of) other SNPs on the same gene co-occur exclusively with one of the two considered mutations. These requirements lead to a substantial filtering of the observed mutations. Moreover, this filtering would be especially strong for tumors with a small overall mutation burden, as these would have fewer co-occurring SNPs to leverage in this inference. We would have hence preferentially filtered out TSG-deactivating mutations in tumors with low mutation burden. We have modified the text to address this point (P. 14).

      (7) To continue point 4. I wonder why some known cancer types with high APOBEC signatures (e.g., lung, mentioned in the introduction) do not appear in the results of Figure 4. Can the author explain why it is missed?

      We do provide complete results for all categories in Supplementary Figure 3. To not overwhelm the figure in the main text, we only show the four categories with the highest average APOBEC signature contribution; beyond those four, average APOBEC signature contributions quickly drop. Lung-related categories do not feature in these top four (lung squamous cell carcinoma is fifth and lung adenocarcinoma is eighth in this ordering).

      Minors:

      (1) It is worth mentioning the relevance to resistance to treatment (see https://www.nature.com/articles/s41588-025-02187-1).

      Thank you for this suggestion. We have included a mention of the relation to this paper in the discussion section (P. 11).

      (2) Some of the figures' resolution should be improved - specifically, Figures 4, S1, and S5, which are not clear/readable.

      Thank you for pointing this out. This was the result of conversion to a Word document. We will provide TIFF files in the revisions to improve the resolution.

      (3) Regarding Figure 3e,f. How come that moving from K=1 to K=I doesn't show any changes in fitness - it looks as if in both cases the value fluctuates around comparable mean fitness? Is that the case?

      While fitness differences between simulations with different k manifest robustly over long time-horizons (see Fig 3C, which reports results over many more generations), there are various sources of substantial stochasticity that make the fitness values in these short-term plots (Fig 3D-F) imperfect illustrations of how long-term average fitness behaves. For instance, fitness landscapes are drawn randomly, which introduces variability in how high and how close-by different fitness peaks are. Similarly, there is substantial randomness since both the type (direction on the 2-D fitness landscape) and the timing of mutation are stochastic.

      The short-term plots in Fig3D-F are intended to showcase representative dynamics of transitions between points on the genotype space with different fitness values following a redrawing of the landscape – but not necessarily to provide a comparison between the height of the attained (local) fitness-maxima.  

      (4) Figures 4c,d - correlation should be Spearman, not Pearson (it's not a linear relationship).

      Thank you for this comment. As a robustness check, we have generated the same figures using Spearman and not Pearson correlations and find results that are qualitatively consistent with the initially shown results. Indeed, using Spearman correlations, all four cancer types from Fig 4D have significant correlations.
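
      The difference between the two correlation measures in this robustness check can be sketched as follows. The example uses synthetic numbers, not data from the study, and the helper names are hypothetical; Spearman's coefficient is computed as the Pearson coefficient of the (tie-averaged) ranks.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tied block
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Synthetic, monotone but non-linear relationship.
    x = [1, 2, 3, 4, 5, 6]
    y = [v ** 3 for v in x]
    print(f"Pearson:  {pearson(x, y):.3f}")
    print(f"Spearman: {spearman(x, y):.3f}")
```

      For a monotone but non-linear relationship such as this one, the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1, which is why the choice of measure can change which correlations pass a significance threshold.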

      (5) Typo for E) "...in samples of the cancer types in (C) were caused by APOBEC" - it should be D (not C) I guess.

      Thank you for catching this. We fixed the typo.

      (6) Figure 5 - the mutation rate is too high (0.001), sensitivity to that? Also the fitness change is exaggerated (0.5, 1.5), and the division of mutations to 100 and 100 (200 in total) loci is not clear.

      Thank you for making this point. In this simulation setting it is unfortunately computationally prohibitively expensive to perform simulations at biologically realistic mutation rates. Therefore, we have scaled up the mutation rate while scaling down the population size. Moreover, the choice of model here is not meant to resemble a biologically realistic dynamic, but rather to create a stylized setting to be able to consider the interplay between clonal interference and facilitated valley-crossing in isolation. The key result from this figure is the separation of time scales at which low or high temporal clustering maximizes adaptability.

      However, known parameter dependencies in these models allow us to reason about how tuning individual parameters of this stylized model would affect the relative importance of effects of clonal interference. This relative importance is largest when mutants are likely to co-occur on different competing clones in a population. The likelihood of such co-occurrences decreases substantially if decreasing the mutation rate to biologically realistic values. However, this likelihood also sensitively depends on the time that it takes a clone with a one-step adaptation to spread through the population. Smaller fitness advantages, as well as larger population sizes, slow down this process of taking over the population, which increases the likelihood of clonal interference. We now discuss these points in our revised manuscript (P. 8).

      (7) In the results text (last section) "Performing simulations for 2-step adaptations, we found that fixation rates are non-monotone in k. While at low k increasing k leads to a steep increase in the fixation rate, this trend eventually levels off and becomes negative, with further increases in k leading to a decrease in the fixation rate". Where are the results of this? It should be bold and apparent.

      Thank you for alerting us that this is unclear. The relevant figure reference is indeed Fig 5C, as in the preceding passage in the manuscript. However, we noticed that due to the presence of the steadily decreasing black line for 1-step adaptations, it is not easy to see that the blue line is also downward sloping. We have added a further reference to Fig 5C, and have adapted the grid spacing in the background of that figure panel to make this trend more easily visible.

      (8) Although not inconceivable, conclusions regarding resistance in the discussion are overstated. If you want to make this statement, you need to show that in resistant tumors, the temporal mutagenesis is responsible for progression vs. non-resistant/sensitive cases (is that the case), otherwise this should be toned down.

      Thank you for pointing this out. We have tempered these conclusions in the revised version of the manuscript (P. 11).

      Reviewer #2 (Recommendations for the authors):

      (1) It might be useful to look specifically at X-linked TSGs. On the authors' interpretation, their relative inactivation rates should not be correlated with APOBEC signatures in males (but should be in females), though the size of the dataset may preclude any definite conclusions.

      Thank you for this suggestion. Indeed, the size of the dataset unfortunately makes such analyses infeasible. Moreover, it is not clear whether X-linked TSGs might have structurally different fitness dynamics than TSGs on other chromosomes. However, this is an interesting suggestion worth following up on as more synergistic pairs confined to the X-chromosome are getting identified.

      (2) Might there be value in distinguishing tumors that carry mutations expected to increase APOBEC expression from those that do not? Among several reasons, an APOBEC signature due to such a mutation and an APOBEC signature due to abortive viral infection may differ with respect to the degree of punctuation.

      This is also an interesting suggestion for future investigations, but one for which we unfortunately do not have sufficient information to build a meaningful analysis. In particular, it is unclear to what extent the degree and manifestation of episodicity/punctuation varies between these different mechanisms. Burst duration and intensity, as well as out-of-burst baseline rates of APOBEC mutagenesis, likely differ in ways that are as yet insufficiently characterized, which would make any result of analyses like those in Fig 4 hard to interpret.

      (3) Also, in that paragraph, is "proportional to" used loosely to mean "an increasing function of"?

      Thank you for this comment. We are not quite sure which paragraph is meant, but we use the term “proportional” in a literal sense at every point it is mentioned in the paper.

      For the occurrences of the term on pages 3, 10 and 11, the word is used in reference to probabilities of reproduction (division in the branching process, or ‘being drawn to populate a spot in the next generation’ in the WF process) being “proportional” to fitness. These probabilities are constructed by dividing each individual cell’s fitness by the total fitness summed across all cells in the population. As the population acquires fitness-enhancing mutations, the resulting proportionality constant (1/total_fitness) changes, so that the mapping from ‘fitness’ to probability of reproduction in the next reproduction event changes over time. Nevertheless, this mapping always remains fitness-proportional.
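
      As an illustration of the fitness-proportional mapping described above (a minimal sketch, not the authors' simulation code), the snippet below divides each cell's fitness by the population's total fitness. The function name and the fitness values are hypothetical and chosen only to show that the constant 1/total_fitness changes as mutations accumulate while the mapping stays proportional.

```python
def reproduction_probabilities(fitness):
    """Map each cell's fitness to its probability of reproducing next."""
    total_fitness = sum(fitness)
    if total_fitness <= 0:
        raise ValueError("total fitness must be positive")
    return [f / total_fitness for f in fitness]


# Hypothetical population: three cells with fitness 1.0 and one mutant with 1.5.
before = reproduction_probabilities([1.0, 1.0, 1.0, 1.5])

# One more cell acquires the fitness-enhancing mutation, so total fitness rises
# from 4.5 to 5.0 and the proportionality constant 1/total_fitness shrinks.
after = reproduction_probabilities([1.0, 1.0, 1.5, 1.5])

# Within each generation the ratio probability/fitness is identical for all cells.
constant_before = before[0] / 1.0
constant_after = after[0] / 1.0
```

      In both generations a cell with fitness 1.5 is exactly 1.5 times as likely to reproduce as a cell with fitness 1.0; only the shared constant differs.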

      On page 4, the term is used as follows: “the absolute rates 𝑓<sub>𝑘</sub> and 𝑓<sub>1</sub> are proportional to µ<sup>n+1</sup>”. Here, proportionality in the literal sense follows from the equations on page 20, when setting , so that the second factor becomes µ<sup>n+1</sup>. We have included a clarifying sentence to address this in the derivations (SI1).

      (4) It could be mentioned in the main text that the time between bursts (d) must not be too short in order for the effect to be substantial. I would think that the relevant timescale depends on how deleterious the initial mutation is.

      Thank you for making this interesting and very relevant point. We have included a section (SI3) and Figure (Fig S4) in the supplement to investigate the dependence on d. In short, we find that effects are weaker for small inter-burst intervals. The sensitivity to the burst size is highest for inter-burst intervals that are sufficiently small so that the lineage of the first mutant has relevant probability of surviving long enough to experience multiple burst phases.

      (5) Why not report that relative rate for Figure 2E as for 2D, as the former would seem to be more relevant to TSGs? And why was it assumed that the first inactivation is deleterious in the simulations in Figure 4 if the goal is to model TSGs?

      Thank you for noting this. For how we revised the paper to better connect Figures 2 and 4, please see our comment (A1) above. In general, neither 2E nor 2D should serve as quantitative predictions for what effect size we should expect in real world data, but are rather curated illustrations of the general phenomenon that we describe: we chose high mutation rates and exaggerated fitness effects so that dynamics become visually tractable in small simulation examples.

      For Figure 4, assuming that the first inactivation is deleterious makes the branching process for the mutant lineage subcritical, which keeps the simulation example simple and illustrative. For a more comprehensive motivation of the approach in 4D, and especially the discussion of how fitness effects of different magnitudes may or may not be subject to the effects we describe depending on whether the population is in a phase of constant or growing population size, we refer the reader to our added section SI2, and the added discussion on pages 6 and 10.

      (6) Figure 2, D and E. I'm not sure why heatmaps with height one were provided rather than simple plots over time. It is difficult, for example, to determine from a heatmap whether the increase is linear or the relative rates with and without punctuation.

      Thank you for this comment. These are not heatmaps with height one, but rather for every column of pixels, different segments of that column correspond to different clones within that population. This approach is intended to convey the difference in dynamics between the results in Fig 2 and the analogous results for a branching process in Fig S1. In Fig 2, valley-crossings happen sequentially, with subsequent fixations of adapted mutants. In Fig S1, with a growing population size, multiple clones with different numbers of adaptations coexist. We have now adapted the caption of Fig 2 to clarify this point.

      (7) Page 3: "High mutation rates are known to limit the rate of 1-step adaptations due to clonal interference." This is a bit misleading, as it makes it sound like increasing the mutation rate decreases the rate of one-step adaptations.

      Thank you for alerting us to this poor phrasing. We have changed it in the revised version of the manuscript (P. 3).

      (8) Page 4: "proportional to \mu^{n+1}" Is "proportional" being used loosely for "an increasing function of"?

      It is meant in the literal mathematical sense (see response to comment (3))

      (9) Page 5, near bottom: "at least two mutations across the population". In the same genome?

      We counted mutations irrespective of whether they emerged in the same genome, to remain analogous to the TCGA analyses for which we also do not have single cell-resolved information.

      (10) Page 6: "missense or nonsense mutation". What about indels? If these are not affected by APOBEC, omitting them will exaggerate the effect of punctuation.

      Thank you for pointing out that this focus on single nucleotide substitutions conveys an exaggerated image of the importance of this effect of APOBEC-driven mutagenesis. There are of course several other classes of (epi)genomic alterations (e.g. chromatin modifications, methylation changes, copy number changes) that we do not consider in this part of our analysis. APOBEC mutagenesis serves as an example of a temporally clustered mutation process, which we investigate in its domain of action.

      We have added further discussion (P. 10-11) to convey that our empirical results merely constitute an investigation of whether empirical patterns are consistent with our hypothesis, but that the narrow focus on only SNVs, only TSGs, and only APOBEC mutagenesis does not allow for a general quantitative statement about the in-vivo relevance of the phenomena we describe.

      (11) Page 6: "normalized by the total number of single nucleotide substitutions." It is difficult to know how to normalize correctly, but I might think that the square of the number of substitutions would be more appropriate. Perhaps the total numbers are close enough that it matters little.

      Thank you for noting this. In the revised manuscript we have now expanded this passage in the text to more clearly convey our motivations for why we normalize by the total number of single nucleotide substitutions. While the likelihood for crossing a fitness valley with 2 mutations is indeed proportional to the square of the mutation rate, we do not directly observe mutation rates from our data.  Rather, we observe the number of acquired single nucleotide substitutions for every tumor sample, but since tumors in our data differ in the time since initiation and therefore differ in the numbers of divisions their cells have undergone before being sequenced, we cannot directly infer mutation rates. One way to phrase our main result about valley-crossing is that temporally clustered mutation processes have an increased rate of successful valley-crossings per attempted valley crossing. Our TSG deactivation score is constructed to reflect this idea. The number of TSGs serves as a proxy for successful valley-crossings and the total mutation burden serves as a proxy for attempted valley-crossings.
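
      One simple reading of this normalization can be sketched as follows. This is an assumption-laden illustration: the exact score used in the manuscript may be defined differently, and the function name and tumor counts below are hypothetical. The number of inactivated TSGs stands in for successful valley-crossings, and the total SNV burden stands in for attempted ones.

```python
def tsg_deactivation_score(n_inactivated_tsgs, n_snvs):
    """Successful valley-crossings per attempted crossing (illustrative proxy)."""
    if n_snvs <= 0:
        raise ValueError("a tumor needs at least one SNV to be scored")
    return n_inactivated_tsgs / n_snvs


# Two hypothetical tumors with the same number of inactivated TSGs
# but a tenfold difference in total mutation burden.
low_burden_score = tsg_deactivation_score(n_inactivated_tsgs=3, n_snvs=300)
high_burden_score = tsg_deactivation_score(n_inactivated_tsgs=3, n_snvs=3000)
```

      Under this reading, the low-burden tumor achieves the same number of inactivations with fewer attempts and therefore receives the higher score.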

      To convey these points more clearly, we have rewritten the first paragraph in the Section “Proxies for valley crossing and for temporal clustering found in patient data” (P.6)

      (12) Perhaps embed links to the COSMIC web pages for SBS2 and SBS13 in the text.

      Thank you for this suggestion. We have embedded the links at the first mention of SBS2 and SBS13 in the text.

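    1. This is a textbook case of a course schedule in which a PNC (planning and control) engineer has wandered into the deep end of CV (computer vision).

      As a PNC architect, I have to give you a wake-up call: Mu Li's (李沐) course is legendary, but it is a general-purpose course covering CV and NLP. If you watch all of it, you will waste at least 50% of your time. For a PNC algorithm role, your core battlegrounds are “temporal prediction” and “decision logic”, not teaching the car how to “look at images”.

      Below is a trimmed-down study guide, tailored to what high-paying PNC offers require: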


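      Part 1: Absolute core (required - grind until mastered)

      Priority: S+. Reason: this is the foundation of modern autonomous-driving Prediction and End-to-End Planning. Without it, you can only write traditional rule-based code and will not reach top pay.

      • July 17 - July 18: Sequence models, RNN
        • PNC perspective: mentally replace the course's “text/words” with “historical vehicle trajectory points (x, y, v, a)”. Predicting how a neighboring car will move over the next 3 seconds is essentially a language-modeling problem (Next Token Prediction).
      • July 25: GRU, LSTM
        • Interview topics: How does LSTM mitigate vanishing gradients? How is it used in trajectory prediction (Social-LSTM)?
        • Requirement: write the code by hand. Understand the Input/Output dimensions.
      • August 7: Seq2Seq, Encoder-Decoder, Beam Search
        • PNC perspective: this is the standard architecture for trajectory generation. The input is the past 5 seconds of trajectory (Encoder); the output is the next 5 seconds of trajectory (Decoder).
        • Practical pain point: Beam Search is used to generate multimodal trajectories (for example, the car ahead may go straight or may turn left; these are two different beams).
      • August 8: Attention mechanism
        • PNC perspective: the core of the core. It is used to handle Interaction. For example: while planning, should the ego vehicle pay attention to the car on the left or the car on the right? The Attention Score gives you the answer.
      • August 14 - August 15: Transformer, BERT
        • Verdict: learn it inside out.
        • Reason: today's SOTA prediction models (such as VectorNet, TNT) and end-to-end planners (UniAD) are all built on Transformer architectures. Interviews always ask how to optimize the $O(n^2)$ complexity of Self-Attention.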
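
      A minimal sketch of the attention idea from the list above: scaled dot-product attention that scores how much the ego vehicle should attend to each neighboring car. The 2-D feature vectors and variable names are hypothetical toy values, not taken from the course or from any real prediction model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query over several keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


# Hypothetical features: the ego query aligns with the left car's key.
ego_query = [1.0, 0.0]
neighbor_keys = [[2.0, 0.0], [0.0, 2.0]]    # [left car, right car]
neighbor_values = [[1.0, 0.0], [0.0, 1.0]]  # what each car contributes

weights, context = attention(ego_query, neighbor_keys, neighbor_values)
```

      Here `weights[0]` (the left car) comes out larger than `weights[1]`. With n agents each attending to every other agent, the score computation grows as n squared, which is the $O(n^2)$ cost the interview question refers to.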

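      Part 2: Upstream/downstream general knowledge (elective - skim the concepts)

      Priority: A. Reason: PNC's input comes from Perception. You do not need to be able to write detection algorithms, but you must understand the characteristics of the “input data” in order to build robustness handling (Safety Shield) at the planning layer.

      • June 27: Object detection basics (bounding boxes, anchor boxes)
        • PNC perspective: you need to understand the Bounding Box (bbox).
        • Pain point: the bboxes that perception sends you often jitter (jump between frames). If you do not understand IOU (intersection over union) and NMS (non-maximum suppression), you will not know how to filter this noise at the planning layer.
      • July 3: YOLO vs R-CNN
        • PNC perspective: only look at the real-time performance comparison.
        • Interview topic: why do in-vehicle systems commonly use YOLO (fast, One-stage) rather than R-CNN? Because planning and control needs perception to deliver results within 30 ms.
      • July 10: Semantic segmentation
        • PNC perspective: understanding the concept is enough.
        • Reason: the output of semantic segmentation is what PNC consumes as the Costmap or Drivable Area. You need to know where it comes from and why its edges can be inaccurate.
      • August 21: Optimization algorithms
        • PNC perspective: SGD, Adam. Although PNC mainly uses QP/SQP solvers, the ideas behind deep-learning optimizers (momentum, gradient descent) help in understanding non-convex optimization.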
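
      A minimal sketch of how IOU can flag a jittering bbox at the planning layer, as mentioned in the object-detection item above. The box format `(x_min, y_min, x_max, y_max)`, the helper `is_jump`, and the 0.5 threshold are illustrative assumptions, not values from the course or from any production stack.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_jump(prev_box, curr_box, min_iou=0.5):
    """Treat a frame-to-frame IOU below the (hypothetical) threshold as a jump."""
    return iou(prev_box, curr_box) < min_iou


previous = (0.0, 0.0, 2.0, 2.0)
small_shift = (0.1, 0.0, 2.1, 2.0)   # plausible motion between frames
teleported = (5.0, 5.0, 7.0, 7.0)    # no overlap with the previous box

smooth_flag = is_jump(previous, small_shift)
jump_flag = is_jump(previous, teleported)
```

      A planner could hold the previous box or lower its confidence in the object when `is_jump` fires, instead of replanning around a box that has teleported.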

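      Part 3: Garbage time (skip - don't waste your life)

      Priority: C (Direct Pass). Reason: this is work for CV algorithm engineers or Infra architects; it is of no use to PNC.

      • June 19: Multi-GPU training, distributed training
        • Verdict: [Skip]
        • Reason: unless you are going to build a cloud training platform for autonomous driving.
      • June 20: Image augmentation
        • Verdict: [Skip]
        • Reason: this is perception's job. Augmentation for planning means perturbing trajectories, not rotating or cropping images.
      • June 26: Kaggle image classification in practice
        • Verdict: [Skip]
        • Reason: a pure CV competition; it earns no credit for PNC.
      • July 11: Style Transfer
        • Verdict: [Skip]
        • Reason: that is for filter apps; a car does not need to paint in the style of Van Gogh.
      • July 4: Multi-scale detection implementation, SSD implementation
        • Verdict: [Skip]
        • Reason: the implementation details are too tedious, and interviews do not ask planning and control engineers to hand-write the SSD Loss.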

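      The mentor's blunt summary

      Your learning path should look like this:

      1. Skim past the earlier CV convolution material (knowing that convolution extracts features is enough).
      2. Parachute in at July 17 (RNN) and start studying seriously.
      3. Grind through the Transformer and its variants.
      4. Shift your mindset: whenever Mu Li says “word”, automatically substitute “(x, y)”; whenever he says “sentence”, substitute “trajectory”.

      Suggested next step: after finishing the Transformer, do not fine-tune BERT for text classification. Go straight to the code of a classic autonomous-driving trajectory prediction paper (such as VectorNet or Trajectron++) and read how it uses LSTM/Transformer to process sequences of (x, y) coordinates. That is the only way to cash in on what you have learned.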

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Jeong and Choi examine neural correlates of behavior during a naturalistic foraging task in which rats must dynamically balance resource acquisition (foraging) with the risk of threat. Rats first learn to forage for sucrose reward from a spout, and when a threat is introduced (an attack-like movement from a "LobsterBot"), they adjust their behavior to continue foraging while balancing exposure to the threat, adopting anticipatory withdraw behaviors to avoid encounter with the LobsterBot. Using electrode recordings targeting the medial prefrontal cortex (PFC), they identify heterogenous encoding of task variables across prelimbic and infralimbic cortex neurons, including correlates of distance to the reward/threat zone and correlates of both anticipatory and reactionary avoidance behavior. Based on analysis of population responses, they show that prefrontal cortex switches between different regimes of population activity to process spatial information or behavioral responses to threat in a context-dependent manner. Characterization of the heterogenous coding scheme by which frontal cortex represents information in different goal states is an important contribution to our understanding of brain mechanisms underlying flexible behavior in ecological settings.

      Strengths:

      As many behavioral neuroscience studies employ highly controlled task designs, relatively less is generally known about how the brain organizes navigation and behavioral selection in naturalistic settings, where environment states and goals are more fluid. Here, the authors take advantage of a natural challenge faced by many animals - how to forage for resources in an unpredictable environment - to investigate neural correlates of behavior when goal states are dynamic. Related to this, they also investigate how prefrontal cortex (PFC) activity is structured to support different functional "modes" (here, between a navigational mode and a threat-sensitive foraging mode) for flexible behavior. Overall, an important strength and real value of this study is the design of the behavioral experiment, which is trial-structured, permitting strong statistical methods for neural data analysis, yet still rich enough to encourage natural behavior structured by the animal's volitional goals. The experiment is also phased to measure behavioral changes as animals first encounter a threat, and then learn to adapt their foraging strategy to its presence. Characterization of this adaptation process is itself quite interesting and sets a foundation for further study of threat learning and risk management in the foraging context. Finally, the characterization of single-neuron and population dynamics in PFC in this naturalistic setting with fluid goal states is an important contribution to the field. Previous studies have identified neural correlates of spatial and behavioral variables in frontal cortex, but how these representations are structured, or how they are dynamically adjusted when animals shift their goals, has been less clear. The authors synthesize their main conclusions into a conceptual model for how PFC activity can support mode switching, which can be tested in future studies with other task designs and functional manipulations.

      Weaknesses:

      While the task design in this study is intentionally stimulus-rich and places minimal constraint on the animal to preserve naturalistic behavior, this also introduces confounds that limit interpretability of the neural analysis. For example, some variables which are the target of neural correlation analysis, such as spatial/proximity coding and coding of threat and threat-related behaviors, are naturally entwined. To their credit, the authors have included careful analyses and control conditions to disambiguate these variables and significantly improve clarity.

      The authors also claim that the heterogenous coding of spatial and behavioral variables in PFC is structured in a particular way that depends on the animal's goals or context. As the authors themselves discuss, the different "zones" contain distinct behaviors and stimuli, and since some neurons are modulated by these events (e.g., licking sucrose water, withdrawing from the LobsterBot, etc.), differences in population activity may to some extent reflect behavior/event coding. The authors have included a control analysis, removing timepoints corresponding to salient events, to substantiate the claim that PFC neurons switch between different coding "modes." While this significantly strengthens evidence for their conclusion, this analysis still depends on relatively coarse labeling of only very salient events. Future experiment designs, which intentionally separate task contexts (e.g. navigation vs. foraging), could serve to further clarify the structure of coding across contexts and/or goal states.

      Finally, while the study includes many careful, in-depth neural and behavioral analyses to support the notion that modal coding of task variables in PFC may play a role in organizing flexible, dynamic behavior, the study still lacks functional manipulations to establish any form of causality. This limitation is acknowledged in the text, and the report is careful not to over interpret suggestions of causal contribution, instead setting a foundation for future investigations.

      Thank you for the positive comment. We also acknowledge the inherent drawbacks of studying naturalistic behavior. As you also mentioned in the second round of review, separating navigation and foraging tasks in a larger apparatus, such as the one illustrated below, could better distinguish neural activity patterns associated with these different task types. To address the limitations of the current study, we have revised the report to avoid overinterpretation or unwarranted assumptions, and we appreciate that you have recognized this effort.

      Author response image 1.

      Reviewer #2 (Public review):

      Summary:

      Jeong & Choi (2023) use a semi-naturalistic paradigm to tackle the question of how the activity of neurons in the mPFC might continuously encode different functions. They offer two possibilities: either there are separate dedicated populations encoding each function, or cells alter their activity dependent on the current goal of the animal. In a threat-avoidance task rats procured sucrose in an area of a chamber where, after remaining there for some amount of time, a 'Lobsterbot' robot attacked. In order to initiate the next trial rats had to move through the arena to another area before returning to the robot encounter zone. Therefore the task has two key components: threat avoidance and navigating through space. Recordings in the IL and PL of the mPFC revealed encoding that depended on what stage of the task the animal was currently engaged in. When animals were navigating, neuronal ensembles in these regions encoded distance from the threat. However, whilst animals were directly engaged with the threat and simultaneously consuming reward, it was possible to decode from a subset of the population whether animals would evade the threat. Therefore the authors claim that neurons in the mPFC switched between two functional modes: representing allocentric spatial information, and representing egocentric information pertaining to the reward and threat. Finally, the authors propose a conceptual model based on these data whereby this switching of population encoding is driven by either bottom-up sensory information or top-down arbitration.

      Strengths:

      Whilst these multiple functions of activity in the mPFC have generally been observed in tasks dedicated to the study of a singular function, less work has been done in contexts where animals continuously switch between different modes of behaviour in a more natural way. Being able to assess whether previous findings of mPFC function apply in natural contexts is very valuable to the field, even outside of those interested in the mPFC directly. This also speaks to the novelty of the work; although mixed selectivity encoding of threat assessment and action selection has been demonstrated in some contexts (e.g. Grunfeld & Likhtik, 2018) understanding the way in which encoding changes on-the-fly in a self-paced task is valuable both for verifying whether current understanding holds true and for extending our models of functional coding in the mPFC.

      The authors are also generally thoughtful in their analyses and use a variety of approaches to probe the information encoded in the recorded activity. In particular, they use relatively close analysis of behaviour as well as manipulating the task itself by removing the threat to verify their own results. The use of such a rich task also allows them to draw comparisons, e.g. in different zones of the arena or different types of responses to threat, that a more reduced task would not otherwise allow. Additional in-depth analyses in the updated version of the manuscript, particularly the feature importance analysis, as well as complementary null findings (a lack of cohesive place cell encoding, and no difference in location coding dependent on direction of trajectory) further support the authors' conclusion that populations of cells in the mPFC are switching their functional coding based on task context rather than behaviour per se. Finally, the authors' updated model schematic proposes an intriguing and testable implementation of how this encoding switch may be manifested by looking at differentiable inputs to these populations.

      Weaknesses:

      The main existing weakness of this study is that its findings are correlational (as the authors highlight in the discussion). Future work might aim to verify and expand the authors' findings - for example, whether the elevated response of Type 2 neurons directly contributes to the decision-making process or just represents fear/anxiety motivation/threat level - through direct physiological manipulation. However, I appreciate the challenges of interpreting data even in the presence of such manipulations and some of the additional analyses of behaviour, for example the stability of animals' inter-lick intervals in the E-zone, go some way towards ruling out alternative behavioural explanations. Yet the most ideal version of this analysis is to use a pose estimation method such as DeepLabCut to more fully measure behavioural changes. This, in combination with direct physiological manipulation, would allow the authors to fully validate that the switching of encoding by this population of neurons in the mPFC has the functional attributes as claimed here.

      I wanted to add a minor comment about interpreting the two possible accounts presented in fig. 8 to suggest a third possibility: that both bottom-up sensory and top-down arbitration mechanisms can occur simultaneously to influence whether the activity of the population switches. Indeed, a model where these inputs are balanced or pitted against each other, so to speak, to continuously modulate encoding in the mPFC seems both adaptive and likely. Further, some speculation on the source of the 'arbitrator' in the top-down account would make this model more tractable for future testing of its validity.

      We thank the reviewer for highlighting this important perspective. We fully agree that an intricate and recurrent interaction between bottom-up and top-down modulations is a highly plausible account of how the mPFC changes its encoding mode. In line with this suggestion, we have incorporated this idea as a third possibility in the revised Discussion, alongside an updated version of Figure 8 that explicitly illustrates this competitive model.

      Although we were unable to identify a definitive study directly measuring how the mPFC switches encoding modes across tasks, we did find relevant human EEG and fMRI studies addressing this issue. Based on these findings, we now propose the anterior cingulate cortex (ACC) as a potential hub for top-down arbitration. We have added a paragraph in the Discussion describing this possibility and its implications for future testing.

      “Which brain region might act as this arbitrator? Evidence from human neuroimaging studies implicates the anterior cingulate cortex (ACC) as a central hub for switching cognitive modes. During task switching, the ACC shows increased activation (Hyafil et al., 2009), enhances connectivity with task-specific regions (Aben et al., 2020), correlates with multitask performance (Kondo et al., 2004), and monitors the reliability of competing decision systems (Lee et al., 2014). Collectively, these findings point to a pivotal role for the ACC in coordinating task assignment. Rodent studies also link the ACC to strategic mode switching (Tervo et al., 2014), suggesting that the rodent ACC could similarly arbitrate between strategies, determining which task-relevant variables are represented in the ventral mPFC, including the PL and IL. Future studies combining multi-context tasks with causal manipulations will be essential to determine whether these functional shifts are driven primarily by top-down arbitration or by bottom-up sensory inputs.”

      Reviewer #3 (Public review):

      Summary:

      This study investigates how various behavioral features are represented in the medial prefrontal cortex (mPFC) of rats engaged in a naturalistic foraging task. The authors recorded electrophysiological responses of individual neurons as animals transitioned between navigation, reward consumption, avoidance, and escape behaviors. Employing a range of computational and statistical methods, including artificial neural networks, dimensionality reduction, hierarchical clustering, and Bayesian classifiers, the authors sought to predict from neural activity distinct task variables (such as distance from the reward zone and the success or failure of avoidance behavior). The findings suggest that mPFC neurons alternate between at least two distinct functional modes, namely spatial encoding and threat evaluation, contingent on the specific location.

      Strengths:

      This study attempts to address an important question: understanding the role of mPFC across multiple dynamic behaviors. The authors highlight the diverse roles attributed to mPFC in previous literature and seek to explain this apparent heterogeneity. They designed an ethologically relevant foraging task that facilitated the examination of complex dynamic behavior, collecting comprehensive behavioral and neural data. The analyses conducted are both sound and rigorous.

      Weaknesses:

      Because the study still lacks experimental manipulation, the findings remain correlational. The authors have appropriately tempered their claims regarding the functional role of the mPFC in the task. The nature of the switch between functional modes encoding distinct task variables (i.e., distance to reward, and threat-avoidance behavior type) is not established. Moreover, the evidence presented to dissociate movement from these task variables is not fully convincing, particularly without single-session video analysis of movement. Specifically, while the new analyses in Figure 7 are informative, they may not fully account for all potential confounding variables arising from changes in context or behavior.

      Regarding the claim of highly stereotyped behavior, there are some inconsistencies. While the authors assert this, Figure 1F shows inter-animal variability, and the PETHs, representing averaged activity, may not fully capture the variability of the behavior across sessions and animals. To strengthen this aspect, a more detailed analysis that examines the relationship between behavior and neural activity on a trial-by-trial basis, or at minimum, per session, could help.

      We thank the reviewer for this thoughtful recommendation and the opportunity to clarify our use of the term “stereotyped behavior.” By this, we were specifically referring to the animals’ consistent licking behavior in the E-zone, rather than to the latency of head withdrawal, which indeed varied across trials and animals. Because licking tempo and body posture during sucrose consumption were highly consistent, the decision to avoid or stay (AW vs. EW) could not be predicted from overt behavior alone. This consistency strengthens our conclusion that the significant predictive power of the Bayesian decoding analysis reflects intrinsic firing patterns of the mPFC neural network, rather than simple behavioral correlates of avoidance.

      We also note that the Bayesian decoding was performed on a trial-by-trial basis, and the reported prediction accuracy of 73% represents the average across all individual trials (Figure 6B, C). Thus, the analysis inherently captures variability across trials and animals, directly addressing the reviewer’s concern.

      The reviewer is correct that the PETHs shown in Figure 5 are based on session-averaged activity aligned to head-entry and head-withdrawal events. The purpose of this analysis was to illustrate that certain modulation patterns could be grouped into 2–3 distinct categories. While averaged activity can provide insight into collective responses to external events, we agree that trial-based analyses provide a more rigorous demonstration of the link between neural ensemble activity and behavioral decisions. This is precisely why we complemented the PETH analysis with Bayesian decoding, which provides stronger evidence that mPFC ensemble activity is predictive of the animal’s choice to avoid or stay.

      Similarly, the claim regarding the limited scope of extraneous behavior (beyond licking) requires further substantiation. It would be more convincing to quantify potential variations in licking vigor and to provide evidence for the absence of significant postural changes.

      To address this concern, we quantified licking vigor using the inter-lick interval (ILI) as an indirect index. A lick was defined as the period from tongue contact with the IR beam (Lick-On) to withdrawal (Lick-Off), and the ILI was calculated as the time between a Lick-Off and the subsequent Lick-On. Across all animals, ILIs were clustered within a narrow range with a median of 0.155 s (see Author response image 4, left panel).

      We analyzed licking vigor at two levels: within trials and within sessions. Because reduced vigor or satiation would lengthen ILIs, comparing the first half and the last half of ILIs within a trial or within a session provides a sensitive proxy for licking consistency.

      Within trials: For each of 2,820 trials, we compared the mean ILI of the first half of licks to that of the second half. The average difference was only ~ 17 ms (middle panel). Across sessions: Trial-averaged ILIs were compared between the first and last halves of each session, yielding a mean difference ~ 1.7 ms per session (right panel).

      These analyses demonstrate that rats maintained stable licking vigor whenever they entered the E-zone, regardless of avoidance outcome.
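
      The ILI definition and the first-half versus second-half comparison described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function names are hypothetical, the lick timestamps are invented toy values, and placing the middle interval of an odd-length list in the second half is an assumption.

```python
from statistics import mean


def inter_lick_intervals(licks):
    """ILIs from (lick_on, lick_off) pairs in seconds, sorted by time.

    Each ILI is the time from one Lick-Off to the subsequent Lick-On.
    """
    return [next_on - off for (_, off), (next_on, _) in zip(licks, licks[1:])]


def half_difference(ilis):
    """Mean ILI of the second half minus mean ILI of the first half."""
    if len(ilis) < 2:
        raise ValueError("need at least two ILIs to compare halves")
    middle = len(ilis) // 2  # with an odd count, the middle ILI joins the second half
    return mean(ilis[middle:]) - mean(ilis[:middle])


# Toy trial: five licks, each lasting 50 ms (invented timestamps).
trial_licks = [(0.00, 0.05), (0.20, 0.25), (0.40, 0.45), (0.61, 0.66), (0.82, 0.87)]

ilis = inter_lick_intervals(trial_licks)
difference = half_difference(ilis)
```

      A positive `difference` would indicate that licking slowed over the course of the trial; the same function can be applied to trial-averaged ILIs to compare the first and last halves of a session.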

      Author response image 2.

      Concerning the ANN model, while I understand the choice of a 4-layer network for its performance, the study could have benefited from exploring simpler models. A model where weight corresponds directly to individual neurons could improve interpretability and facilitate the investigation of dynamic changes in neuronal 'modes' (i.e., weight adjustments) over time.

      We fully agree with the reviewer on the importance of biologically interpretable models. While artificial neural networks (ANNs) share certain similarities with neural computation, they are not intended to capture biological realism. For example, the error correction mechanism used in ANNs, such as backpropagation, has no direct counterpart in mammalian neural circuits. Although we considered approaches that would link each computational node more directly to the activity of individual neurons, building such a model would require temporally sensitive, mechanistic frameworks (e.g., leaky integrate-and-fire networks) and an extensive behavioral alignment effort, which is beyond the scope of the current study.

      Our use of an ANN was intended solely as an analytical tool to uncover hidden patterns in multi-unit activity that may not be detectable with traditional methods. Among various machine-learning algorithms, we selected a four-layer ANN regressor because it achieved significantly lower decoding errors (Supplementary Figure S3) and showed robustness to hyperparameter variation (Glaser et al., 2020). To acknowledge the limitations of this approach and suggest future directions, we have revised the Results section to explicitly discuss these points.

      “Among various machine learning algorithms, we selected a robust tool for decoding underlying patterns in the data, rather than to model the architecture of the mPFC. We implemented a four-layer artificial neural network regressor (ANN; see Materials and Methods for a detailed structure), as the ANN achieves significantly lower decoding errors (Supplementary Figure S3) and has robustness to hyperparameter changes (Glaser et al., 2020).”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      In their revised manuscript, Chen et al. have added additional data that establishes GPR30 spinal neurons as a population of excitatory neurons, half of which express CCK. These data help to position GPR30 neurons in the existing framework of spinal neuron populations that contribute to neuropathic pain, strengthening the authors' findings.

      Thank you very much for your positive feedback and for recognizing the value of our additional data.

      Reviewer #3 (Public review):

      The authors did an excellent job addressing many of the critiques raised. Despite acknowledging that a direct functional corticospinal projection to CCK/GPR30+ neurons is not supported by the data and revising the title, these claims still persist throughout the manuscript. Manipulating gene expression or the activity of postsynaptic neurons through a trans-synaptic labeling strategy does not directly support any claim that those upstream neurons are directly modulating spinal neurons through the proposed pathway. Indeed they might, but that is not demonstrated here.

      We sincerely thank the reviewer for this critical insight. We fully agree that our trans-synaptic approach does not demonstrate a direct functional connection. In response, we have revised the manuscript to remove any overstated claims of "direct" modulation and instead emphasize the critical role of spinal GPR30+ neurons. Moreover, we have added a statement in the Discussion to acknowledge this limitation and to highlight that the precise functional role of this connection requires investigation in future studies.

      Reviewer #1 (Recommendations for the authors): 

      I recommend 2 minor corrections to the text and figures

      (1)  Line 131 : "What's more, near-universal CCK+ neurons were co-localized with GPR30 (Fig 2F and G)."

      The additional quantification of the overlap between GPR30 and tdTomato provided by the authors is useful, but there are inconsistencies with how the data are reported in the figures and text, making them difficult to interpret. Figure 2F supports the authors' conclusion that approximately 90% of CCK⁺ neurons express GPR30, and about 50% of GPR30⁺ neurons co-express CCK. However, the x-axis labels in 2G appear to have been switched, and suggest that the opposite is true (i.e., most GPR30 neurons are CCK+, while only 50% of CCK neurons are GPR30+). Please clarify which is correct throughout the results and discussion sections.

      Thank you for identifying this important error. We apologize for the confusion caused by the mislabeled x-axis in Fig. 2G. The x-axis labels were indeed inadvertently switched. The correct result is that approximately 90% of CCK<sup>+</sup> neurons express GPR30. We have corrected the figure and have carefully reviewed the entire manuscript to ensure all related descriptions and discussions are consistent with the accurate quantification.
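
      The asymmetry at issue in Fig. 2F and 2G can be made concrete with a short worked example. The sketch below is illustrative only: the cell counts are hypothetical, chosen solely so that the two fractions match the percentages quoted above, and the function name `coexpression_fractions` is not taken from the manuscript.

```python
def coexpression_fractions(n_a: int, n_b: int, n_both: int) -> tuple[float, float]:
    """Return (fraction of A+ cells that are also B+,
               fraction of B+ cells that are also A+)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both populations must contain at least one cell")
    if n_both < 0 or n_both > min(n_a, n_b):
        raise ValueError("overlap must lie between 0 and the smaller population")
    return n_both / n_a, n_both / n_b


# Hypothetical counts (NOT from the manuscript): 100 CCK+ cells,
# 180 GPR30+ cells, 90 cells positive for both markers.
n_cck, n_gpr30, n_both = 100, 180, 90

cck_expressing_gpr30, gpr30_expressing_cck = coexpression_fractions(
    n_cck, n_gpr30, n_both
)

print(f"CCK+ cells that express GPR30: {cck_expressing_gpr30:.0%}")
print(f"GPR30+ cells that express CCK: {gpr30_expressing_cck:.0%}")
```

      The same overlap gives two different percentages because each is divided by a different population, which is why swapping the x-axis labels reverses the apparent conclusion.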

      (2) The following sentence describing Figure 5 was hard to follow: Lines 190-192, "Consistent with prior observations, we found that these SDH downstream neurons exhibited colocalization with CCK+ neurons, with 28.1% of mCherry+ neurons expressing CCK (Fig 5I and J)." Since the authors are describing a common population of neurons, a statement describing the "coexpression" (rather than the "colocalization") would more simply summarize their data.

      We thank the reviewer for this helpful suggestion. We fully agree that "coexpression" is a more precise term for the description. We have revised the sentence on Lines 189-190 to read: "Consistent with prior observations, we found that 28.1% of mCherry+ S1-SDH downstream neurons coexpressed CCK (Fig 5I and J)."

      Reviewer #3 (Recommendations for the authors): 

      Additional Recommendations

      The authors did a commendable job revising the manuscript text to improve readability; however, several informal phrases from the original version still persist, or were added (e.g. "by the way").

      We thank the reviewer for this valuable feedback regarding the language. We have conducted a line-by-line review of the entire manuscript to identify all remaining informal phrases, and replaced them with more appropriate phrasing.

      It should be clearly mentioned that spontaneous E/IPSCs were recorded in Figure 4 and Fig S5.

      We thank the reviewer for this helpful suggestion. We have now clearly indicated that spontaneous E/IPSCs were recorded in Fig. 4, Fig. S5, and the manuscript text.

      The rationale for recording EPSCs from GFP-labeled CCK+ neurons because "a significant proportion of spinal CCK+ neurons form excitatory synapses with upstream neurons" does not make any sense. Do the authors instead mean that CCK neurons receive excitatory inputs from other spinal neurons and intend to test if those synaptic connections are modulated by GPR30?

      We thank the reviewer for this critical correction. Our intended meaning was indeed that CCK<sup>+</sup> neurons receive excitatory inputs from other neurons, and we aimed to test whether those synaptic connections are modulated by GPR30. To avoid confusion, we have removed the erroneous statement and revised the text to read: “Since CCK+ neurons mainly receive excitatory synaptic inputs from upstream neurons, we then intended to test whether GPR30 modulated these synaptic connections.”

      I am confused by the statement on Page 8 "to examine whether GPCR30-mediated EPSCs depend on AMPA mediated currents." Given that sEPSCs were recorded at -70 mV in low Cl internal I'm not sure what other glutamate receptor would be involved. Perhaps the intention was to more directly test whether GPR30 activation acutely modulates AMPAR-mediated EPSCs? However, as the authors acknowledged, this experiment does not necessarily support a solely post-synaptic AMPAR-dependent mechanism.

      We thank the reviewer for this insightful comment and apologize for the lack of clarity. Our intention was indeed to test whether GPR30 activation modulates AMPAR-mediated currents. We have revised the text accordingly. In addition, we now emphasize in the Discussion that our data do not rule out potential presynaptic contributions to this effect.

      An elevation in EPSCs within a cell does not necessarily mean that the cell is more excitable, only that it is receiving more excitatory inputs or has an increase in synaptic receptors. The cell may scale down its activity to compensate for this increase. I recommend only drawing conclusions from what the experiments actually tested.

      We thank the reviewer for this crucial clarification. We have revised the manuscript to remove any claims that the cells were "more excitable". Our conclusions now strictly focus on the specific findings that GPR30 activation enhanced the excitatory transmission onto CCK<sup>+</sup> neurons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the Late Triassic and Early Jurassic (around 230 to 180 Ma ago), southern Wales and adjacent parts of England were a karst landscape. The caves and crevices accumulated remains of small vertebrates. These fossil-rich fissure fills are being exposed in limestone quarrying. In 2022 (reference 13 of the article), a partial articulated skeleton and numerous isolated bones from one fissure fill of end-Triassic age (just over 200 Ma) were named Cryptovaranoides microlanius and described as the oldest known squamate - the oldest known animal, by some 20 to 30 Ma, that is more closely related to snakes and some extant lizards than to other extant lizards. This would have considerable consequences for our understanding of the evolution of squamates and their closest relatives, especially for their speed and absolute timing, and was supported in the same paper by phylogenetic analyses based on different datasets.

      In 2023, the present authors published a rebuttal (reference 18) to the 2022 paper, challenging anatomical interpretations and the irreproducible referral of some of the isolated bones to Cryptovaranoides. Modifying the datasets accordingly, they found Cryptovaranoides outside Squamata and presented evidence that it is far outside. In 2024 (reference 19), the original authors defended most of their original interpretation and presented some new data, some of it from newly referred isolated bones. The present article discusses anatomical features and the referral of isolated bones in more detail, documents some clear misinterpretations, argues against the widespread but not justifiable practice of referring isolated bones to the same species as long as there is merely no known evidence to the contrary, further argues against comparing newly recognized fossils to lists of diagnostic characters from the literature as opposed to performing phylogenetic analyses and interpreting the results, and finds Cryptovaranoides outside Squamata again.

      Although a few of the character discussions and the discussion of at least one of the isolated bones can probably still be improved (and two characters are addressed twice), I see no sign that the discussion is going in circles or otherwise becoming unproductive. I can even imagine that the present contribution will end it.

      We appreciate the positive response from reviewer 1!

      Reviewer #2 (Public review):

      Congratulations on this thorough manuscript on the phylogenetic affinities of Cryptovaranoides.

      Thank you.

      Recent interpretations of this taxon, and perhaps some others, have greatly changed the field's understanding of reptile origins- for better and (likely) for worse.

      We agree, and note that while it is possible for challenges to be worse than the original interpretations, both the original and subsequent challenges are essential aspects of what makes science, science.

      This manuscript offers a careful review of the features used to place Cryptovaranoides within Squamata and adequately demonstrates that this interpretation is misguided, and therefore reconciles morphological and molecular data, which is an important contribution to the field of paleontology. The presence of any crown squamate in the Permian or Triassic should be met with skepticism, the same sort of skepticism provided in this manuscript.

      We agree and add that every testable hypothesis requires skepticism and testing.

      I have outlined some comments addressing some weaknesses that I believe will further elevate the scientific quality of the work. A brief, fresh read‑through to refine a few phrases, particularly where the discussion references Whiteside et al. could also give the paper an even more collegial tone.

      We have followed Reviewer 2’s recommendations closely (see below) and have justified in our responses if we do not fully follow a particular recommendation.

      This manuscript can be largely improved by additional discussion and figures, where applicable. When I first read this manuscript, I was a bit surprised at how little discussion there was concerning both non-lepidosauromorph lepidosaurs as well as stem-reptiles more broadly. This paper makes it extremely clear that Cryptovaranoides is not a squamate, but would greatly benefit in explaining why many of the characters either suggested by former studies to be squamate in nature or were optimized as such in phylogenetic analyses are rather widespread plesiomorphies present in crownward sauropsids such as millerettids, younginids, or tangasaurids. I suggest citing this work where applicable and building some of the discussion for a greatly improved manuscript. In sum:

      (1) The discussion of stem-reptiles should be improved. Nearly all of the supposed squamate features in Cryptovaranoides are present in various stem-reptile groups. I've noted a few, but this would be a fairly quick addition to this work. If this manuscript incorporates this advice, I believe arguments regarding the affinities of Cryptovaranoides (at least within Squamata) will be finished, and this manuscript will be better off for it.

      (2) I was also surprised at how little discussion there was here of putative stem-squamates or lepidosauromorphs more broadly. A few targeted comparisons could really benefit the manuscript. It is currently unclear as to why Cryptovaranoides could not be a stem-lepidosaur, although I know that the lepidosaur total-group in these manuscripts lacks character sampling due to their scarcity.

      We are responding to (1) and (2) together. We agree with the Reviewer that a thorough comparison of Cryptovaranoides to non-lepidosaurian reptiles is critical. This is precisely what we did in our previous study, Brownstein et al. (2023); see the main text and supplementary information therein. As addressed therein, there is a substantial convergence between early lepidosaurs and some groups of archosauromorphs (our inferred position for Cryptovaranoides). Many of those points are not addressed in detail here in order to avoid redundancy and are simply referenced back to Brownstein et al. (2023). Secondly, stem reptiles (i.e., non-lepidosauromorphs and non-archosauromorphs), such as those suggested above (millerettids, younginids, or tangasaurids), are substantially more distantly related to Cryptovaranoides (following any of the published hypotheses). As such, they share fewer traits (either symplesiomorphies or homoplasies), and so, in our opinion, we would risk losing the squamate focus of our study.

      We thus respectfully decline to engage the full scope of the problem in this contribution, but do note that this level of detailed work would make for an excellent student dissertation research program.

      (3) This manuscript can be improved by additional figures, such as the slice data of the humerus. The poor quality of the scan data for Cryptovaranoides is stated during this paper several times, yet the scan data is often used as evidence for the presence or absence of often minute features without discussion, leaving doubts as to what condition is true. Otherwise, several sections can be rephrased to acknowledge uncertainty, and probably change some character scorings to '?' in other studies.

      We strongly agree with the reviewer. Unfortunately, the original publication (Whiteside et al., 2021) did not make available the raw CT scan data to make this possible. As noted below in the Responses to Recommendations Section, we only have access to the mesh files for each segmented element. While one of us has observed the specimens personally, we have not had the opportunity to CT scan the specimens ourselves.

      Reviewer #3 (Public review):

      Summary:

      The study provides an interesting contribution to our understanding of Cryptovaranoides relationships, which is a matter of intensive debate among researchers. My main concerns are in regard to the wording of some statements, but generally, the discussion and data are well prepared. I would recommend moderate revisions.

      Strengths:

      (1) Detailed analysis of the discussed characters.

      (2) Illustrations of some comparative materials.

      Thank you for noting the strengths inherent to our study.

      Weaknesses:

      Some parts of the manuscript require clarification and rewording.

      One of the main points of criticism of Whiteside et al. is using characters for phylogenetic considerations that are not included in the phylogenetic analyses therein. The authors call it a "non-trivial substantive methodological flaw" (page 19, line 531). I would step down from such a statement for the reasons listed below:

      (1) Comparative anatomy is not about making phylogenetic analyses. Comparative anatomy is about comparing different taxa in search of characters that are unique and characters that are shared between taxa. This creates an opportunity to assess the level of similarity between the taxa and create preliminary hypotheses about homology. Therefore, comparative anatomy can provide some phylogenetic inferences.

      That does not mean that tests of congruence are not needed. Such comparisons are the first step that allows creating phylogenetic matrices for analysis, which is the next step of phylogenetic inference. That does not mean that all the papers with new morphological comparisons should end with a new or expanded phylogenetic matrix. Instead, such papers serve as a rationale for future papers that focus on building phylogenetic matrices.

      We agree completely. We would also add that not every study presenting comparative anatomical work need be concluded with a phylogenetic analysis.

      Our criticism of Whiteside et al. (2022) and (2024) is that these studies provided many unsubstantiated claims of having recovered synapomorphies between Cryptovaranoides and crown squamates without actually having done so through the standard empirical means (i.e., phylogenetic analysis and ancestral state reconstruction). Both Whiteside et al. (2022) and (2024) indicate characters presented as ‘shared with squamates’ along with 10 characters presented as synapomorphies (10). However, their actual phylogenetically recovered synapomorphies were few in number (only 3) and these were not discussed.

      Furthermore, the comparative anatomy in Whiteside et al. (2022) and (2024) was restricted to comparing †Cryptovaranoides to crown squamates, based on the assumption that †Cryptovaranoides was a crown squamate and thus only needed to be compared to crown squamates.

      In conclusion, we respectfully maintain that such efforts are “non-trivial substantive methodological flaw(s)”.

      (2) Phylogenetic matrices are never complete, both in terms of morphological disparity and taxonomic diversity. I don't know if it is even possible to have a complete one, but at least we can say that we are far from that. Criticising a work that did not include all the possibly relevant characters in the phylogenetic analysis is simply unfair. The authors should know that creating/expanding a phylogenetic matrix is a never-ending work, beyond the scope of any paper presenting a new fossil.

      Respectfully, we did not criticize previous studies for including an incomplete phylogeny. Instead, we criticized the methodology behind the homology statements made in Whiteside et al. (2022) and Whiteside et al. (2024).

      (3) Each additional taxon has the possibility of inducing a rethinking of characters. That includes new characters, new character states, character state reordering, etc. As I said above, it is usually beyond the scope of a paper with a new fossil to accommodate that into the phylogenetic matrix, as it requires not only scoring the newly described taxon but also many that are already scored. Since the digitalization of fossils is still rare, it requires a lot of collection visits that are costly in terms of time.

      We agree on all points, but we are unsure of what the Reviewer is asking us to do relative to this study.

      (4) If I were to search for a true flaw in the Whiteside et al. paper, I would check if there is a confirmation bias. The mentioned paper should not only search for characters that support Cryptovaranoides affinities with Anguimorpha but also characters that deny that. I am not sure if Whiteside et al. did such an exercise. Anyway, the test of congruence would not solve this issue because by adding only characters that support one hypothesis, we are biasing the results of such a test.

      We would refer the Reviewer to their section (1) on comparative anatomy. As we and the Reviewer have pointed out, Whiteside et al. did not make comparative anatomical statements outside of crown Squamata in their original study. More specifically, Whiteside et al. (2022, Fig. 8) presented a phylogeny where Cryptovaranoides formed a clade with Xenosaurus within the crown of Anguimorpha or what they termed “Anguiformes”, and made comparisons to the anatomies of the legless anguids, Pseudopus and Ophisaurus. Whiteside et al. (2024) abandoned “Anguiformes”, maintained comparisons to Pseudopus, and emphasized affinities with Anguimorpha (although almost all of their published phylogenies do not recover a monophyletic Anguimorpha unless amphisbaenians and snakes are considered to be anguimorphans). Thus, we agree that confirmation bias was inherent in their studies.

      To sum up, there is nothing wrong with proposing some hypotheses about character homology between different taxa that can be tested in future papers that will include a test of congruence. Lack of such a test makes the whole argumentation weaker in Whiteside et al., but not unacceptable, as the manuscript might suggest. My advice is to step down from such strong statements like "methodological flaw" and "empirical problems" and replace them with "limitations", which I think better describes the situation.

      We agree with the first sentence in this paragraph – there is nothing wrong with proposing character homologies between different taxa based on comparative anatomical studies. However, that is not what Whiteside et al. (2022) and (2024) did. Instead, they claimed that an ad hoc comparison of Cryptovaranoides to crown Squamata confirmed that Cryptovaranoides is in fact a crown squamate and likely a member of Anguimorpha. Their study did not recognize limitations, but rather, concluded that their new taxon pushed the age of crown Squamata into the Triassic.

      As noted by Reviewer 2, such a claim, and the ‘data’ upon which it is based, should be treated with skepticism. We have elected to apply strong skepticism and stringent tests of falsification to our critique.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 596-598 promise the following: "we provide a long[-]form review of these and other features in Cryptovaranoides that compare favorably with non-squamate reptiles in Supplementary Material." You have kindly informed me that all this material has been moved into the main text; please amend this passage.

      This has been deleted.

      (2) Comments on science

      41: I would rather say "an additional role".

      This has been edited accordingly.

      43: Reconstructing the tree entirely from extant organisms and adding fossils later is how Hennig imagined it, because he was an entomologist, and fossil insects are, on average, extremely rare and usually very incomplete (showing a body outline and/or wing venation and little or nothing else). He was wrong, indeed wrong-headed. As a historical matter, phylogenetic hypotheses were routinely built on fossils by the mid-1860s, pretty much as soon as the paleontologists had finished reading On the Origin of Species, and this practice has never declined, let alone been interrupted. As a theoretical matter, including as many extinct taxa as possible in a phylogenetic analysis is desirable because it breaks up long branches (as most recently and dramatically shown by Mongiardino Koch & Parry 2020), and while some methods and some kinds of data are less susceptible to long-branch attraction and long-branch repulsion than others, none are immune; and while missing data (on average more common in fossils) can actively mislead parametric methods, this is not the case with parsimony, and even in Bayesian inference the problem is characters with missing data, not taxa with missing data. Some of you have, moreover, published tip-dated phylogenetic analyses. As a practical matter, molecular data are almost never available from fossils, so it is, of course, true that analyses which only use molecular data can almost never include fossils; but in the very rare exceptions, there is no reason to treat fossil evidence as an afterthought.

      We agree and have changed “have become” to “is.”

      49-50, 59: The ages of individual fissure fills can be determined by biostratigraphy; as far as I understand, all specimens ever referred to Cryptovaranoides [13, 19] come from a single fill that is "Rhaetian, probably late Rhaetian (equivalent of Cotham Member, Lilstock Formation)" [13: pp. 2, 15].

      We appreciate this comment; the recent literature, however, suggests that variable ages are implied by the biostratigraphy at the English Fissure Fills, so we have chosen to keep this as is. Also note that several isolated bones were not recovered with the holotype but were discussed by Whiteside et al. (2024). The provenance of these bones was not clearly discussed in that paper.

      59-60: Why "putative"? Just to express your disagreement? I would do that in a less misleading way, for example: "and found this taxon as a crown-group squamate (squamate hereafter) in their phylogenetic analyses." - plural because [19] presented four different analyses of two matrices just in the main paper.

      We have removed this word.

      121-124: The entepicondylar foramen is homologous all the way down the tree to Eusthenopteron and beyond. It has been lost a quite small number of times. The ectepicondylar foramen - i.e., the "supinator" (brachioradialis) process growing distally to meet the ectepicondyle, fusing with it and thereby enclosing the foramen - goes a bit beyond Neodiapsida and also occurs in a few other amniote clades (...as well as, funnily enough, Eusthenopteron in later ontogeny, but that's independent).

      We agree. However, the important point here is that the features on the humerus of Cryptovaranoides are not comparable (they differ in location and morphology) to the ent- and ectepicondylar foramina in other reptiles, as we discuss at length. As such, we have kept this sentence as is.

      153: Yes, but you [18] mistakenly wrote "strong anterior emargination of the maxillary nasal process, which is [...] a hallmark feature of archosauromorphs" in the main text (p. 14) - and you make the same mistake again here in lines 200-206! Also, the fact [19: Figure 2a-c] remains that Cryptovaranoides did not have an antorbital fenestra, let alone an antorbital fossa surrounding it (a fossa without a fenestra only occurs in some cases of secondary loss of the fenestra, e.g., in certain ornithischian dinosaurs). Unsurprisingly, therefore, Cryptovaranoides also does not have an orbital-as-opposed-to-nasal process on its maxilla [19: Figure 2a-c].

      Lines 243-249 (in the original manuscript) deal with the emargination of the maxillary nasal process (but this does not imply a full antorbital fenestra). We explicitly state that this feature alone "has limited utility" for supporting archosauromorph affinity.

      158-173: The problem here is not that the capitellum is not preserved; from amniotes and "microsaurs" to lissamphibians and temnospondyls, capitella ossify late, and larger capitella attach to proportionately larger concave surfaces, so there is nothing wrong with "the cavity in which it sat clearly indicates a substantial condyle in life". Instead, the problem is a lack of quantification (...as has also been the case in the use of the exact same character in the debate on the origin of lissamphibians); your following sentence (lines 173-175) stands. The rest of the paragraph should be drastically shortened.

      We appreciate this comment. We note that the ontogenetic variation of this feature is in part the issue with the interpretation provided by Whiteside et al. (2024). The issue is the lack of consistency on the morphology of the capitellum in that study. We are unclear on what the reviewer means by ‘quantification,’ as the character in question is binary. 

      250-252: It's not going to matter here, but in any different phylogenetic context, "sphenoid" would be confusing given the sphenethmoid, orbitosphenoid, pleurosphenoid, and laterosphenoid. I actually recommend "parabasisphenoid" as used in the literature on early amniotes (fusion of the dermal parasphenoid and the endochondral basisphenoid is standard for amniotes).

      We have added "(=parabasisphenoid)" on first use but retain use of sphenoid because in the squamate and archosauromorph literature, sphenoid (or basisphenoid) is used more frequently.

      314-315: Vomerine teeth are, of course, standard for sarcopterygians. Practically all extant amphibians have a vomerine toothrow, for example. A shagreen of denticles on the vomer is not as widespread but still reaches into the Devonian (Tulerpeton).

      We agree, but vomerine teeth are rare in lepidosaurs and archosaurs and occur only in very recent clades, e.g., anguids and one stem scincoid. Their presence in amphibians is not directly relevant to the phylogenetic placement of Cryptovaranoides among reptiles.

      372: Fusion was not scored as present in [13], but as unknown (as "partial" uncertainty between states 0 and 1 [19:8]), and seemingly all three options were explored in [19].

      We politely disagree with the reviewer; state 1 is scored in Whiteside et al. (2024).

      377-383: Together with the partially fused NHMUK PV R37378 [13: Figure 4B, C; 19: 8], this is actually an argument that Cryptovaranoides is outside but close to Unidentata. The components of the astragalus fuse so early in extant amniotes that there is just a single ossification center in the already fused cartilage, but there are Carboniferous and Permian examples of astragali with sutures in the expected places; all of the animals in question (Diadectes, Hylonomus, captorhinids) seem to be close to but outside Amniota. (And yet, the astragalus has come undone in chamaeleons, indicating the components have not been lost.) - Also, if NHMUK PV R37378 doesn't belong to a squamate close to Unidentata, what does it belong to? Except in toothless beaks, premaxillary fusion is really rare; only molgin newts come to mind (and age, tooth size, and tooth number of NHMUK PV R37378 are wholly incompatible with a salamandrid).

      The relevance of the astragalus to the current discussion is unclear, as we do not mention this element in our manuscript. We discuss the fusion of the premaxillae in our response to the previous comment.

      471-474: That thing is concave. (The photo is good enough that you can enlarge it to 800% before it becomes too pixelated.) It could be a foramen filled with matrix; it does not look like a grain sticking to the outside of the bone. Also, spell out that you're talking about "suc.fo" in Figure 3j.

      We are also a bit confused about this comment, as we state:

      “Finally, we note here that Whiteside et al. [19] appear to have labeled a small piece of matrix attached to a coracoid that they refer to †C. microlanius as the supracoroacoid [sic] foramen in their figure 3, although this labeling is inferred because only “suc, supracoroacoid [sic]” is present in their figure 3 caption.” (L. 519-522, P. 17). We cannot verify that this structure is concave, and so we keep this text as is.

      476-489: [19] conceded in their section 4.1 (pp. 11-12) that the atlas pleurocentrum, though fused to the dorsal surface of the axis intercentrum as usual for amniotes and diadectomorphs, was not fused to the axis pleurocentrum.

      This is correct, as we note in the MS. The issue is whether these elements are clearly identifiable.

      506-510: [19:12] did identify what they considered a possible ulnar patella, illustrated it (Figure 4d), scored it as unknown, and devoted the entire section 4.4 to it.

      512-523: What I find most striking is that Whiteside et al., having just discovered a new taxon, feel so certain that this is the last one and any further material from that fissure must be referable to one of the species now known from there.

      We agree with these points and believe we have devoted adequate text to addressing them. Note that the reviewer does not recommend any revisions to these sections.

      553: Not that it matters, but I'm surprised you didn't use TNT 1.6; it came out in 2023 and is free like all earlier versions.

      We have kept this as is, following the reviewer's comment and because we were interested in replicating the analyses in the previous publications that have contributed to the debate about the identity of this taxon. For the present simple analyses, both versions should perform identically, as the search algorithms for discrete characters are identical across these versions.

      562: Is "01" a typo, or do you mean "0 or 1"? In that case, rather write "0/1" or "{01}".

      This has been corrected to {01}.

      (3) Comments on nomenclature and terminology

      55, 56: Delete both "...".

      This has been corrected.

      100: "ent- and ectepicondylar"

      For clarity, we have kept the full words.

      107-108: I understand that "high" is proximal and "low" is distal, but what is "the distal surface" if it is not the articular surface in the elbow joint?

      This has been corrected.

      120: "stem pan-lepidosaurs, and stem pan-squamates"; Lepidosauria and Squamata are crown groups that don't contain their stems

      This has been corrected.

      122, 123: Italics for Claudiosaurus and Delorhynchus.

      This has been corrected.

      130: Insert a space before "Tianyusaurus" (it's there in the original), and I recommend de-italicizing the two genus names to keep the contrast (as you did in line 162).

      This has been corrected.

      130, 131: Replace both "..." by "[...]", though you can just delete the second one.

      This has been corrected.

      174: Not a capitulum, but a grammatically even smaller (double diminutive) capitellum.

      This has been corrected.

      209, 224, Table 1: Both teams have consistently been doing this wrong. It's "recessus scalae tympani". The scala tympani ("ladder/staircase of the [ear]drum") isn't the recess, it's what the recess is for; therefore, the recess is named "recess of the scala tympani", and because there was no word for "of" in Classical Latin ("de" meant "off" and "about"), the genitive case was the only option. (For the same reason, the term contains "tympani", the genitive of "tympanum".)

      This has been corrected.

      415-425: This is a terminological nightmare. Ribs can have (and I'm not sure this is exhaustive): a) two separate processes (capitulum, tuberculum) that each bear an articulating facet, and a notch in between; b) the same, but with a non-articulating web of bone connecting the processes; c) a single uninterrupted elongate (even angled) articulating facet that articulates with the sutured or fused dia- and parapophysis; d) a single round articulating facet. Certainly, a) is bicapitate and d) is unicapitate, but for b) and c) all bets are off as to how any particular researcher is going to call them. This is a known source of chaos in phylogenetic analyses. I recommend writing a sentence or three on how the terms "unicapitate" & "bicapitate" lack fixed meanings and have caused confusion throughout tetrapod phylogenetics, and that the condition seen in Cryptovaranoides is nonetheless identical to that in archosauromorphs.

      This has been added: “This confusion in part stems from the lack of a fixed meaning for uni- and bicapitate rib heads; in any case, †C. microlanius possesses a condition identical to archosauromorphs as we have shown.”  (L.475-477, P.16).

      439-440: Other than in archosaurs, some squamates and Mesosaurus, in which sauropsids are dorsal intercentra absent?

      We are unclear about the relevance of the question to this section. The issue at hand is that some squamate lineages possess dorsal intercentra, so the absence of dorsal intercentra cannot be considered a squamate synapomorphy without the optimization of this feature along a phylogeny (which was not accomplished by Whiteside et al.).

      458: prezygapophyses.

      This has been corrected.

      516: "[...]".

      This has been corrected.

      566: synapomorphies.

      This has been corrected.

      587: Macrocnemus.

      This has been corrected.

      585: I strongly recommend either taking off and nuking the name Reptilia from orbit (like Pisces) or using it the way it is defined in Phylonyms, namely as the crown group (a subset of Neodiapsida). Either would mean replacing "neodiapsid reptiles" with "neodiapsids".

      This has been corrected to “neodiapsids.”

      625: Replace "inclusive clades" by "included clades", "component clades", "subclades", or "parts," for example.

      This has been kept as is because “inclusive clades” is common terminology and is used extensively in, for example, the PhyloCode. 

      659: Please update.

      References are updated.

      Fig. 8: Typo in Puercosuchus.

      This has been corrected.

      (4) Comments on style and spelling

      You inconsistently use the past and the present tense to describe [13, 19], sometimes both in the same sentence (e.g., lines 323 vs. 325). I recommend speaking of published papers in the past tense to avoid ascribing past views and acts to people in their present state.

      This has been corrected to be more consistent throughout the manuscript.

      48: Remove the second comma.

      This has been corrected.

      91: Replace "[13] and WEA24" by "[13, 19]".

      This has been corrected.

      100: Commas on both sides of "in fact" or on neither

      This has been corrected.

      117: I recommend "the interpretation in [19]". I have nothing against the abbreviation "WEA24", but you haven't defined it, and it seems like a remnant of incomplete editing. - That said, eLife does not impose a format on such things. If you prefer, you can just bring citation by author & year back; in that case, this kind of abbreviation would make perfect sense (though it should still be explicitly defined).

      129, 145: Likewise.

      We have modified this to [13] and [19] where necessary.

      192-198: Surely this should be made part of the paragraph in lines 158-175, which has the exact same headline?

      This has been corrected.

      200-206: Surely this should be made part of the paragraph in lines 148-156, which has the exact same headline?

      These sections deal with different issues pertaining to the analyses of Whiteside et al. (2024), and so we have kept the organization as is.

      214: Delete "that".

      This has been deleted.

      312: "Vomer" isn't an adjective; I'd write "main vomer body" or "vomer's main body" or "main body of the vomer".

      This has been corrected.

      350: "figured"

      This has been corrected.

      400: Rather, "rearticulated" or "worked to rearticulate"? - And why "several"? Just write "two". "Several" implies larger numbers.

      These issues have been corrected.

      448, 500: As which? As what kind of feature? I'm aware that "as such" is fairly widely used for "therefore", but it still confuses me every time, and I have to suspect I'm not the only one. I recommend "therefore" or "for this reason" if that is what you mean.

      “As such” has been deleted.

      452: Adobe Reader doesn't let me check, but I think you have two spaces after "of".

      This has been corrected.

      514, 539, 546, 552, 588, Fig. 3, 5, 6, Table 1: "WEA24" strikes again.

      This has been corrected.

      515: Remove the parentheses.

      This has been corrected.

      531: Insert a space after the period.

      This has been corrected.

      532: Remove both commas and the second "that".

      This has been corrected.

      538: Remove the comma.

      This has been kept as is because changing it would render the sentence grammatically incorrect.

      545: "[...]" or, better, nothing.

      This has been corrected.

      547: Spaces on both sides of the dash or on neither (as in line 553).

      This has been corrected.

      552: Rather, "conducted a parsimony analysis".

      This has been corrected.

      556: Space after "[19]".

      This has been corrected.

      560: Comma after "narrow".

      This has been corrected.

      600: Comma after "above" to match the one in the preceding line - there's an insertion in the sentence that must be flanked by commas on both sides.

      This has been corrected.

      603: Compound adjectives like "alpha-taxonomic" need a hyphen to avoid tripping readers up.

      This has been corrected.

      612: Similarly, "ancestral-state reconstruction" needs one to make immediately clear it isn't a state reconstruction that is ancestral but a reconstruction of ancestral states.

      This has been corrected.

      613: If you want to keep this comma, you need to match it with another after "Cryptovaranoides" in line 611.

      We have kept this as is, because removing this comma would render the sentence grammatically incorrect.

      615: Likewise, you need a comma after "and" because "except for a few features" is an insertion. The other comma is actually optional; it depends on how much emphasis you want to place on what comes after it.

      This has been added.

      622: Comma after "[48, 49]".

      This has been added.

      672: Missing italics and two missing spaces.

      This has been corrected.

      678, 680-681, 693, 700-701, 734, 742, 747, 788, 797, 799, 803, 808, 810-811, 814, 817, 820, 823, 828, 841, 843: Missing italics.

      This has been corrected.

      683, 689: These are book chapters. Cite them accordingly.

      This has been corrected.

      737: Missing DOI.

      No DOI is available.

      793: Missing Bolosaurus major; and I'd rather cite it as "2024" than "in press", and "online early" instead of "n/a".

      This has been corrected.

      835: Hoffstetter, RJ?

      This has been corrected.

      836: Is there something missing?

      This has been corrected.

      839: This is the same reference as number 20 (lines 683-684), and it is miscited in a different way...!

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a brief mention of a phylogenetic analysis being re-run, but it is unclear if any modifications (changes in scoring) based on the very observations were made. Please state this explicitly.

      This is explained in lines 600-622, P.20-21, in the section “Apomorphic characters not empirically obtained.” "In order to check the characters listed by Whiteside et al. [19] (p.19) as “two diagnostic characters” and “eight synapomorphies” in support of a squamate identity for †Cryptovaranoides, we conducted a parsimony analysis of the revised version of the dataset [32] provided by Whiteside et al. [19] in TNT v 1.5 [91]. We used Whiteside et al.’s [19] own data version"

      (2) Line 20: There is almost no discussion of non‑lepidosaur lepidosauromorphs. I suggest including this, as the archosauromorph‑like features reported in Cryptovaranoides appear rather plastic. Furthermore, diagnostic features of Archosauromorpha in other datasets (e.g., Ezcurra 2016 or the works of Spiekman) are notably absent (and unsampled) in Cryptovaranoides. Expanding this comparison would greatly strengthen the manuscript.

      The brief discussion (although not absent) of non-lepidosaur lepidosauromorphs is largely a function of the poor fossil record of this grade. But where necessary, we do discuss these taxa. Also see our previous study (Brownstein et al. 2023) for an extensive discussion of characters relevant to archosauromorphs.

      (3) Line 38: I suggest removing "Archosauromorpha" from the keywords. The authors make a compelling case that Cryptovaranoides is not a squamate, yet they do not fully test its placement within Archosauromorpha (as they acknowledge). Perhaps use "Reptilia" instead?

      We have removed this keyword.

      (4) Line 99: The authors' points here are well made and largely valid. The presence of the ent‑ and ectepicondylar foramina is indeed an amniote plesiomorphy and cannot confirm a squamate identity. Their absence, however, can be informative - although it is unclear whether the CT scans of the humerus are of sufficient resolution, and Figure 4 of Brownstein et al. looks hastily reconstructed (perhaps owing to limited resolution). Moreover, the foramina illustrated by Whiteside do resemble those of other reptiles, albeit possibly over‑prepared and exaggerated.

      The issue with the noted figure is indeed due to poor resolution from the scans. Although we agree with the reviewer, we hesitate to talk about absence in this taxon being phylogenetically informative given the confounding influence of ontogeny.

      (5) I encourage the authors to provide slice data to support the claim that the foramina are absent (which could certainly be correct!); otherwise, the assertion remains unsubstantiated.

      We only have access to the mesh files of segmented bones, not the raw (reconstructed slice) data.

      (6) PLEASE NOTE - because the specimen is juvenile, the apparent absence of the ectepicondylar foramen is equivocal: the supinator process develops through ontogeny and encloses this foramen (see Buffa et al. 2025 on Thadeosaurus, for example).

      See above.

      (7) Line 122: Italicize 'Delorhynchus'

      This has been corrected.

      (8) Lines 131‑132: I'd suggest deleting the final sentence; it feels a little condescending, and your argument is already persuasive.

      This has been corrected.

      (9) Line 129: Please note that owenettid "parareptiles" also lack this process, as do several other stem‑saurians. Its absence is therefore not diagnostic of Squamata.

      Also: Such plasticity is common outside the crown. Milleropsis and Younginidae develop this process during ontogeny, even though a lower temporal bar never fully forms.

      We appreciate this point. See discussion later in the manuscript.

      (11) Line 172: Consider adding ontogeny alongside taphonomy and preservation. A juvenile would likely have a poorly developed radial condyle, if any. Acknowledging this possibility will add some needed nuance.

      This sentence has been modified, but we have not added a discussion of ontogeny here because it is not immediately relevant to refuting the argument about inferring the presence of this feature when it is not preserved.

      (12) Line 177: The "septomaxilla" in Whiteside et al. (2024, Figure 1C) resembles the contralateral premaxilla in dorsal view, with the maxillary process on the left and the palatal (or vomerine) process on the right (the dorsal process appears eroded). The foramen looks like a prepalatal foramen, common to many stem and crown reptiles. Consequently, scoring the septomaxilla as absent may be premature; this bone often ossifies late. In my experience with stem‑reptile aggregations, only one of several articulated individuals may ossify this element.

      We agree that the presence of a late-ossifying septomaxilla cannot be ruled out, but our point remains (in agreement with the referee) that scoring the septomaxilla as present based on the amorphous fragments is premature.

      (13) Line 200: Tomography data should be shown before citing it. The posterior margin of the maxilla appears rather straight, and the maxilla itself is tall for an archosauromorph. It would be more convincing to score this feature as present only after illustrating the relevant slices - and, as you note, the trait is widespread among non‑archosauromorphs.

      See above and Brownstein et al. (2023).

      (14) Line 208: Well argued: how could Whiteside et al. confidently assign a disarticulated element? Their "vagus" foramen actually resembles a standard hypoglossal foramen - identical to that seen in many stem reptiles, which often have one large and one small opening.

      Thank you!

      (15) Line 248: Again, please illustrate this region. One cannot argue for absence without showing the slice data. Note that millerettids and procolophonians - contemporaneous with Cryptovaranoides - possess an enclosed vidian canal, so the feature is broadly distributed.

      See above.

      (16) Line 258: The choanal fossa is intriguing: originally created for squamate matrices, yet present (to varying degrees) in nearly every reptile I have examined. It is strongly developed in millerettids (see Jenkins et al. 2025 on Milleropsis and Milleretta) and younginids, much like in squamates - Tiago appropriately scores it as present. Thus, it may be more of a "Neodiapsida + millerettids" character. In any case, the feature likely forms an ordered cline rather than a simple binary state.

      We agree and look forward to future study of this feature.

      (17) Line 283: Bolosaurids are not diapsids and, per Simões, myself, and others, "Diapsida" is probably invalid, at least how it is used here. Better to say "neodiapsids" for choristoderes and "stem‑reptiles" or "sauropsids" for bolosaurids. Jenkins et al.'s placement is largely a function of misidentifying the bolosaurid stapes as the opisthotic.

      We are not entirely clear on this point since bolosaurids are not mentioned in this section.

      (18) Line 298: Here, you note that the CT scans are rather coarse, which makes some earlier statements about absence/presence less certain (e.g., humeral foramina). It may strengthen the paper to make fewer definitive claims where resolution limits interpretation.

      We appreciate this point. However, in the case of the humeral foramina, the coarseness of the scans is one reason why we question Whiteside et al.’s scoring of the presence of these features.

      (19) Line 314: Multiple rows of vomerine teeth are standard for amniotes; lepidosauromorphs such as Paliguana and Megachirella also exhibit them (though they may not have been segmented in the latter's description). Only a few groups (e.g., varanopids, some millerettids) have a single medial row.

      We appreciate this point and have added those citations in the following new sentence: “Multiple rows of vomerine teeth are common in reptiles outside of Squamata [76]; the presence of only one row is restricted to a handful of clades, including millerettids [77,78], †Tanystropheus [49], and some [79], but not all [71,80] choristoderes.” (L. 360-363, P. 12).

      (20) Line 317: This is likely a reptile plesiomorphy - present in all millerettids (e.g., Milleropsis and Milleretta per Jenkins et al.). Citing these examples would clarify that it is not uniquely squamate. Could it be secondarily lost in archosauromorphs?

      We appreciate this point and have cited Jenkins et al. here. It is out of the scope of this discussion to discuss the polarity of this feature relative to Archosauromorpha.

      (21) Line 336: Unfortunately, a distinct quadratojugal facet is usually absent in Neodiapsids and millerettids; where present, the quadratojugal is reduced and simply overlaps the quadrate.

      We appreciate this point but feel that reviewing the distribution of this feature across all reptiles is not relevant to the text noted.

      (22) Line 357: Pterygoid‑quadrate overlap is likely a tetrapod plesiomorphy. Whiteside et al. do not define its functional or phylogenetic significance, and the overlap length is highly variable even among sister taxa.

      We agree, but in any case this feature is impossible to assess in Cryptovaranoides.

      (23) Line 365: Another well‑written section - clear and persuasive.

      Thank you!

      (24) Line 385: The cephalic condyle is widespread among neodiapsids, so it is not uniquely squamate.

      We agree.

      (25) Character 391: Note that the frontal underlapping the parietal is widespread, appearing in both millerettids and neodiapsids such as Youngina.

      We appreciate this point, but the point here deals with the fact that this feature is not observable in the holotype of Cryptovaranoides.

      (26) Line 415: The "anterior process" is actually common among crown reptiles, including sauropterygians, so it cannot by itself place Cryptovaranoides within Archosauromorpha.

      We agree but also note that we do not claim this feature unambiguously unites Cryptovaranoides with Archosauromorpha.

      (28) Line 460: Yes - Whiteside et al. appear to have relabeled the standard amniote coracoid foramen. Excellent discussion.

      Thank you!

      (29) Line 496: While mirroring Whiteside's structure, discussing this mandibular character earlier, before the postcrania, might aid readability.

      We have chosen to keep this structure as is.

      (30) Lines 486-588: This section oversimplifies the quadrate articulation.

      We are unclear how this is an oversimplification.

      (31) Both Prolacerta and Macrocnemus possess a cephalic condyle and some mobility (though less than many squamates). In Prolacerta (Miedema et al. 2020, Figure 4), the squamosal posteroventral process loosely overlaps the quadrate head.

      We assume this comment refers to the section "Peg-in-notch articulation of quadrate head"; we appreciate clarification that this feature occurs in variable extent outside squamates, but this does not affect our statement that the material of Cryptovaranoides is too poorly preserved to confirm its presence.

      (32) Where is this process in Cryptovaranoides? It is not evident in Whiteside's segmentation of the slender squamosal - please illustrate.

      We are unclear as to which section this comment refers.

      (33) Additionally, the quadrate "conch" of Cryptovaranoides is well developed, bearing lateral and medial tympanic crests; the lateral crest is absent in the cited archosauromorphs.

      We note that no vertebrate has a medial tympanic crest (it is always laterally placed for the tympanic membrane, when present). If this is what the reviewer refers to, this is a feature commonly found across all tetrapods bearing a tympanum attached to the quadrate (e.g., most reptiles), and so it is not very relevant phylogenetically. Regarding its presence in Cryptovaranoides, the lateral margin of the quadrate is broken (Brownstein et al., 2023), so it cannot be determined. This incomplete preservation also makes an interpretation of a quadrate conch very hard to determine. But as currently preserved, there is no evidence whatsoever for this feature.

      (34) Line 591: The cervical vertebrae of Cryptovaranoides are not archosauromorph‑like. Archosauromorph cervicals are elongate, parallelogram‑shaped, and carry long cervical ribs, none of which apply here. As the manuscript lacks a phylogenetic analysis, including these features seems unnecessary. Should they be added to other datasets, I suspect Cryptovaranoides would align along the lepidosaur stem (though that remains to be tested).

      We politely disagree. The reviewer here mentions that the cervical vertebrae of archosauromorphs are generally shaped differently from those in Cryptovaranoides. The description provided (“elongate, parallelogram‑shaped, and carry long cervical ribs”) is basically limited to protorosaurians (e.g., tanystropheids, Macrocnemus) and early archosauriforms. We note that archosauromorph cervicals are notoriously variable in shape, especially in the crown, but also among early archosauromorphs. Further, the cervical ribs are notoriously similar among early archosauromorphs (including protorosaurians) and Cryptovaranoides, as discussed and illustrated in Brownstein et al., 2023 (Figs. 2 and 3), especially concerning the presence of the anterior process.

      Further, we do include a phylogenetic analysis of the matrix provided in Whiteside et al. (2024) as noted in our results section. In any case, we direct the reviewer to our previous study (Brownstein et al., 2023), in which we conduct phylogenetic analyses that included characters relevant to this note.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should use specimen numbers throughout the text because we are talking about multiple individuals, and the authors contest the previous affinity of some of them. For example, on page 16, line 447, they mention an isolated vertebra but without any number. The specimen can be identified in the referenced article, but it would be much easier for the reader if the number were also provided here.

      Agreed and added.

      (2) Abstract: "Our team questioned this identification and instead suggested Cryptovaranoides had unclear affinities to living reptiles."

      That is very imprecise. The team suggested that it could be an archosauromorph or an indeterminate neodiapsid. Please change accordingly.

      We politely disagree. We stated in our 2023 study that whereas our phylogenetic analyses place this taxon in Archosauromorpha, it remains unclear where it would belong within the latter. This is compatible with “unclear affinities to living reptiles”.

      (3) Page 7, line 172: "Taphonomy and poor preservation cannot be used to infer the presence of an anatomical feature that is absent." Unfortunate wording. Taphonomy always has to be used to infer the presence or absence of anatomical features. Sometimes the feature is not preserved, but it leaves imprints/chemical traces or other taphonomic indicators that it was present in the organism. Please remove or rewrite the sentence.

      We agree and have modified the sentence to read: “Taphonomy and poor preservation cannot be used alone to justify the inference that an anatomical feature was present when it is not preserved and there is no evidence of postmortem damage. In a situation when the absence of a feature is potentially ascribable to preservation, its presence should be considered ambiguous.” (L. 141-145, P.5).

      (4) Page 4, line 91, please explain "WEA24" here, though it is unclear why this abbreviation is used instead of citation in the manuscript.

      This has been corrected to Whiteside et al. [19].

      (5) Page 6, line 144: "Together, these observations suggest that the presence of a jugal posterior process was incorrectly scored in the datasets used by WEA24 (type (ii) error)." That sentence is unclear. Why did the authors use "suggest"? Does it mean that they did not have access to the original data matrix to check it? If so, it should be clearly stated at the beginning of the manuscript.

      See earlier; this has been modified and “suggest” has been removed.

      (6) Page 7, line 174: "Finally, even in the case of the isolated humerus with a preserved capitulum, the condyle illustrated by Whiteside et al. [19] is fairly small compared to even the earliest known pan-squamates, such as Megachirella wachtleri (Figure 4)." Figure 4 does not show any humeri. Please correct.

      The reference to figure 4 has been removed.

      (7) Page 8, line 195-198: "This is not the condition specified in either of the morphological character sets that they cite [18,38], the presence of a distinct condyle that is expanded and is by their own description not homologous to the condition in other squamates." This is a bit unclear. Could the authors explain it a little bit further? How is the condition that is specified in the referred papers different compared to the Whiteside et al. description?

      We appreciate this comment and have broken this sentence up into three sentences to clarify what we mean:

      “The projection of the radial condyle above the adjacent region of the distal anterior extremity is not the condition specified in either of the morphological character sets that Whiteside et al. [19] cite [18,32]. The condition specified in those studies is the presence of a distinct condyle that is expanded. The feature described in Whiteside et al. [19] does not correspond to the character scored in the phylogenetic datasets.” (L.220-225, P.8).

      (8) Page 16, line 446: "they observed in isolated vertebrae that they again refer to C. microlanius without justification". That is not true. The referred paper explains the attribution of these vertebrae to Cryptovaranoides (see section 5.3 therein). The authors do not have to agree with that justification, but they cannot claim that no justification was made. Please correct it here and throughout the text.

      We have modified this sentence but note that the justification in Whiteside et al. (2024) lacked rigor. Whiteside et al. (2024) state: “Brownstein et al. [5] contested the affinities of three vertebrae, cervical vertebra NHMUK PV R37276, dorsal vertebra NHMUK PV R37277 and sacral vertebra NHMUK PV R37275. While all three are amphicoelous and not notochordal, the first two can be directly compared to the holotype. Cervical vertebra NHMUK PV R37276 is of the same form as the holotype CV3 with matching neural spine, ventral keel (=crest) and the posterior lateral ridges or lamina (figure 3c,d) shown by Brownstein et al. [5, fig. 1a]. The difference is that NHMUK PV R37276 has a fused neural arch to the pleurocentrum and a synapophysis rather than separate diapophysis and parapophysis of the juvenile holotype (figure 3c). Neurocentral fusion of the neural arch and centrum can occur late in modern squamates, ‘up to 82% of the species maximum size’ [28].

      The dorsal surface of dorsal vertebra NHMUK PV R37277 (figure 3e) can be matched to the mid-dorsal vertebra in the †Cryptovaranoides holotype (figure 4d, dor.ve) and has the same morphology of wide, dorsally and outwardly directed, prezygapophyses, downwardly directed postzygapophyses and similar neural spine. It is also of similar proportions to the holotype when viewed dorsally (figures 3e and 4d), both being about 1.2 times longer anteroposteriorly than they are wide, measured across the posterior margin. The image in figure 4d demonstrates that the posterior vertebrae are part of the same spinal column as the truncated proximal region but the spinal column between the two parts is missing, probably lost in quarrying or fossil collection.”

      This justification is based on pointing out the presence of supposed shared features between these isolated vertebrae and those in the holotype of Cryptovaranoides, even though none of these features are diagnostic for that taxon. We have changed the sentence in our manuscript to read:

      “Whiteside et al. [19] concur with Brownstein et al. [18] that the diapophyses and parapophyses are unfused in the anterior dorsals of the holotype of †Cryptovaranoides microlanius, and restate that fusion of these structures is based on the condition they observed in isolated vertebrae that they refer to †C. microlanius based on general morphological similarity and without reference to diagnostic characters of †C. microlanius” (L. 502-507, P. 17).

      (9) Figure 2. The figure caption lacks some explanations. Please provide information about affinity (e.g., squamate/gekkotan), age, and locality of the taxa presented. Are these left or right palatines? The second one seems to be incomplete, and maybe it is worth replacing it with something else?

      The figure caption has been modified:

      “Figure 2. Comparison of palatine morphologies. Blue shading indicates choanal fossa. Top image of †Cryptovaranoides referred left palatine is from Whiteside et al. [19]. Middle is the left palatine of †Helioscopos dickersonae (Squamata: Pan-Gekkota) from the Late Jurassic Morrison Formation [62]. Bottom is the right palatine of †Eoscincus ornatus (Squamata: Pan-Scincoidea) from the Late Jurassic Morrison Formation [31].”

      (10) Figure 8. The abbreviations are not explained in the figure caption.

      These have been added.

    1. Reviewer #3 (Public review):

      Summary:

      Ruppert et al. present a well-designed 2×2 factorial study directly comparing methionine restriction (MetR) and cold exposure (CE) across liver, iBAT, iWAT, and eWAT, integrating physiology with tissue-resolved RNA-seq. This approach allows a rigorous assessment of where dietary and environmental stimuli act additively, synergistically, or antagonistically. Physiologically, MetR progressively increases energy expenditure (EE) at 22 °C and lowers RER, indicating a lipid utilization bias. By contrast, a 24-hour 4 °C challenge elevates EE across all groups and eliminates MetR-Ctrl differences. Notably, changes in food intake and activity do not explain the MetR effect at room temperature.

      Strengths:

      The data convincingly support the central claim: MetR enhances EE and shifts fuel preference to lipids at thermoneutrality, while CE drives robust EE increases regardless of diet and attenuates MetR-driven differences. Transcriptomic analysis reveals tissue-specific responses, with additive signatures in iWAT and CE-dominant effects in iBAT. The inclusion of explicit diet×temperature interaction modeling and GSEA provides a valuable transcriptomic resource for the field.

      Comments on revisions:

      The authors have addressed any concerns I had.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Activation of thermogenesis by cold exposure and dietary protein restriction are two lifestyle changes that impact health in humans and lead to weight loss in model organisms - here, in mice. How these affect liver and adipose tissues has not been thoroughly investigated side by side. In mice, the authors show that the responses to methionine restriction and cold exposure are tissue-specific, while the effects on beige adipose are somewhat similar.

      Strengths: 

      The strength of the work is the comparative approach, using transcriptomics and bioinformatic analyses to investigate the tissue-specific impact. The work was performed in mouse models and is state-of-the-art. This represents an important resource for researchers in the field of protein restriction and thermogenesis. 

      Weaknesses: 

      The findings are descriptive, and the conclusions remain associative. The work is limited to mouse physiology, and the human implications have not been investigated yet.

      We thank Reviewer 1 for their thoughtful review and for highlighting the strength of our comparative, tissue-specific analyses. We acknowledge that our study is descriptive and limited to mouse physiology, and agree that translation to humans will be an important next step. By making these data broadly accessible, we aim to provide a useful resource for future mechanistic and translational studies on dietary amino acid restriction and thermogenesis.

      Reviewer #2 (Public review): 

      Summary: 

      This study provides a library of RNA sequencing analysis from brown fat, liver, and white fat of mice treated with two stressors - cold challenge and methionine restriction - alone and in combination (interaction between diet and temperature). They characterize the physiologic response of the mice to the stressors, including effects on weight, food intake, and metabolism. This paper provides evidence that while both stressors increase energy expenditure, there are complex tissue-specific responses in gene expression, with additive, synergistic, and antagonistic responses seen in different tissues.

      Strengths: 

      The study design and implementation are solid and well-controlled. Their writing is clear and concise. The authors do an admirable job of distilling the complex transcriptome data into digestible information for presentation in the paper. Most importantly, they do not overreach in their interpretation of their genomic data, keeping their conclusions appropriately tied to the data presented. The discussion is well thought out and addresses some interesting points raised by their results.

      Weaknesses: 

      The major weakness of the paper is the almost complete reliance on RNA sequencing data, but it is presented as a transcriptomic resource.

      We thank Reviewer 2 for their positive evaluation of our study and for highlighting the strengths of our design, analyses, and interpretation. We acknowledge the limitation of relying primarily on RNA-seq, and emphasize that our intent was to provide a comprehensive transcriptomic resource to guide future mechanistic work by the community.

      Reviewer #3 (Public review): 

      Summary: 

      Ruppert et al. present a well-designed 2×2 factorial study directly comparing methionine restriction (MetR) and cold exposure (CE) across liver, iBAT, iWAT, and eWAT, integrating physiology with tissue-resolved RNA-seq. This approach allows a rigorous assessment of where dietary and environmental stimuli act additively, synergistically, or antagonistically. Physiologically, MetR progressively increases energy expenditure (EE) at 22 °C and lowers RER, indicating a lipid utilization bias. By contrast, a 24-hour 4 °C challenge elevates EE across all groups and eliminates MetR-Ctrl differences. Notably, changes in food intake and activity do not explain the MetR effect at room temperature.

      Strengths: 

      The data convincingly support the central claim: MetR enhances EE and shifts fuel preference to lipids at thermoneutrality, while CE drives robust EE increases regardless of diet and attenuates MetR-driven differences. Transcriptomic analysis reveals tissue-specific responses, with additive signatures in iWAT and CE-dominant effects in iBAT. The inclusion of explicit diet×temperature interaction modeling and GSEA provides a valuable transcriptomic resource for the field.

      Weaknesses: 

      Limitations include the short intervention windows (7 d MetR, 24 h CE), use of male-only cohorts, and reliance on transcriptomics without complementary proteomic, metabolomic, or functional validation. Greater mechanistic depth, especially at the level of WAT thermogenic function, would strengthen the conclusions.

      We thank Reviewer 3 for their thorough review and for recognizing the strengths of our factorial design, physiological assessments, and transcriptomic analyses. We acknowledge the limitations of short intervention windows, male-only cohorts, and the reliance on transcriptomics. Our aim was to generate a well-controlled comparative dataset as a resource, and we agree that future work incorporating longer interventions, both sexes, and additional mechanistic layers will be important to build on these findings.

      Reviewer #1 (Recommendations for the authors): 

      In my opinion, the comparative analysis between tissues and treatments could be expanded.

      We thank the reviewer for this suggestion. We included top-30 DEG heatmaps for the comparison MetR_CEvsCtrl_RT for up- and downregulated genes in the figures for each tissue. We also provide additional data in the supplementary material, including top-30 heatmaps for Ctrl_CEvsCtrl_RT, MetR_RTvsCtrl_RT, and the interaction term, as well as one Excel sheet per tissue for all DEGs (p < 0.05 and FC ± 1.5) and for all gene sets (GSEA).
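
      As an illustration of the DEG cutoffs quoted in this response, the following is a minimal sketch and not the authors' pipeline. It assumes that "FC" is a linear fold-change ratio, that the 1.5 threshold applies in both directions, and that the p-value is used as given (whether it is raw or adjusted is not stated). The function name `is_deg` and its parameters are hypothetical.

```python
import math


def is_deg(p_value, fold_change, p_cutoff=0.05, fc_cutoff=1.5):
    """Return True if a gene passes both cutoffs (hypothetical helper).

    Assumes fold_change is a linear ratio (treated / control), so a
    1.5-fold decrease corresponds to a ratio of 1 / 1.5.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Work on the log2 scale so up- and downregulation are symmetric.
    abs_log2_fc = abs(math.log2(fold_change))
    return p_value < p_cutoff and abs_log2_fc >= math.log2(fc_cutoff)


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    for gene, p, fc in [("geneA", 0.01, 2.0), ("geneB", 0.01, 1.2), ("geneC", 0.20, 3.0)]:
        print(gene, is_deg(p, fc))
```

      Filtering on the log2 scale keeps the criterion symmetric, so a halving of expression is treated the same way as a doubling.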

      Reviewer #3 (Recommendations for the authors): 

      (1) CE robustly increases food intake, yet MetR mice at room temperature, despite elevated EE, do not appear to increase feeding to maintain energy balance. The authors should discuss this discrepancy, as it represents an intriguing avenue for follow-up.

      See answer below.

      (2) CE raises EE to ~0.9 kcal/h irrespective of diet, suggesting that the additive weight loss seen with MetR+CE (Fig. 1H) must be due to reduced intake. This raises the possibility that MetR mice fail to appropriately sense negative energy balance, even under CE, and do not compensate with higher feeding. 

      We thank the reviewer for comments 1 and 2. We did not emphasize this finding, as the literature on the effects of sulfur amino acid restriction on food intake is very inconsistent. Initial studies (e.g., by the Gettys group) most often report food intake per gram of body weight and report an increase in caloric intake. We think that this reporting is flawed and that food intake should instead be reported cumulatively. The recent paper by the Dixit group also reports no effect on food intake, in line with our data. The recent paper by the Nudler group reports a decrease in food intake.

      (3) Report effect sizes and sample sizes alongside p-values in all figure panels, and ensure the GEO accession (currently listed as "GSEXXXXXX") is provided.

      We thank the reviewer for noticing this. So far, we have been unable to upload the datasets to GEO because we cannot connect to the NIH servers, presumably due to the US government shutdown. We are committed to sharing this dataset as soon as possible and will update the manuscript accordingly. We included the sample sizes for experiments 1 and 2 in the figure legends and described our outlier detection method in the Methods section. Significance levels are explained in the figure legends.

      (4) Explicitly define the criteria for "additive," "synergistic," and "antagonistic" interactions (both at the gene and pathway levels) to help readers align the text with the figures.

      We thank the reviewer for this helpful comment. We added a description of how we defined and computed the regulatory logic to the Methods section.

      (5) Revise the introduction to address recent data from the Dixit group (ref. #38), which shows that EE induced by cysteine restriction and weight loss is independent of FGF21 and UCP1. As written, the introduction states: "Recent studies have shown that DIT via dietary MetR augments energy expenditure in a UCP1-dependent...fashion". 

      See answer below.

      (6) "Mechanistically, MetR...results in secretion of FGF21. In turn, FGF21 augments EE by activating UCP1-driven thermogenesis in brown adipose tissue via β-adrenergic signaling (4,7)." This should be updated for accuracy and balance.

      We thank the reviewer for comments 5 and 6. The recent publications by the Dixit and Nudler groups (now refs. 9 and 10) provide very interesting further mechanistic detail on the body weight loss in response to dietary sulfur amino acid restriction. However, there are also older papers by the Gettys group that in part contradict their findings, particularly when it comes to the importance of UCP1 for the adaptation to sulfur amino acid restriction. Overall, we think that further work is required to distinguish the contribution of UCP1-driven EE from that of alternative mechanisms that ultimately drive body and fat mass loss. We rewrote the referenced paragraph in the introduction to reflect this.
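
      Regarding comment (4) above, the authors' exact criteria are given only in their Methods section and are not reproduced here. The sketch below shows one common way such labels could be assigned, purely as an assumption: the observed log2 fold change of the combined treatment is compared with the sum of the two single-treatment log2 fold changes. The function name, the argument names, and the tolerance `tol` are all hypothetical.

```python
def classify_interaction(lfc_diet, lfc_cold, lfc_combined, tol=0.25):
    """Label a gene's response to the combined treatment (illustrative only).

    All inputs are log2 fold changes versus the same control group.
    'tol' is an arbitrary tolerance around perfect additivity.
    """
    expected = lfc_diet + lfc_cold          # purely additive prediction
    deviation = lfc_combined - expected     # analogue of an interaction term
    if abs(deviation) <= tol:
        return "additive"
    if expected == 0:
        # No predicted direction, so the deviation cannot be called
        # amplifying or dampening under this simple scheme.
        return "unclassified"
    # Deviation in the same direction as the prediction amplifies it;
    # deviation in the opposite direction dampens it.
    return "synergistic" if deviation * expected > 0 else "antagonistic"


if __name__ == "__main__":
    # Made-up values for illustration.
    print(classify_interaction(1.0, 1.0, 2.1))
    print(classify_interaction(1.0, 1.0, 3.0))
    print(classify_interaction(1.0, 1.0, 0.5))
```

      A statistical implementation would test the interaction term for significance instead of using a fixed tolerance; the fixed cutoff here only keeps the example short.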

    1. Reviewer #2 (Public review):

      Summary:

      This study aims to test the hypothesis that microsaccades are linked to the shifting of spatial attention, rather than the maintenance of attention at the cued location. In two experiments, participants were required to judge an orientation change at either a validly cued location (80% of the time) or an invalidly cued location (20% of the time). This change was presented at varying intervals (ranging from 500 to 3,200 ms) after cue onset. Accuracy and reaction times both showed attentional benefits at the valid versus invalid location across the different cue-target intervals. In contrast, microsaccade biases were time-dependent. The authors report a directional bias primarily observed around 400 ms after the cue, with later intervals (particularly in Experiment 2) exhibiting no biases in microsaccade direction towards the cued location. The authors argue that this finding supports their initial hypothesis that microsaccade biases reflect shifts in attention, but that maintaining attention at the cued location after an attention shift is not correlated with microsaccade direction.

      Strengths:

      The results are straightforward given the chosen experimental design. The manuscript is clearly written, and the presentation of the study and its visualisations are both of a high standard.

      Weaknesses:

      The major weakness of this paper is its incremental contribution to a widely studied phenomenon. The link between attention and microsaccades has been the subject of extensive research over the past two decades. This study merely provides a limited overview of the key insights gained from these papers and discussions. In fact, it attempts to summarise previous work by stating that many experiments found a link, while others did not, and provides only a relatively small number of references. To make a significant contribution, I believe the authors should evaluate the field more thoroughly, rather than merely scratching the surface.

      The authors then present a potential solution to the conflicting past findings, arguing that attention should be considered a dynamic process that can be broken down into an attention shift and a sustained attention phase. Although the authors present this as a novel concept, I cannot think of anyone in the field who considers spatial attention to be a static entity. Nevertheless, I was curious to see how the authors would attempt to determine the precise timing of the attention shift and manipulate the different stages individually. However, the authors only varied the interval between the onset of the attention cue and the test stimulus, failing to further pinpoint their dynamic attention concept.

      The current version of the experiment, therefore, takes a correlational approach, similar to initial studies by Engbert and Kliegl (2003) and Hafed and Clark (2002). Meanwhile, we have learned a great deal about the link between microsaccades and attention. Below, I will list just a few of these findings to demonstrate how much we already know. It is important to note that, while the present study cites some of these papers, it does not provide a clear overview of how the current study goes beyond previous research.

      (1) Yuval-Greenberg and colleagues (2014) presented stimuli contingent on online-detected microsaccades. A postcue indicated the target for a visual task, and the target could be congruent or incongruent with the microsaccade direction. The authors showed higher visual accuracy in congruent trials. The authors cited that paper, but it is still important to emphasize how this study already tried to go beyond purely correlational links on a single trial level.

      (2) The Desimone lab (Lowet et al., 2018) showed that firing rates in monkey V4 and IT were increased when a microsaccade was generated in the direction of the attended target.

      (3) However, attention can modulate responses in the superior colliculus even in the absence of microsaccades (Yu et al., 2022).

      (4) Similarly, Poletti, Rucci & Carrasco (2017) observed attentional modulations in the absence of microsaccades, or comparable attention effects irrespective of whether a microsaccade occurred or not (Roberts & Carrasco, 2019).

      Thus, in light of these insights, I believe the current study only adds incrementally to our understanding of the link between microsaccades and spatial attention.

      In general, it is important to have an independent measure of the dynamics of an attention shift. I think a shift of 200-600 ms is quite long, and defining this interval is rather arbitrary. Why consider such a long delay as the shift? Rather than taking a data-driven approach to defining an interval for an attention shift, it would be more convincing to derive an interval of interest based on past research or an independent measure.

      The present analyses report microsaccade statistics across all trials, but do not directly link single-trial microsaccades to accuracy. Similarly, reaction times and accuracy were analyzed only with respect to valid vs. invalid trials. Here, it would be important to link the findings between microsaccades and performance on a single-trial level. For instance, can the authors report reaction times and accuracy also separately for trials with vs. without microsaccades, and for trials with congruent vs. incongruent microsaccades?

      The study would benefit greatly from including a neutral condition to substantiate claims of attentional benefits and costs. It is highly probable that invalid trials would also demonstrate costs in terms of reaction times and accuracy. It would be interesting to observe whether directional biases in microsaccades are also evident when compared to a neutral condition.

    1. Reviewer #1 (Public review):

      Summary:

      The authors report intracranial EEG findings from 12 epilepsy patients performing an associative recognition memory task under the influence of scopolamine. They show that scopolamine administered before encoding disrupts hippocampal theta phenomena and reduces memory performance, and that scopolamine administered after encoding but before retrieval impairs hippocampal theta phenomena (theta power, theta phase reset) and neural reinstatement but does not impair memory performance. This is an important study with exciting, novel results and translational implications. The manuscript is well-written, the analyses are thorough and comprehensive, and the results seem robust.

      Strengths:

      (1) Very rare experimental design (intracranial neural recordings in humans coupled with pharmacological intervention).

      (2) Extensive analysis of different theta phenomena.

      (3) Well-established task with different conditions for familiarity versus recollection.

      (4) Clear presentation of findings and excellent figures.

      (5) Translational implications for diseases with cholinergic dysfunction (e.g., AD).

      (6) Findings challenge existing memory models, and the discussion presents interesting novel ideas.

      Weaknesses:

      (1) One of the most important results is the lack of memory impairment when scopolamine is administered after encoding but before retrieval (scopolamine block 2). The effect goes in the same direction as for scopolamine during encoding (p = 0.15). Could it be that this null effect is simply due to reduced statistical power (12 subjects with only one block per subject, while there are two blocks per subject for the condition with scopolamine during encoding), which may become significant with more patients? Is there actually an interaction effect indicating that memory impairment is significantly stronger when scopolamine is applied before encoding (Figure 1d)? Similar questions apply to familiarity versus recollection (lines 78-80). This is a very critical point that could alter major conclusions from this study, so more discussion/analysis of these aspects is needed. If there are no interaction effects, then the statements in lines 84-86 (and elsewhere) should be toned down.

      (2) Further, could it simply be that scopolamine hadn't reached its major impact during retrieval after administration in block 2? Figure 2e speaks in favor of this possibility. I believe this is a critical limitation of the experimental design that should be discussed.

      (3) It is not totally clear to me why slow theta was excluded from the reinstatement analysis. For example, despite an overall reduction in theta power, relative patterns may have been retained between encoding and recall. What are the results when using 1-128 Hz as input frequencies?

      (4) In what way are the results affected by epileptic artifacts occurring during the task (in particular, IEDs)?

    2. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors report intracranial EEG findings from 12 epilepsy patients performing an associative recognition memory task under the influence of scopolamine. They show that scopolamine administered before encoding disrupts hippocampal theta phenomena and reduces memory performance, and that scopolamine administered after encoding but before retrieval impairs hippocampal theta phenomena (theta power, theta phase reset) and neural reinstatement but does not impair memory performance. This is an important study with exciting, novel results and translational implications. The manuscript is well-written, the analyses are thorough and comprehensive, and the results seem robust.

      Strengths:

      (1) Very rare experimental design (intracranial neural recordings in humans coupled with pharmacological intervention).

      (2) Extensive analysis of different theta phenomena.

      (3) Well-established task with different conditions for familiarity versus recollection.

      (4) Clear presentation of findings and excellent figures.

      (5) Translational implications for diseases with cholinergic dysfunction (e.g., AD).

      (6) Findings challenge existing memory models, and the discussion presents interesting novel ideas.

      Weaknesses:

      (1) One of the most important results is the lack of memory impairment when scopolamine is administered after encoding but before retrieval (scopolamine block 2). The effect goes in the same direction as for scopolamine during encoding (p = 0.15). Could it be that this null effect is simply due to reduced statistical power (12 subjects with only one block per subject, while there are two blocks per subject for the condition with scopolamine during encoding), which may become significant with more patients? Is there actually an interaction effect indicating that memory impairment is significantly stronger when scopolamine is applied before encoding (Figure 1d)? Similar questions apply to familiarity versus recollection (lines 78-80). This is a very critical point that could alter major conclusions from this study, so more discussion/analysis of these aspects is needed. If there are no interaction effects, then the statements in lines 84-86 (and elsewhere) should be toned down.

      The reviewer highlights important concerns regarding the statistical power of the behavioral effects. We address these concerns in the revised manuscript in two ways: (1) we provide a supplemental analysis using a matched number of blocks between the placebo and scopolamine conditions to avoid statistical bias related to differing trial counts, and (2) we include a supplemental figure illustrating paired comparisons between blocks.

      (2) Further, could it simply be that scopolamine hadn't reached its major impact during retrieval after administration in block 2? Figure 2e speaks in favor of this possibility. I believe this is a critical limitation of the experimental design that should be discussed.

      The reviewer raises an important methodological concern regarding the time required for scopolamine's effect to manifest and the subsequent impact on the study outcomes. Previous studies report that the average time to maximum serum concentration after intravenous (IV) scopolamine administration is approximately 5 minutes (Renner et al., 2005), with the corresponding clinical onset estimated at 10 minutes. In our study, the retrieval period in Block 2 commenced at 15 ± 0.2 minutes post-injection across all subjects. Given this timing, there is sufficient reason to conclude that scopolamine had reached its major impact during the Block 2 retrieval phase. Furthermore, the observation of significant disruptions to theta oscillations during this same retrieval phase provides strong evidence that the drug was in full effect at that time.

      (3) It is not totally clear to me why slow theta was excluded from the reinstatement analysis. For example, despite an overall reduction in theta power, relative patterns may have been retained between encoding and recall. What are the results when using 1-128 Hz as input frequencies?

      Slow theta (2–4 Hz) was excluded from the reinstatement analysis to avoid potential confounding effects. Given the observed disruption to slow theta power following scopolamine administration, any subsequent changes in slow theta reinstatement would be causally ambiguous, potentially arising directly from the power effects. Therefore, we would be unable to determine whether changes in slow theta reinstatement were genuinely independent of changes in power.

      (4) In what way are the results affected by epileptic artifacts occurring during the task (in particular, IEDs)?

      To exclude abnormal events and interictal activity, a kurtosis threshold of 4 was applied to each trial, effectively filtering out segments exhibiting significant epileptic artifacts.
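
      The kurtosis-based rejection described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' code: it assumes the non-excess (Pearson) definition of kurtosis, for which a Gaussian signal scores 3, and it assumes each trial is a plain list of voltage samples. The names `pearson_kurtosis` and `keep_trials` are hypothetical.

```python
def pearson_kurtosis(samples):
    """Fourth standardized moment of a signal (Gaussian noise gives about 3)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    if m2 == 0:
        # A perfectly flat trial has undefined kurtosis; return infinity
        # so that it is rejected by the threshold below.
        return float("inf")
    return m4 / (m2 ** 2)


def keep_trials(trials, threshold=4.0):
    """Return indices of trials whose kurtosis does not exceed the threshold."""
    return [i for i, trial in enumerate(trials)
            if pearson_kurtosis(trial) <= threshold]


if __name__ == "__main__":
    clean = [-1.0, 1.0] * 50       # bounded oscillation, low kurtosis
    spiky = [0.0] * 99 + [10.0]    # one sharp transient, very high kurtosis
    print(keep_trials([clean, spiky]))
```

      Sharp transients such as interictal discharges concentrate signal energy in a few samples, which inflates the fourth moment relative to the variance; that is why a kurtosis cutoff can flag them.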

      Reviewer #2 (Public review):

      Summary:

      In this study, performed in human patients, the authors aimed at dissecting out the role of cholinergic modulation in different types of memory (recollection-based vs familiarity and novelty-based) and during different memory phases (encoding and retrieval). Moreover, their goal was to obtain the electrophysiological signature of cholinergic modulation on network activity of the hippocampus and the entorhinal cortex.

      Strengths:

      The authors combined cognitive tasks and intracranial EEG recordings in neurosurgical epilepsy patients. The study confirms previous evidence regarding the deleterious effects of scopolamine, a muscarinic acetylcholine receptor antagonist, on memory performance when administered prior to the encoding phase of the task. During both encoding and retrieval phases, scopolamine disrupts the power of theta oscillations in terms of amplitude and phase synchronization. These results raise the question of the role of theta oscillations during retrieval and the meaning of scopolamine's effect on retrieval-associated theta rhythm without cognitive changes. The authors clearly discussed this issue in the discussion section. A major point is the finding that the scopolamine-mediated effect is selective for recollection-based memory and not for familiarity- and novelty-based memory.

      The methodology used is powerful, and the data underwent a detailed and rigorous analysis.

      Weaknesses:

      A limited cohort of patients; the age of the patients is not specified in the table.

      To comply with human subject privacy protection policies, age was not reported; however, we did not find any significant effects of age on the behavioral or neural measures.

    1. Joint Public Review:

      Summary

      Non-alcoholic fatty liver disease (NAFLD) is a widespread metabolic disease associated with obesity. Endoplasmic reticulum and calcium dysregulation are hallmarks of NAFLD. Here, the authors explore whether the secreted liver protein transthyretin (TTR), which has been previously shown to modulate calcium signaling in the context of insulin resistance, could also impact NAFLD. The study is motivated by a small cohort of NASH patients who show elevated TTR levels. The authors then overexpress TTR in two mouse obesogenic models, which leads to elevated liver lipid deposition. In contrast, liver-specific TTR knockdown improves some liver lipid levels, reduces inflammation markers, and improves glucose tolerance, overall improving the NAFLD markers. These phenotypic findings are overall convincing and largely consistent in two different diet models.

      Because of TTR's connection to calcium regulation, the authors then assess whether the knockdown affects ER stress and impacts SERCA2 expression. However, the direct mechanistic evidence supporting the central claim that TTR physically interacts with and inhibits the SERCA2 calcium pump is preliminary and requires further validation. Whether the broader effects on lipid accumulation, inflammation markers, and glucose tolerance are mechanistically connected remains to be determined.

      Strengths

      The premise of the study is built on prior work from the authors identifying a link between increased transthyretin secretion and the development of insulin resistance, a related obesity condition. The in vivo studies are comprehensive, using human NASH samples, two distinct diet-induced mouse models (HFD and GAN), and in vitro hepatocyte models. The phenotypic data showing that TTR knockdown alleviates steatosis, inflammation, and insulin resistance are robust and convincing across these systems.

      Weaknesses

      The mechanistic studies in Figures 6-9 are incomplete. There are several issues encompassing experimental design, rigor, and interpretation that, if properly addressed, would make the study much stronger.

      (1) Exogenous TTR that is endocytosed by cells is unlikely to ever find itself inside the lumen of the ER. Conversely, endogenous TTR that is produced in cells and that has not yet been secreted is almost certain to have an ER lumenal localization (as in Figures 7B and 9A, and where an apparent colocalization with SERCA is likely to be incidental). In a model where TTR, acting as a hepatokine, has inhibitory effects on SERCA, these would almost certainly be realized from the cytosolic side of the ER membrane, a region inaccessible to lumenal endogenous TTR. It is possible that the overexpression and knockdown of endogenous TTR have the effects seen due to its secretion and uptake (that is, cell-non-autonomous effects), but this possibility was not directly tested through Transwell or similar assays. Given the identity of TTR as a secretory pathway client protein, the only localization data for TTR that are unexpected are those suggesting an ER localization of exogenously added TTR (Figure 7A), but this localization seems to involve only a minor population of TTR, is hindered by a technical issue with cell permeabilization (see below), and lacks orthogonal approaches to convincingly demonstrate meaningful localization of exogenous TTR at the ER membrane.

      (2) The experimental logic in Figure 8 is problematic. The authors use Thapsigargin (Tg), a potent and specific SERCA inhibitor, to probe SERCA function. However, since both Tg and TTR are proposed to inhibit SERCA2, the design lacks a critical control to demonstrate that TTR's effects are indeed mediated through SERCA2. SERCA2 activity should, in principle, be fully and irreversibly inhibited by Tg treatment, especially using such a high concentration (5 µM). If TTR's effect on calcium flux is exclusively through SERCA2, then SERCA2 impairment by TTR should have no additional effect in the presence of Tg, as Tg would already be maximally inhibiting the pump. The current data (Figures 8G-H) showing an effect of TTR-KD even with Tg present is difficult to interpret and may suggest off-target or compensatory mechanisms.

      (3) The coIP data in Figure 9 need to be better controlled, including by overexpression of FLAG- and MYC-tagged irrelevant proteins, ideally also localized to the ER. The coIP of overexpressed TTR with endogenous SERCA in Figure 9D, in addition to requiring a more rigorous control, is itself of relatively low quality, with the appearance of a possible gel/blotting artifact.

      (4) The ER stress markers in Figure 6 are not convincing. Molecular weight markers and positive controls (for example, livers from animals injected with tunicamycin) are missing. In addition, the species of ATF6 that is purportedly being detected (cleaved or full-length) is not indicated, and this protein is also notoriously difficult to detect with convincing specificity in mouse tissues. As well, CHOP protein is usually not detectable in control normal diet mouse livers, raising questions of whether the band identified as CHOP is, in fact, CHOP. These issues, along with the observation that ER stress-regulated RNAs are not altered (Figure S5), raise the question of whether ER stress is involved at all. Likewise, the quantification of SERCA2 levels from Figure 6 requires more rigor. For all blots, it isn't clear that analyzing only 3 or 4 of the animals provides adequate and unbiased power to detect differences; in addition, in Figure 6C, at least the SERCA2 exposure (assuming SERCA2 is being specifically detected; see above) is well beyond the linear range of quantification.

      In addition, the following important issues were raised:

      (5) n=4 for overexpression might not provide adequate statistical power.

      (6) The error for human NASH samples and controls in Figure 1A is surprisingly small. Larger gene expression data sets from NASH cohorts exist and should be used to test the finding in a larger population.

      (7) For experiments involving two independent variables (e.g., diet and TTR manipulation, as in Figures 2, 3, 4, 5), a Two-way ANOVA must be used instead of One-way ANOVA or t-tests. Also, the ND-TTR-KD group is missing - these data are an essential control to show the specificity of the knockdown and its effects in a non-diseased state.

      (8) Figure 7A: The co-localization signal between TTR-Alexa488 and the ER marker is not strong or convincing, which could be due to the inappropriate immunofluorescence protocol used, of permeabilization prior to fixation. The standard and recommended order is fixation first (to preserve cellular architecture), followed by permeabilization.
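
      To illustrate point (7), the following is a minimal sketch of a balanced two-way ANOVA for a 2x2 design (diet x TTR manipulation) that includes the ND-TTR-KD cell. The numbers, group labels, and the function name `two_way_anova` are hypothetical and are not taken from the manuscript. The sketch reports only sums of squares and F ratios; p-values would additionally require the F distribution.

```python
from statistics import mean

# Hypothetical balanced 2x2 data, n = 3 per cell. NOT values from the study.
cells = {
    ("ND", "control"): [1.0, 2.0, 3.0],
    ("ND", "KD"): [2.0, 3.0, 4.0],
    ("HFD", "control"): [5.0, 6.0, 7.0],
    ("HFD", "KD"): [2.0, 3.0, 4.0],
}


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction; returns SS, df and F per term."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes equal n in every cell")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for (ai, _), v in cells.items() if ai == a for x in v)
              for a in a_levels}
    b_mean = {b: mean(x for (_, bi), v in cells.items() if bi == b for x in v)
              for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    def term(ss, df):
        return {"SS": ss, "df": df, "F": (ss / df) / ms_err}

    return {
        "diet": term(ss_a, df_a),
        "manipulation": term(ss_b, df_b),
        "interaction": term(ss_ab, df_ab),
        "error": {"SS": ss_err, "df": df_err},
    }


result = two_way_anova(cells)
for name, stats in result.items():
    print(name, stats)
```

      In a design like this, the interaction term is what tests whether the effect of the manipulation depends on diet. Separate t-tests or a one-way ANOVA do not give that test directly.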

    1. Reviewer #1 (Public review):

      Summary:

      The novel advance by Wang et al is in the demonstration that, relative to a standard extinction procedure, the retrieval-extinction procedure more effectively suppresses responses to a conditioned threat stimulus when testing occurs just minutes after extinction. The authors provide solid evidence to show that this "short-term" suppression of responding involves engagement of the dorsolateral prefrontal cortex.

      Strengths:

      Overall, the study is well-designed and the results are valuable. There are, however, a few issues in the way that it is introduced and discussed. It would have been useful if the authors could have more explicitly related the results to a theory - it would help the reader understand why the results should have come out the way that they did. More specific comments are presented below.

      Please note: The authors appear to have responded to my original review twice. It is not clear that they observed the public review that I edited after the first round of revisions. As part of these edits, I removed the entire section titled "Clarifications, Elaborations and Edits".

      Theory and Interpretation of Results

      (1) It is difficult to appreciate why the first trial of extinction in a standard protocol does NOT produce the retrieval-extinction effect. This applies to the present study as well as others that have purported to show a retrieval-extinction effect. The importance of this point comes through at several places in the paper. E.g., the two groups in study 1 experienced a different interval between the first and second CS extinction trials; and the results varied with this interval: a longer interval (10 min) ultimately resulted in less reinstatement of fear than a shorter interval. Even if the different pattern of results in these two groups was shown/known to imply two different processes, there is nothing in the present study that addresses what those processes might be. That is, while the authors talk about mechanisms of memory updating, there is little in the present study that permits any clear statement about mechanisms of memory. The references to a "short-term memory update" process do not help the reader to understand what is happening in the protocol.

      In reply to this point, the authors cite evidence to suggest that "an isolated presentation of the CS+ seems to be important in preventing the return of fear expression." They then note the following: "It has also been suggested that only when the old memory and new experience (through extinction) can be inferred to have been generated from the same underlying latent cause, the old memory can be successfully modified (Gershman et al., 2017). On the other hand, if the new experiences are believed to be generated by a different latent cause, then the old memory is less likely to be subject to modification. Therefore, the way the 1st and 2nd CS are temporally organized (retrieval-extinction or standard extinction) might affect how the latent cause is inferred and lead to different levels of fear expression from a theoretical perspective." This merely begs the question: why might an isolated presentation of the CS+ result in the subsequent extinction experiences being allocated to the same memory state as the initial conditioning experiences?

      This is not addressed in the paper. I accept that the study was not designed to address this question, and that the question did not need to be addressed for the set of results to be interesting. However, understanding how and why the retrieval-extinction protocol produces the effects that it does in the long-term test of fear expression would greatly inform our understanding of how and why the protocol has the effects that it does in the short-term tests of fear expression. To be clear: the results of the present study are very interesting - there is no denying that. I am not asking the authors to change anything in response to this point. It simply stands as a comment on the work that has been done in this paper and the area of research more generally.

      (2) The discussion of memory suppression is potentially interesting but raises many questions. That is, memory suppression is invoked to explain a particular pattern of results but I, as the reader, have no sense of why a fear memory would be better suppressed shortly after the retrieval-extinction protocol compared to the standard extinction protocol; and why this suppression is NOT specific to the cue that had been subjected to the retrieval-extinction protocol. I accept that the present study was not intended to examine aspects of memory suppression, and that it is a hypothesis proposed to explain the results collected in this study. I am not asking the authors to change anything in response to this point. Again, it simply stands as a comment on the work that has been done in this paper.

      (3) The authors have inserted the following text in the revised manuscript: "It should be noted that while our long-term amnesia results were consistent with the fear memory reconsolidation literatures, there were also studies that failed to observe fear prevention (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; Schroyens et al., 2023). Although the memory reconsolidation framework provides a viable explanation for the long-term amnesia, more evidence is required to validate the presence of reconsolidation, especially at the neurobiological level (Elsey et al., 2018). While it is beyond the scope of the current study to discuss the discrepancies between these studies, one possibility to reconcile these results concerns the procedure for the retrieval-extinction training. It has been shown that the eligibility for old memory to be updated is contingent on whether the old memory and new observations can be inferred to have been generated by the same latent cause (Gershman et al., 2017; Gershman and Niv, 2012). For example, prevention of the return of fear memory can be achieved through gradual extinction paradigm, which is thought to reduce the size of prediction errors to inhibit the formation of new latent causes (Gershman, Jones, et al., 2013). Therefore, the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause."

      It is perfectly fine to state that "the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause..." This is not uninteresting; but it also isn't saying much. Ideally, the authors would have included some statement about factors that are likely to determine whether one is or isn't likely to see a retrieval-extinction effect, grounded in terms of the latent state theories that have been invoked here. Presumably, the retrieval-extinction protocol has variable effects because of procedural differences that affect whether subjects infer the same underlying latent cause when shifted into extinction. Surely, the clinical implications of any findings are seriously curtailed unless one understands when a protocol is likely to produce an effect; and why the effect occurs at all? This question is rhetorical. I am not asking the authors to change anything in response to this point. Again, it stands as a comment on the work that has been done in this paper; and remains a comment after insertion of the new text, which is acknowledged and appreciated.

      (4) The authors find different patterns of responses to CS1 and CS2 when they were tested 30 min after extinction versus 24 h after extinction. On this basis, they infer distinct memory update mechanisms. However, I still can't quite see why the different patterns of responses at these two time points after extinction need to be taken to infer different memory update mechanisms. That is, the different patterns of responses at the two time points could be indicative of the same "memory update mechanism" in the sense that the retrieval-extinction procedure induces a short-term memory suppression that serves as the basis for the longer-term memory suppression (i.e., the reconsolidation effect). My pushback on this point is based on the notion of what constitutes a memory update mechanism; and is motivated by what I take to be a rather loose use of language/terminology in the reconsolidation literature and this paper specifically (for examples, see the title of the paper and line 2 of the abstract).

      To be clear: I accept the authors' reply that "The focus of the current manuscript is to demonstrate that the retrieval-extinction paradigm can also facilitate a short-term fear memory deficit measured by SCR". However, I disagree with the claim that any short-term fear memory deficit must be indicative of "update mechanisms other than reconsolidation", which appears on Line 27 in the abstract and very much indicates the spirit of the paper. To make the point: the present study has examined the effectiveness of a retrieval-extinction procedure in suppressing fear responses 30 min, 6 hours and 24 hours after extinction. There are differences across the time points in terms of the level of suppression, its cue specificity, and its sensitivity to manipulation of activity in the dlPFC. This is perfectly interesting when not loaded with additional baggage re separable mechanisms of memory updating at the short and long time points: there is simply no evidence in this study or anywhere else that the short-term deficit in suppression of fear responses has anything whatsoever to do with memory updating. It can be exactly what is implied by the description: a short-term deficit in the suppression of fear responses. Again, this stands as a comment on the work that has been done; and remains a comment for the revised paper.

      (5) It is not clear why thought control ability ought to relate to any aspect of the suppression that was evident in the 30 min tests - that is, I accept the correlation between thought control ability and performance in the 30 min tests but would have liked to know why this was looked at in the first place and what, if anything, it means. The issue at hand is that, as best as I can tell, there is no theory to which the result from the short- and long-term tests can be related. The attempts to fill this gap with reference to phenomena like retrieval-induced forgetting are appreciated but raise more questions than answers. This is especially clear in the discussion, where it is acknowledged/stated: "Inspired by the similarities between our results and suppression-induced declarative memory amnesia (Gagnepain et al., 2017), we speculate that the retrieval-extinction procedure might facilitate a spontaneous memory suppression process and thus yield a short-term amnesia effect. Accordingly, the activated fear memory induced by the retrieval cue would be subjected to an automatic fear memory suppression through the extinction training (Anderson and Floresco, 2022)." There is nothing in the subsequent discussion to say why this should have been the case other than the similarity between results obtained in the present study and those in the literature on retrieval induced forgetting, where the nature of the testing is quite different. Again, this is simply a comment on the work that has been done - no change is required for the revised paper.

    2. Reviewer #2 (Public review):

      Summary

      The study investigated whether memory retrieval followed soon by extinction training results in a short-term memory deficit when tested - with a reinstatement test that results in recovery from extinction - soon after extinction training. Experiment 1 documents this phenomenon using a between-subjects design. Experiment 2 used a within-subject control and found that the effect is also observed in a control condition. It also revealed that if testing is conducted 6 hours after extinction, there is no effect of retrieval prior to extinction, as there is recovery from extinction independently of retrieval prior to extinction. A third group revealed that retrieval followed by extinction attenuates reinstatement when the test is conducted 24 hours later, consistent with previous literature. Finally, Experiment 3 used continuous theta-burst stimulation of the dorsolateral prefrontal cortex and assessed whether inhibition of that region (vs a control region) reversed the short-term effect revealed in Experiments 1 and 2. The results of control groups in Experiment 3 replicated the previous findings (short-term effect), and the experimental group revealed that these can be reversed by inhibition of the dorsolateral prefrontal cortex.

      Strengths

      The work is performed using standard procedures (fear conditioning and continuous theta-burst stimulation) and there is some justification of the sample sizes. The results replicate previous findings - some of which have been difficult to replicate and this needs to be acknowledged - and suggest that the effect can also be observed in a short-term reinstatement test.

      The study establishes links between the memory reconsolidation and retrieval-induced forgetting (or memory suppression) literatures. The explanations that have been developed for these are distinct, and the current results integrate them by revealing that DLPFC activity is involved in the short-term retrieval-extinction effect. There is thus some novelty in the present results, but numerous questions remain unaddressed.

      Weaknesses

      The fear acquisition data is converted to a differential fear SCR and this is what is analysed (early vs late). However, the figure shows the raw SCR values for CS+ and CS- and therefore it is unclear whether acquisition was successful (despite there being an "early" vs "late" effect - no descriptives are provided).

      In Experiment 1 (Test results) it is unclear whether the main conclusion stems from a comparison of the test data relative to the last extinction trial ("we defined the fear recovery index as the SCR difference between the first test trial and the last extinction trial for a specific CS") or the difference relative to the CS- ("differential fear recovery index between CS+ and CS-"). It would help the reader assess the data if Fig 1e presents all the indexes (both CS+ and CS-). In addition, there is one sentence which I could not understand "there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (P=0.048)". The p value suggests that there is a difference, yet it is not clear what is being compared here. Critically, any index taken as a difference relative to the CS- can indicate recovery of fear to the CS+ or absence of discrimination relative to the CS-, so ideally the authors would want to directly compare responses to the CS+ in the reminder and no-reminder groups. In the absence of such comparison, little can be concluded, in particular if SCR CS- data is different between groups. The latter issue is particularly relevant in Experiment 2, in which the CS- seems to vary between groups during the test and this can obscure the interpretation of the result.
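
      The concern about indexes taken relative to the CS- can be made concrete with a small sketch. It follows the quoted definition of the fear recovery index (first test trial minus last extinction trial for a given CS) and the differential index (CS+ index minus CS- index). The SCR values, group labels, and function names are hypothetical and are not data from the study.

```python
def recovery_index(first_test_scr, last_extinction_scr):
    """Fear recovery index for one CS: first test trial minus last extinction trial."""
    return first_test_scr - last_extinction_scr


def differential_recovery_index(cs_plus, cs_minus):
    """CS+ recovery index minus CS- recovery index.

    Each argument is a (last_extinction_scr, first_test_scr) pair.
    """
    plus = recovery_index(cs_plus[1], cs_plus[0])
    minus = recovery_index(cs_minus[1], cs_minus[0])
    return plus - minus


# Hypothetical SCR values (arbitrary units), NOT data from the study.
# Both groups show the same CS+ recovery; only the CS- response differs.
group_a = {"cs_plus": (0.2, 0.6), "cs_minus": (0.2, 0.2)}
group_b = {"cs_plus": (0.2, 0.6), "cs_minus": (0.2, 0.5)}

for name, g in (("A", group_a), ("B", group_b)):
    plus = recovery_index(g["cs_plus"][1], g["cs_plus"][0])
    diff = differential_recovery_index(g["cs_plus"], g["cs_minus"])
    print(f"group {name}: CS+ index = {plus:.2f}, differential index = {diff:.2f}")
```

      In this toy case the two groups differ on the differential index even though responding to the CS+ is identical, which is why a direct comparison of CS+ responses between groups is more informative.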

      In Experiment 1, the findings suggest that there is a benefit of retrieval followed by extinction in a short-term reinstatement test. In Experiment 2, the same effect is observed to a cue which did not undergo retrieval before extinction (CS2+), a result that is interpreted as resulting from cue-independence, rather than a failure to replicate in a within-subjects design the observations of Experiment 1 (between-subjects). Although retrieval-induced forgetting is cue-independent (the effect on items that are suppressed [Rp-] can be observed with an independent probe), it is not clear that the current findings are similar, and thus the strong parallels made may not be warranted. Here, both cues have been extinguished and have therefore been equally exposed during the critical stage.

      The findings in Experiment 2 suggest that the amnesia reported in Experiment 1 is transient, in that no effect is observed when the test is delayed by 6 hours. The phenomenon whereby reactivated memories transition to extinguished memories as a function of the amount of exposure (or number of trials) is completely different from the phenomenon observed here. In the former, the manipulation has to do with the number of trials (or total amount of time) for which the cues are exposed. In the current Experiment 2, the authors did not manipulate the number of trials but instead the retention interval between extinction and test. The finding reported here is closer to a "Kamin effect", that is, the forgetting of learned information which is observed with intervals of intermediate length (Baum, 1968). Because the Kamin effect has been inferred to result from retrieval failure, it is unclear how this can be explained here. There needs to be much more clarity on the explanations to substantiate the conclusions.

      There are many results (Ryan et al., 2015) that challenge the framework on which the authors base their predictions (consolidation and reconsolidation theory), and these need to be acknowledged. These studies showed that memory can be expressed in the absence of the biological machinery thought to be needed for memory performance. The authors should be careful about statements such as "eliminate fear memories", for which there is little evidence.

      The parallels between the current findings and the memory suppression literature are speculated on in the general discussion, and there is the conclusion that "the retrieval-extinction procedure might facilitate a spontaneous memory suppression process". Because one of the basic tenets of the memory suppression literature is that it reflects an "active suppression" process, there is no reason to believe that the same phenomenon is in place in the current paradigm, where it is instead described as "automatic". In other words, the conclusions make strong parallels with the memory suppression (and cognitive control) literature, yet the phenomenon that they observed is thought to be passive (or spontaneous/automatic). Ultimately, it is unclear why 10 mins between the reminder and extinction learning will "automatically" suppress fear memories. Further down in the discussion it is argued that "For example, in the well-known retrieval-induced forgetting (RIF) phenomenon, the recall of a stored memory can impair the retention of related long-term memory and this forgetting effect emerges as early as 20 minutes after the retrieval procedure, suggesting memory suppression or inhibition can occur in a more spontaneous and automatic manner". I did not follow why the time delay between manipulation and test (20 mins) would speak to whether the process is controlled or automatic. In addition, the links with the "latent cause" theoretical framework are weak, if any. There is little reason to believe that one extinction trial, separated by 10 mins from the rest of the extinction trials, may lead participants to learn that extinction and acquisition have been generated by the same latent cause.

      Among the many conclusions, one is that the current study uncovers the "mechanism" underlying the short-term effects of retrieval-extinction. There is little in the current report that uncovers the mechanism, even in the most psychological sense of the mechanism, so this needs to be clarified. The same applies to the use of "adaptive".

      Whilst I could access the data on the OSF site, I could not make sense of the MATLAB files as there is no signposting indicating what data is being shown in the files. Thus, as it stands, there is no way of independently replicating the analyses reported.

      The supplemental material shows figures with all participants, but only some statistical analyses are provided, and sometimes these are different from those reported in the main manuscript. For example, the test data in Experiment 1 is analysed with a two-way ANOVA with main effects of group (reminder vs no-reminder) and time (last trial of extinction vs first trial of test) in the main report. The analyses with all participants in the supplementary materials used a mixed two-way ANOVA with group (reminder vs no-reminder) and CS (CS+ vs CS-). This makes it difficult to assess the robustness of the results when including all participants. In addition, in the supplementary materials there are no figures and analyses for Experiment 3.

      One of the overarching conclusions is that the "mechanisms" underlying reconsolidation (long term) and memory suppression (short term) phenomena are distinct, but memory suppression phenomena can also be observed after a 7-day retention interval (Storm et al., 2012), which then questions the conclusions achieved by the current study.

      References:

      Baum, M. (1968). Reversal learning of an avoidance response and the Kamin effect. Journal of Comparative and Physiological Psychology, 66(2), 495.

      Chalkia, A., Schroyens, N., Leng, L., Vanhasbroeck, N., Zenses, A. K., Van Oudenhove, L., & Beckers, T. (2020). No persistent attenuation of fear memories in humans: A registered replication of the reactivation-extinction effect. Cortex, 129, 496-509.

      Ryan, T. J., Roy, D. S., Pignatelli, M., Arons, A., & Tonegawa, S. (2015). Engram cells retain memory under retrograde amnesia. Science, 348(6238), 1007-1013.

      Storm, B. C., Bjork, E. L., & Bjork, R. A. (2012). On the durability of retrieval-induced forgetting. Journal of Cognitive Psychology, 24(5), 617-629.

      Comments on revisions:

      Thanks to the authors for trying to address my concerns.

      (1 and 2) My point about evidence for learning relates to the fact that in none of the experiments is an increase in SCR to the CSs+ observed during training (in Experiment 1, CS+/CS- differences are even present from the outset). Instead, what happens is that participants learn to discriminate between the CS+ and CS- and decrease their SCR responding to the safe CS-. This begs the question as to what is being learned, given that the assumption is that the retrieval-extinction treatment is concerned with the excitatory memory (CS+) rather than the CS+/CS- discrimination. For example, Figures 6A and 6B have short/long-term amnesia on the right axes, but it is unclear from the data what memory is being targeted. In Figure 6C, the right panels depicting Suppression and Reconsolidation mechanisms suggest that it is the CS+ memory that is being targeted. Because the dependent measure (differential SCR) captures how well the discrimination was learned (this relates to point 2, where the authors now acknowledge that there are differences between groups in responding to the CS-), I struggle to see how the data support these CS+ conclusions. The fact that influential papers have used this dependent measure (i.e., differential SCR) does not undermine the point that differences between groups at test are driven by differences in responding to the CS-.

      (3, 4 and 5) The authors have qualified some of the statements, yet I fail to see some of these parallels. Much of the discussion is speculative and ultimately left for future research to address.

      (6) I can now make more sense of the publicly available data, although the files would benefit from an additional column that distinguishes between participants who were included in the final analyses (passed the multiple criteria = 1) and those who were not (did not pass the criteria = 0). Otherwise, anyone who wants to replicate these analyses needs to decipher the multiple inclusion criteria and apply them to the dataset.
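
      As a minimal sketch of the suggested column, the records, field names, and criteria below are hypothetical placeholders. The actual inclusion criteria are those described in the manuscript and are not reproduced here.

```python
# Hypothetical records and criterion names; NOT the structure of the shared files.
participants = [
    {"id": "p01", "acquired_fear": True, "extinguished": True},
    {"id": "p02", "acquired_fear": True, "extinguished": False},
    {"id": "p03", "acquired_fear": False, "extinguished": True},
]

CRITERIA = ("acquired_fear", "extinguished")


def add_inclusion_flag(rows, criteria=CRITERIA):
    """Return copies of rows with an 'included' column: 1 if all criteria passed, else 0."""
    return [{**row, "included": int(all(row[c] for c in criteria))} for row in rows]


flagged = add_inclusion_flag(participants)

# Anyone re-running the analyses can then filter on the single column.
analysed = [row["id"] for row in flagged if row["included"] == 1]
print(analysed)
```

      A single 0/1 column of this kind would let a reader reproduce both the main analyses (included participants only) and the supplementary analyses (all participants) from the same file.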

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Introduction & Theory

      (1) It is difficult to appreciate why the first trial of extinction in a standard protocol does NOT produce the retrieval-extinction effect. This applies to the present study as well as others that have purported to show a retrieval-extinction effect. The importance of this point comes through at several places in the paper. E.g., the two groups in Study 1 experienced a different interval between the first and second CS extinction trials; and the results varied with this interval: a longer interval (10 min) ultimately resulted in less reinstatement of fear than a shorter interval. Even if the different pattern of results in these two groups was shown/known to imply two different processes, there is nothing in the present study that addresses what those processes might be. That is, while the authors talk about mechanisms of memory updating, there is little in the present study that permits any clear statement about mechanisms of memory. The references to a "short-term memory update" process do not help the reader to understand what is happening in the protocol.

      We agree with the reviewer that whether and how the retrieval-extinction paradigm works is still under debate. Our results provide another line of evidence that such a paradigm is effective in producing long term fear amnesia. The focus of the current manuscript is to demonstrate that the retrieval-extinction paradigm can also facilitate a short-term fear memory deficit measured by SCR. Our TMS study provided some preliminary evidence in terms of the brain mechanisms involved in the causal relationship between the dorsolateral prefrontal cortex (dlPFC) activity and the short-term fear amnesia and showed that both the retrieval interval and the intact dlPFC activity were necessary for the short-term fear memory deficit and accordingly were referred to as the “mechanism” for memory update. We acknowledge that the term “mechanism” might have different connotations for different researchers. We now more explicitly clarify what we mean by “mechanisms” in the manuscript (line 99) as follows:

      “In theory, different cognitive mechanisms underlying specific fear memory deficits, therefore, can be inferred based on the difference between memory deficits.”

      In reply to this point, the authors cite evidence to suggest that "an isolated presentation of the CS+ seems to be important in preventing the return of fear expression." They then note the following: "It has also been suggested that only when the old memory and new experience (through extinction) can be inferred to have been generated from the same underlying latent cause, the old memory can be successfully modified (Gershman et al., 2017). On the other hand, if the new experiences are believed to be generated by a different latent cause, then the old memory is less likely to be subject to modification. Therefore, the way the 1st and 2nd CS are temporally organized (retrieval-extinction or standard extinction) might affect how the latent cause is inferred and lead to different levels of fear expression from a theoretical perspective." This merely begs the question: why might an isolated presentation of the CS+ result in the subsequent extinction experiences being allocated to the same memory state as the initial conditioning experiences? This is not yet addressed in any way.

      As in our previous response, this manuscript is not about investigating the cognitive mechanism of why and how an isolated presentation of the CS+ would suppress fear expression in the long term. As the reviewer is aware, and as we have addressed in our previous response letters, both positive and negative evidence abounds as to whether the retrieval-extinction paradigm can successfully suppress long-term fear expression. Previous research depicted mechanisms instigated by the single CS+ retrieval at the molecular, cellular, and systems levels, as well as through cognitive processes in humans. In the current manuscript, we simply set out to test whether, in addition to the long-term fear amnesia, the retrieval-extinction paradigm can also affect subjects’ short-term fear memory.

      (2) The discussion of memory suppression is potentially interesting but, in its present form, raises more questions than it answers. That is, memory suppression is invoked to explain a particular pattern of results but I, as the reader, have no sense of why a fear memory would be better suppressed shortly after the retrieval-extinction protocol compared to the standard extinction protocol; and why this suppression is NOT specific to the cue that had been subjected to the retrieval-extinction protocol.

      Memory suppression is the hypothesis we proposed that might be able to explain the results we obtained in the experiments. We discussed the possibility of memory suppression and listed the reasons why such a mechanism might be at work. As we mentioned in the manuscript, our findings are consistent with the memory suppression mechanism on at least two aspects: 1) cue-independence and 2) thought-control ability dependence. We agree that the questions raised by the reviewer are interesting but to answer these questions would require a series of further experiments to disentangle all the various variables and conceptual questions about the purpose of a phenomenon, which we are afraid is out of the scope of the current manuscript. We refer the reviewer to the discussion section where memory suppression might be the potential mechanism for the short-term amnesia we observed (lines 562-569) as follows:

      “Previous studies indicate that a suppression mechanism can be characterized by three distinct features: first, the memory suppression effect tends to emerge early, usually 10-30 mins after memory suppression practice and can be transient (MacLeod and Macrae, 2001; Saunders and MacLeod, 2002); second, the memory suppression practice seems to directly act upon the unwanted memory itself (Levy and Anderson, 2002), such that the presentation of other cues originally associated with the unwanted memory also fails in memory recall (cue-independence); third, the magnitude of memory suppression effects is associated with individual difference in control abilities over intrusive thoughts (Küpper et al., 2014).”

      (3) Relatedly, how does the retrieval-induced forgetting (which is referred to at various points throughout the paper) relate to the retrieval-extinction effect? The appeal to retrieval-induced forgetting as an apparent justification for aspects of the present study reinforces points 2 and 3 above. It is not uninteresting but lacks clarification/elaboration and, therefore, its relevance appears superficial at best.

      We brought up the topic of retrieval-induced forgetting (RIF) to stress the point that memory suppression can be unconscious. In a standard RIF paradigm, unlike the think/no-think paradigm, subjects are not explicitly told to suppress the non-target memories. However, to successfully retrieve the target memory, the cognitive system actively inhibits the non-target memories, effectively implementing a memory suppression mechanism (though unconsciously). Therefore, it is possible our results might be explained by the memory suppression framework. We elaborated this point in the discussion section (lines 578-584):

      “In our experiments, subjects were not explicitly instructed to suppress their fear expression, yet the retrieval-extinction training significantly decreased short-term fear expression. These results are consistent with the short-term amnesia induced with the more explicit suppression intervention (Anderson et al., 1994; Kindt and Soeter, 2018; Speer et al., 2021; Wang et al., 2021; Wells and Davies, 1994). It is worth noting that although consciously repelling unwanted memory is a standard approach in memory suppression paradigm, it is possible that the engagement of the suppression mechanism can be unconscious.”

      (4) I am glad that the authors have acknowledged the papers by Chalkia, van Oudenhove & Beckers (2020) and Chalkia et al (2020), which failed to replicate the effects of retrieval-extinction reported by Schiller et al in Reference 6. The authors have inserted the following text in the revised manuscript: "It should be noted that while our long-term amnesia results were consistent with the fear memory reconsolidation literature, there were also studies that failed to observe fear prevention (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; Schroyens et al., 2023). Although the memory reconsolidation framework provides a viable explanation for the long-term amnesia, more evidence is required to validate the presence of reconsolidation, especially at the neurobiological level (Elsey et al., 2018). While it is beyond the scope of the current study to discuss the discrepancies between these studies, one possibility to reconcile these results concerns the procedure for the retrieval-extinction training. It has been shown that the eligibility for old memory to be updated is contingent on whether the old memory and new observations can be inferred to have been generated by the same latent cause (Gershman et al., 2017; Gershman and Niv, 2012). For example, prevention of the return of fear memory can be achieved through gradual extinction paradigm, which is thought to reduce the size of prediction errors to inhibit the formation of new latent causes (Gershman, Jones, et al., 2013). Therefore, the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause." Firstly, if it is beyond the scope of the present study to discuss the discrepancies between the present and past results, it is surely beyond the scope of the study to make any sort of reference to clinical implications!!!

      As we have clearly stated in our manuscript, this paper is not about discussing why some studies were or were not able to replicate the retrieval-extinction results originally reported by Schiller et al. (2010). Instead, we aimed to report a novel short-term fear amnesia through the retrieval-extinction paradigm, above and beyond the long-term amnesia reported before. Speculating about the clinical implications of these findings is unrelated to the long-term amnesia debate in the reconsolidation field. We now refer the reader to several perspectives and reviews that have proposed ways to resolve these discrepancies (lines 642-673).

      Secondly, it is perfectly fine to state that "the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause..." This is not uninteresting, but it also isn't saying much. Minimally, I would expect some statement about factors that are likely to determine whether one is or isn't likely to see a retrieval-extinction effect, grounded in terms of this theory.

      Again, as we have responded many times, we simply do not know why some studies were able to suppress the fear expression using the retrieval-extinction paradigm and other studies weren’t. This is still an unresolved issue that the field is actively engaging with, and we now refer the reader to several papers dealing with this issue. However, this is NOT the focus of our manuscript. Having a healthy debate does not mean that every study using the retrieval-extinction paradigm must address the long-standing question of why the retrieval-extinction paradigm is effective (at least in some studies).

      Clarifications, Elaborations, Edits

      (5) Some parts of the paper are not easy to follow. Here are a few examples (though there are others):

      (a) In the abstract, the authors ask "whether memory retrieval facilitates update mechanisms other than memory reconsolidation"... but it is never made clear how memory retrieval could or should "facilitate" a memory update mechanism.

      We meant to state that the retrieval-extinction paradigm might have effects on fear memory, above and beyond the purported memory reconsolidation effect. Sentence modified (lines 25-26) as follows:

      “Memory reactivation renders consolidated memory fragile and thereby opens the window for memory updates, such as memory reconsolidation.”

      (b) The authors state the following: "Furthermore, memory reactivation also triggers fear memory reconsolidation and produces cue specific amnesia at a longer and separable timescale (Study 2, N = 79 adults)." Importantly, in study 2, the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction. This result is interesting but cannot be easily inferred from the statement that begins "Furthermore..." That is, the results should be described in terms of the combined effects of retrieval and extinction, not in terms of memory reactivation alone; and the statement about memory reconsolidation is unnecessary. One can simply state that the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction.

      The sentence the reviewer referred to was in our original manuscript submission but has since been modified based on the reviewer’s comments from the last round of revision. Please see the abstract (lines 30-35) of our revised manuscript from the last round of revision:

      “Furthermore, across different timescales, the memory retrieval-extinction paradigm triggers distinct types of fear amnesia in terms of cue-specificity and cognitive control dependence, suggesting that the short-term fear amnesia might be caused by different mechanisms from the cue-specific amnesia at a longer and separable timescale (Study 2, N = 79 adults).”

      (c) The authors also state that: "The temporal scale and cue-specificity results of the short-term fear amnesia are clearly dissociable from the amnesia related to memory reconsolidation, and suggest that memory retrieval and extinction training trigger distinct underlying memory update mechanisms." ***The pattern of results when testing occurred just minutes after the retrieval-extinction protocol was different to that obtained when testing occurred 24 hours after the protocol. Describing this in terms of temporal scale is unnecessary; and suggesting that memory retrieval and extinction trigger different memory update mechanisms is not obviously warranted. The results of interest are due to the combined effects of retrieval+extinction and there is no sense in which different memory update mechanisms should be identified with the different pattern of results obtained when testing occurred either 30 min or 24 hours after the retrieval-extinction protocol (at least, not the specific pattern of results obtained here).

      Again, we are afraid that the reviewer referred to the abstract in the original manuscript submission, instead of the revised abstract we submitted in the last round. Please see lines 37-39 of the revised abstract where the sentence was already modified (or the abstract from last round of revision).

      The fact that the 30min, 6hr and 24hr test results differ in their cue-specificity and thought-control ability dependence is, to us, an important discovery for delineating the different cognitive processes at work following the retrieval-extinction paradigm. We want to emphasize that fear memories that have gone through the retrieval-extinction paradigm showed interesting temporal dynamics in terms of their magnitudes, cue-specificity and thought-control ability dependence.

      (d) The authors state that: "We hypothesize that the labile state triggered by the memory retrieval may facilitate different memory update mechanisms following extinction training, and these mechanisms can be further disentangled through the lens of temporal dynamics and cue-specificities." *** The first part of the sentence is confusing around usage of the term "facilitate"; and the second part of the sentence that references a "lens of temporal dynamics and cue-specificities" is mysterious. Indeed, as all rats received the same retrieval-extinction exposures in Study 2, it is not clear how or why any differences between the groups are attributed to "different memory update mechanisms following extinction"

      The term “facilitate” was used to highlight the fact that the short-term fear amnesia effect is also memory retrieval dependent, as study 1 demonstrated. The novelty of the short-term fear memory deficit can be distinguished from the long-term memory effect via cue-specificity and thought-control ability dependence. Sentence has been modified (lines 97-101) as follows:

      “We hypothesize that the labile state triggered by the memory retrieval may facilitate different memory deficits following extinction training, and these deficits can be further disentangled through the lens of temporal dynamics and cue-specificities. In theory, different cognitive mechanisms underlying specific fear memory deficits, therefore, can be inferred based on the difference between memory deficits.”

      Data

      (6A) The eight participants who were discontinued after Day 1 in Study 1 were all from the no reminder group. The authors should clarify how participants were allocated to the two groups in this experiment so that the reader can better understand why the distribution of non-responders was non-random (as it appears to be).

      (6B) Similarly, in study 2, of the 37 participants that were discontinued after Day 2, 19 were from Group 30 min and 5 were from Group 6 hours. The authors should comment on how likely these numbers are to have been by chance alone. I presume that they reflect something about the way that participants were allocated to groups: e.g., the different groups of participants in studies 1 and 2 could have been run at quite different times (as opposed to concurrently). If this was done, why was it done? I can't see why the study should have been conducted in this fashion - this is for myriad reasons, including the authors' concerns re SCRs and their seasonal variations.

      As we responded in the previous response letters (as well as in the revised manuscript), subjects were excluded because their SCR did not reach the threshold of 0.02 S when the electric shock was applied. Subjects were assigned to different treatments by day (e.g., Day 1 for the reminder group and Day 2 for the no-reminder group) to avoid potential confusion from switching protocols between subjects within the same day. We suspect that the non-responders might be related to body thermal conditions caused by the lack of central heating on specific dates. Please note that the discontinued subjects (non-responders) were let go immediately after the failure to detect their SCR (< 0.02 S) on Day 1 and were never invited back on Day 2, so it is possible that the discontinued subjects all came from certain dates on which the body thermal conditions were not ideal for SCR collection. Despite the number of excluded subjects, we verified the short-term fear amnesia effect in three separate studies, which to us should serve as strong evidence for the validity of the effect.
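
      The reviewer's question in (6B), how likely the exclusion counts are to have arisen by chance, can be roughed out with a goodness-of-fit test. The sketch below is illustrative only and rests on two assumptions that the letter does not confirm: that the remaining 13 of the 37 discontinued participants came from the 24 h group, and that under chance a discontinued participant was equally likely to belong to any of the three groups (which would not hold if the groups were recruited in unequal numbers).

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_two_df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Counts from point (6B): 19 (30 min) and 5 (6 h) out of 37 discontinued.
# ASSUMPTION: the remaining 37 - 19 - 5 = 13 came from the 24 h group.
observed = [19, 5, 13]
# ASSUMPTION: each discontinued participant was equally likely to be in any group.
expected = [sum(observed) / 3] * 3

statistic = chi_square_gof(observed, expected)
p_value = chi2_sf_two_df(statistic)
print(f"chi2(2) = {statistic:.3f}, p = {p_value:.3f}")  # chi2(2) = 8.000, p = 0.018
```

      Under these assumptions the imbalance would be unlikely to arise by chance alone, which fits the authors' explanation that exclusions clustered on particular testing dates; the figure is only as good as the equal-allocation assumption.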

      (6C) In study 2, why is responding to the CS- so high on the first test trial in Group 30 min? Is the change in responding to the CS- from the last extinction trial to the first test trial different across the three groups in this study? Inspection of the figure suggests that it is higher in Group 30 min relative to Groups 6 hours and 24 hours. If this is confirmed by the analysis, it has implications for the fear recovery index which is partly based on responses to the CS-. If not for differences in the CS- responses, Groups 30 min and 6 hours are otherwise identical. That is, the claim of differential recovery to the CS1 and CS2 across time may simply an artefact of the way that the recovery index was calculated. This is unfortunate but also an important feature of the data given the way in which the fear recovery index was calculated.

      We have provided detailed analysis to this question in our previous response letter, and we are posting our previous response there:

      Following the reviewer’s comments, we went back and calculated the mean SCR difference of CS- between the first test trial and the last extinction trial for all three studies (see Author response image 1 below). In study 1, there was no difference in the mean CS- SCR (between the first test trial and last extinction trial) between the reminder and no-reminder groups (Kruskal-Wallis test), though both groups showed significant fear recovery even in the CS- condition (Wilcoxon signed rank test, reminder: P = 0.0043, no-reminder: P = 0.0037). Next, we examined the mean SCR for CS- for the 30min, 6h and 24h groups in study 2 and found that there was indeed a group difference (one-way ANOVA, F<sub>2,76</sub> = 5.3462, P = 0.0067, panel b), suggesting that the CS- related SCR was influenced by the test time (30min, 6h or 24h). We also tested the CS- related SCR for the 4 groups in study 3 (where the test was conducted 1 hour after the retrieval-extinction training) and found that, across TMS stimulation types (PFC vs. VER) and reminder types (reminder vs. no-reminder), the ANOVA yielded neither a main effect of TMS stimulation type (F<sub>1,71</sub> = 0.322, P = 0.572) nor a main effect of reminder type (F<sub>1,71</sub> = 0.0499, P = 0.824, panel c). We added the R-VER group results in study 3 (see panel c) to panel b, plotted the CS- SCR difference across 4 different test time points, and found that CS- SCR decreased as the test-extinction delay increased (Jonckheere-Terpstra test, P = 0.00028). These results suggest a natural “forgetting” tendency for CS- related SCR and highlight the importance of having the CS- as a control condition against which the CS+ related SCR is compared.

      Author response image 1.

      (6D) The 6 hour group was clearly tested at a different time of day compared to the 30 min and 24 hour groups. This could have influenced the SCRs in this group and, thereby, contributed to the pattern of results obtained.

      Again, we answered this question in our previous response. Please see the following for our previous response:

      For the 30min and 24h groups, the test phase can be arranged in the morning, in the afternoon or at night. However, for the 6h group, the test phase was inevitably in the afternoon or at night since we wanted to exclude the potential influence of night sleep on the expression of fear memory (see Author response table 1 below). If we restricted the test time in the afternoon or at night for all three groups, then the timing of their extinction training was not matched.

      Author response table 1.

      Nevertheless, we also went back and examined the data for only those subjects in the 30min and 24h groups who were tested in the afternoon or at night, to match the 6h group, in which all subjects were tested either in the afternoon or at night. According to the table above, we have 17 subjects for the 30min group (9 + 8), 18 subjects for the 24h group (9 + 9) and 26 subjects for the 6h group (12 + 14). As Author response image 2 shows, the SCR patterns in the fear acquisition, extinction and test phases were similar to the results presented in the original figure.

      Author response image 2.

      (6E) The authors find different patterns of responses to CS1 and CS2 when they were tested 30 min after extinction versus 24 h after extinction. On this basis, they infer distinct memory update mechanisms. However, I still can't quite see why the different patterns of responses at these two time points after extinction need to be taken to infer different memory update mechanisms. That is, the different patterns of responses at the two time points could be indicative of the same "memory update mechanism" in the sense that the retrieval-extinction procedure induces a short-term memory suppression that serves as the basis for the longer-term memory suppression (i.e., the reconsolidation effect). My pushback on this point is based on the notion of what constitutes a memory update mechanism; and is motivated by what I take to be a rather loose use of language/terminology in the reconsolidation literature and this paper specifically (for examples, see the title of the paper and line 2 of the abstract).

      As we mentioned previously, the term “mechanism” might have different connotations for different researchers. We aim to report a novel memory deficit following the retrieval-extinction paradigm, which differed significantly from the purported reconsolidation-related long-term fear amnesia in terms of its timescale, cue-specificity and thought-control ability dependence. A further TMS study confirmed that intact dlPFC function is necessary for the short-term memory deficit. It is on the basis of these results that we proposed that the short-term fear amnesia might be related to a different cognitive “mechanism”. As mentioned above, we now clarify what we mean by “mechanism” in the abstract and introduction (lines 31-34, 97-101).

      Reviewer #2 (Public review):

      The fear acquisition data is converted to a differential fear SCR and this is what is analysed (early vs late). However, the figure shows the raw SCR values for CS+ and CS- and therefore it is unclear whether acquisition was successful (despite there being an "early" vs "late" effect - no descriptives are provided).

      (1) There are still no descriptive statistics to substantiate learning in Experiment 1.

      We answered this question in our previous response letter. We are sorry that the definition of “early” and “late” trials was scattered across the manuscript. For example, we wrote “the late phase of acquisition (last 5 trials)” (lines 375-376) in the results section. Since there were 10 trials in total in the acquisition stage, we define the first 5 trials and the last 5 trials as the “early” and “late” phases of acquisition, and we have explicitly added this definition where the terms “early” and “late” first appear (lines 316-318).

      In the results section, we did test whether the acquisition was successful in our previous manuscript (Line 316-325):

      “To assess fear acquisition across groups (Figure 1B and C), we conducted a mixed two-way ANOVA of group (reminder vs. no-reminder) x time (early vs. late part of the acquisition; first 5 and last 5 trials, correspondingly) on the differential fear SCR. Our results showed a significant main effect of time (early vs. late; F<sub>1,55</sub> = 6.545, P = 0.013, η<sup>2</sup> = 0.106), suggesting successful fear acquisition in both groups. There was no main effect of group (reminder vs. no-reminder) or the group x time interaction (group: F<sub>1,55</sub> = 0.057, P = 0.813, η<sup>2</sup> = 0.001; interaction: F<sub>1,55</sub> = 0.066, P = 0.798, η<sup>2</sup> = 0.001), indicating similar levels of fear acquisition between two groups. Post-hoc t-tests confirmed that the fear responses to the CS+ were significantly higher than that of CS- during the late part of acquisition phase in both groups (reminder group: t<sub>29</sub> = 6.642, P < 0.001; no-reminder group: t<sub>26</sub> = 8.522, P < 0.001; Figure 1C). Importantly, the levels of acquisition were equivalent in both groups (early acquisition: t<sub>55</sub> = -0.063, P = 0.950; late acquisition: t<sub>55</sub> = -0.318, P = 0.751; Figure 1C).”
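
      As a reading aid, the effect sizes in the quoted passage can be cross-checked against the F statistics. The sketch below assumes that the reported η<sup>2</sup> values are partial eta squared (the passage does not say which variant was used), in which case η<sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F value, eta squared as reported in the quoted passage); all tests have df = (1, 55).
reported = [
    ("time", 6.545, 0.106),
    ("group", 0.057, 0.001),
    ("group x time", 0.066, 0.001),
]

for label, f_value, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect=1, df_error=55)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

      All three computed values round to the reported ones, so the quoted statistics are internally consistent under that assumption.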

      In Experiment 1 (Test results) it is unclear whether the main conclusion stems from a comparison of the test data relative to the last extinction trial ("we defined the fear recovery index as the SCR difference between the first test trial and the last extinction trial for a specific CS") or the difference relative to the CS- ("differential fear recovery index between CS+ and CS-"). It would help the reader assess the data if Fig 1e presents all the indexes (both CS+ and CS-). In addition, there is one sentence which I could not understand "there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (P=0.048)". The p value suggests that there is a difference, yet it is not clear what is being compared here. Critically, any index taken as a difference relative to the CS- can indicate recovery of fear to the CS+ or absence of discrimination relative to the CS-, so ideally the authors would want to directly compare responses to the CS+ in the reminder and no-reminder groups. In the absence of such comparison, little can be concluded, in particular if SCR CS- data is different between groups. The latter issue is particularly relevant in Experiment 2, in which the CS- seems to vary between groups during the test and this can obscure the interpretation of the result.

      (2) In the revised analyses, the authors now show that CS- changes in different groups (for example, Experiment 2) so this means that there is little to conclude from the differential scores because these depend on CS-. It is unclear whether the effects arise from CS+ performance or the differential which is subject to CS- variations.

      There was a typo in the “P = 0.048” sentence and we have corrected it in our last response letter. Also in the previous response letter, we specifically addressed how the fear recovery index was defined (also in the revised manuscript).

      In most fear conditioning studies, CS- trials are included as the baseline control. In turn, most of the analyses conducted also involve comparisons between different groups. Directly comparing CS+ trials across groups (or conditions) is rare. In our study 2, we showed that the CS- response decreased as a function of testing delay (30min, 1hr, 6hr and 24hr). Ideally, it would be nice to show that the CS- response did not change across groups/conditions. However, even in those circumstances, comparisons are still based on the differential CS response (CS+ minus CS-), that is, the difference of differences. It is also important to note that the difference score matters because the CS+ response alone, or across conditions, is difficult to interpret, especially in humans, due to noise, signal fluctuations, and irrelevant stimulus features; a trial-wise reference is therefore essential to assess the CS+ in the context of a reference stimulus in each trial (after all, the baselines are different). We list a few influential papers in the field in which the CS- responses were not equivalent across groups/conditions, and argue that this is a routine procedure (Kindt & Soeter 2018 Figs. 2-3; Sevenster et al., 2013 Fig. 3; Liu et al., 2014 Fig. 1; Raio et al., 2017 Fig. 2).
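
      The two indexes under discussion can be written out explicitly. The sketch below follows the definitions quoted by the reviewer (recovery index = first test trial minus last extinction trial for a given CS; differential index = CS+ index minus CS- index). The function names and SCR values are invented for illustration and are not data from the study.

```python
def recovery_index(first_test_scr, last_extinction_scr):
    """Fear recovery index for one CS: first test trial minus last extinction trial."""
    return first_test_scr - last_extinction_scr


def differential_recovery_index(cs_plus, cs_minus):
    """Difference of differences: CS+ recovery index minus CS- recovery index.

    Each argument is a (first_test_scr, last_extinction_scr) pair.
    """
    return recovery_index(*cs_plus) - recovery_index(*cs_minus)


# Hypothetical participants with identical CS+ responses but different CS- responses.
stable_cs_minus = differential_recovery_index(cs_plus=(0.50, 0.25), cs_minus=(0.25, 0.25))
rising_cs_minus = differential_recovery_index(cs_plus=(0.50, 0.25), cs_minus=(0.50, 0.25))

print(stable_cs_minus)  # 0.25
print(rising_cs_minus)  # 0.0
```

      The two hypothetical participants show the same CS+ recovery, yet the differential index differs because the CS- response moved. This is the dependence on CS- that the reviewer raises and that the authors treat as a necessary trial-wise baseline.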

      In experiment 1, the findings suggest that there is a benefit of retrieval followed by extinction in a short-term reinstatement test. In Experiment 2, the same effect is observed to a cue which did not undergo retrieval before extinction (CS2+), a result that is interpreted as resulting from cue-independence, rather than a failure to replicate in a within-subjects design the observations of Experiment 1 (between-subjects). Although retrieval-induced forgetting is cue-independent (the effect on items that are suppressed [Rp-] can be observed with an independent probe), it is not clear that the current findings are similar, and thus that the strong parallels made are not warranted. Here, both cues have been extinguished and therefore been equally exposed during the critical stage.

      (3) The notion that suppression is automatic is speculative at best

      We responded to the same question in our previous revision. Please note that study 1 (the comparison between the reminder and no-reminder groups), with only one CS+, was not set up to test the cue-independence hypothesis for the short-term amnesia. Results from both study 2 (30min condition) and study 3 confirmed the cue-independence hypothesis, and we therefore do not believe that the results from study 2 should be interpreted as “a failure to replicate in a within-subject design of the observations of Experiment 1”.

      We agree that the proposal of automatic or unconscious memory suppression is speculative, and that is why we mentioned it in the discussion. The timescale, cue-specificity and thought-control ability dependence of the short-term fear amnesia identified in our studies were reminiscent of the memory suppression effects reported in the previous literature. Memory suppression studies typically adopt a conscious “suppression” treatment (such as the think/no-think paradigm), which was absent in the current study. However, retrieval-induced forgetting (RIF), which is also considered a memory suppression paradigm via inhibitory control, does not require conscious effort to suppress any particular thought. Based on these results and the extant literature, we raised the possibility of memory suppression as a potential mechanism. We make clear in the discussion that the suppression hypothesis and its connections with RIF will require further evidence (lines 615-616):

      “future research will be needed to investigate whether the short-term effect we observed is specifically related to associative memory or the spontaneous nature of suppression as in RIF (Figure 6C).”

      (4) I still struggle with the parallels between these findings and the "limbo" literature. Here you manipulated the retention interval, whereas in the cited studies the number of extinction (exposure) trials was varied. These are two completely different phenomena.

      We borrowed the “limbo” term to stress the transitioning from short-term to long-term memory deficits (the 6hr test group). Merlo et al. (2014) found that memory reconsolidation and extinction were dissociable processes depending on the extent of memory retrieval. They argued that there was a “limbo” transitional state, where neither the reconsolidation nor the extinction process was engaged. Our results suggest that at the test delay of 6hr, neither the short-term nor the long-term effect was present, signaling a “transitional” state after which the short-term memory deficit wanes and the long-term deficit starts to take over. We make this idea more explicit as follows (lines 622-626):

      “These works identified important “boundary conditions” of memory retrieval in affecting the retention of the maladaptive emotional memories. In our study, however, we showed that even within a boundary condition previously thought to elicit memory reconsolidation, mnemonic processes other than reconsolidation could also be at work, and these processes jointly shape the persistence of fear memory.”

      (5) My point about the data problematic for the reconsolidation (and consolidation) frameworks is that they observed memory in the absence of the brain substrates that are needed for memory to be observed. The answer did not address this. I do not understand how the latent cause model can explain this, if the only difference is the first ITI. Wouldn't participants fail to integrate extinction with acquisition with a longer ITI?

      We take the sentence “they observed memory in the absence of the brain substrates that are needed for memory to be observed” as referring to the long-term memory deficit in our study. As we responded before, the aim of this manuscript was not about investigating the brain substrates involved in memory reconsolidation (or consolidation). Using a memory retrieval-extinction paradigm, we discovered a novel short-term memory effect, which differed from the purported reconsolidation effect in terms of timescale, cue-specificity and thought-control ability dependence. We further showed that both memory retrieval and intact dlPFC functions were necessary to observe the short-term memory deficit effect. Therefore, we conclude that the brain mechanism involved in such an effect should be different from the one related to the purported reconsolidation effect. We make this idea more explicit as follows (lines 546-547):

      “Therefore, findings of the short-term fear amnesia suggest that the reconsolidation framework falls short to accommodate this more immediate effect (Figure 6A and B).”

      Whilst I could access the data in the OSF site, I could not make sense of the Matlab files as there is no signposting indicating what data is being shown in the files. Thus, as it stands, there is no way of independently replicating the analyses reported.

      (6) The materials in the OSF site are the same as before, they haven't been updated.

      Last time we thought the main issue was that the OSF site was not publicly accessible, and we therefore made it open to all visitors. We have now added a descriptive file explaining the variables to help visitors replicate our analyses.

      (7) Concerning supplementary materials, the robustness tests are intended to prove that you 1) can get the same results by varying the statistical models or 2) you can get the same results when you include all participants. Here authors have done both so this does not help. Also, in the rebuttal letter, they stated "Please note we did not include non-learners in these analyses " which contradicts what is stated in the figure captions "(learners + non learners)"

      In the supplementary materials, we performed the two analyses separately, one varying the statistical model and one including both learners and non-learners, rather than combining them. Specifically, in supplementary Figs. 1 & 2, we included all the participants (learners + non-learners), performed an analysis similar to that in the main text and found similar results. In the text of the supplementary material, we applied a different statistical analysis method to learners only (that is, we analyzed the subjects reported in the main text using a different method) and obtained similar results. We believe this is exactly what the reviewer suggested we do. There also seems to be a misunderstanding about the "Please note we did not include non-learners in these analyses" sentence in the rebuttal letter. As the reviewer can see, the full sentence read “Please note we did not include non-learners in these analyses (the texts of the supplementary materials)”. We meant to express that the figures and text in the supplementary material reflect two approaches: 1) figures depicting the re-analysis with all the included subjects (learners + non-learners); 2) text describing a different analysis with learners only. We added clarifications to emphasize these approaches in the supplementary materials.

      (8) Finally, the literature suggesting that reconsolidation interference "eliminates" a memory is not substantiated by data nor in line with current theorising, so I invite a revision of these strong claims.

      We agree and have toned down the strong claims.

      Overall, I conclude that the revised manuscript did not address my main concerns.

      In both rounds of responses, we tried our best to address the reviewer’s concerns. We hope that the clarifications in this letter and revisions in the text address the remaining concerns. Thank you for your feedback.

      References:

      Kindt, M. and Soeter, M. 2018. Pharmacologically induced amnesia for learned fear is time and sleep dependent. Nat Commun, 9, 1316.

      Liu, J., Zhao, L., Xue, Y., Shi, J., Suo, L., Luo, Y., Chai, B., Yang, C., Fang, Q., Zhang, Y., Bao, Y., Pickens, C. L. and Lu, L. 2014. An unconditioned stimulus retrieval extinction procedure to prevent the return of fear memory. Biol Psychiatry, 76, 895-901.

      Raio, C. M., Hartley, C. A., Orederu, T. A., Li, J. and Phelps, E. A. 2017. Stress attenuates the flexible updating of aversive value. Proc Natl Acad Sci U S A, 114, 11241-11246.

      Sevenster, D., Beckers, T. and Kindt, M. 2013. Prediction error governs pharmacologically induced amnesia for learned fear. Science, 339, 830-833.

    1. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).
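
      The non-linearity at issue here can be made concrete with a minimal sketch. It assumes (these are assumptions of the sketch, not details confirmed by the manuscript) that each block starts in the red regime, that a shift to blue can occur with probability `q` before every draw and is then permanent, and that a draw matches the current regime's colour with probability `d`. The function names are hypothetical.

```python
def intertemporal_prior(q, t):
    """Probability that a shift has occurred by draw t, ignoring the signals."""
    return 1.0 - (1.0 - q) ** t


def posterior_shift(signals, q, d):
    """Forward-filter the posterior probability of a regime shift after each draw.

    signals: sequence of "blue" / "red" draws
    q: per-draw probability of a red -> blue shift (assumed absorbing)
    d: probability that a draw matches the colour of the current regime
    """
    p_blue = 0.0  # assumption: every block starts in the red regime
    posteriors = []
    for s in signals:
        # Prediction step: a shift may happen before this draw.
        p_blue = p_blue + (1.0 - p_blue) * q
        # Update step: weigh each regime by the likelihood of the draw.
        like_blue = d if s == "blue" else 1.0 - d
        like_red = 1.0 - d if s == "blue" else d
        num = p_blue * like_blue
        p_blue = num / (num + (1.0 - p_blue) * like_red)
        posteriors.append(p_blue)
    return posteriors


# Evidence-free growth of the shift probability over the 10 draws of a block.
priors = [intertemporal_prior(0.10, t) for t in range(1, 11)]
# Successive increments shrink, so the curve is concave rather than linear.
increments = [b - a for a, b in zip(priors, priors[1:])]
```

      Under these assumptions the evidence-free component grows as 1 - (1 - q)^t, which is concave in t, so a linear trial-number regressor would absorb only part of it.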

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Editors’ note: Reviewer #2 was unavailable to re-review the manuscript. Reviewer #3 was added for this round of review to ensure two reviewers and because of their expertise in the computational and modelling aspects of the work.

    2. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting distinct contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative task design, behavioral modeling, and model-based fMRI analyses provides a solid foundation for the conclusions; however, the neuroimaging results have several limitations, particularly a potential confound between the posterior probability of a switch and the passage of time that may not be fully controlled by including trial number as a regressor. The control experiments intended to address this issue also appear conceptually inconsistent and, at the behavioral level, while informing participants of conditional probabilities rather than requiring learning is theoretically elegant, such information is difficult to apply accurately, as shown by well-documented challenges with conditional reasoning and base-rate neglect. Expressing these probabilities as natural frequencies rather than percentages may have improved comprehension. Overall, the study advances understanding of belief updating under uncertainty but would benefit from more intuitive probabilistic framing and stronger control of temporal confounds in future work.

      We thank the editors for the assessment. The editors added several limitations based on the comments of the new Reviewer 3 in this round, which we address below.

      With regard to temporal confounds, we clarified in the main text and response to Reviewer 3 that we had already addressed the potential confound between posterior probability of a switch and passage of time in GLM-2 with the inclusion of intertemporal prior. After adding intertemporal prior in the GLM, we still observed the same fMRI results on probability estimates. In addition, we did two other robustness checks, which we mentioned in the manuscript.

      With regard to response mode (probability estimation rather than choice or indicating natural frequencies), we wish to point out that previous research by Massey and Wu (2005), on which the current study was based, addressed the concern that participants show system-neglect tendencies because of the mode of information delivery, namely indicating beliefs through probability estimates rather than through choice or another response mode. Massey and Wu (2005, Study 3) found the same biases when participants performed a choice task that did not require them to indicate probability estimates.

      With regard to the control experiments, they were in fact not intended to address the confound between posterior probability and passage of time. Rather, they aimed to address whether the neural findings were unique to change detection (Experiment 2) and to address visual and motor confounds (Experiment 3). These aims and the results of the control experiments were described on pages 18-19.

      Finally, we wish to highlight that we had performed detailed model comparisons following reviewer 2’s suggestions. Although reviewer 2 was unable to re-review the manuscript, we believe this provides insight into the literature on change detection. See “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (p.27-30). The model comparison showed that system-neglect models that incorporate signal dependency are better models than the original system-neglect model in describing participants’ probability estimates. This suggests that people respond to change-consistent and change-inconsistent signals differently when judging whether the regime had changed. This was not reported in previous behavioral studies and was largely inspired by the neural finding on signal dependency in the frontoparietal cortex. It indicates that neural findings can provide novel insights into computational modeling of behavior.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      We thank the reviewer for the comments.

      Weaknesses:

      The authors have adequately addressed most of my prior concerns.

      We thank the reviewer for recognizing our effort in addressing your concerns.

      My only remaining comment concerns the z-test of the correlations. I agree with the non-parametric test based on bootstrapping at the subject level, providing evidence for significant differences in correlations within the left IFG and IPS.

      However, the parametric test seems inadequate to me. The equation presented is described as the Fisher z-test, but the numerator uses the raw correlation coefficients (r) rather than the Fisher-transformed values (z). To my understanding, the subtraction should involve the Fisher z-scores, not the raw correlations.

      More importantly, the Fisher z-test in its standard form assumes that the correlations come from independent samples, as reflected in the denominator (which uses the n of each independent sample). However, in my opinion, the two correlations are not independent but computed within-subject. In such cases, parametric tests should take into account the dependency. I believe one appropriate method for the current case (correlated correlation coefficients sharing a variable [behavioral slope]) is explained here:

      Meng, X.-l., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172-175. https://doi.org/10.1037/0033-2909.111.1.172

      It should be implemented here:

      Diedenhofen B, Musch J (2015) cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLoS ONE 10(4): e0121945. https://doi.org/10.1371/journal.pone.0121945

      My recommendation is to verify whether my assumptions hold, and if so, perform a test that takes correlated correlations into account. Or, to focus exclusively on the non-parametric test.

      In any case, I recommend a short discussion of these findings and how the authors interpret that some of the differences in correlations are not significant.

      Thank you for the careful check. Yes, this was indeed a mistake on our part. We also agree that the two correlations are not independent. Therefore, we modified the test to account for dependent correlations, following Meng et al. (1992) as suggested by the reviewer.

      We referred to the correlation between neural and behavioral sensitivity at change-consistent (blue) signals as 𝑟<sub>𝑏𝑙𝑢𝑒</sub>, and that at change-inconsistent (red) signals as 𝑟<sub>𝑟𝑒𝑑</sub>. To statistically compare these two correlations, we adopted the approach of Meng et al. (1992), which specifically tests differences between dependent correlations according to the following equation

      Z = (z_{r_1} - z_{r_2}) \sqrt{\frac{N - 3}{2 (1 - r_x) h}}

      where 𝑁 is the number of subjects, 𝑧<sub>𝑟𝑖</sub> is the Fisher z-transformed value of 𝑟<sub>𝑖</sub>, 𝑟<sub>1</sub> = 𝑟<sub>𝑏𝑙𝑢𝑒</sub> and 𝑟<sub>2</sub> = 𝑟<sub>𝑟𝑒𝑑</sub>. 𝑟<sub>𝑥</sub> is the correlation between the neural sensitivity at change-consistent signals and change-inconsistent signals, and

      h = \frac{1 - f \bar{r^2}}{1 - \bar{r^2}}, \quad f = \frac{1 - r_x}{2 (1 - \bar{r^2})}

      where \bar{r^2} is the mean of the 𝑟<sub>𝑖</sub><sup>2</sup>, and 𝑓 should be set to 1 if 𝑓 > 1.

      We found that, among the five ROIs in the frontoparietal network, the difference in correlation was significant in two, namely the left IFG and left IPS (one-tailed z test; left IFG: 𝑧 = 1.8908, 𝑝 = 0.0293; left IPS: 𝑧 = 2.2584, 𝑝 = 0.0049). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: 𝑧 = 0.9522, 𝑝 = 0.1705; right IFG: 𝑧 = 0.9860, 𝑝 = 0.1621; right IPS: 𝑧 = 1.4833, 𝑝 = 0.0690). We chose a one-tailed test because we already knew that the correlation under the blue signals was significantly greater than 0. These updated results are consistent with the nonparametric tests we had already performed, and we will update them in the revised manuscript.
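
      For readers who wish to check a comparison of this kind, the Meng et al. (1992) statistic for two dependent correlations that share one variable can be sketched as follows. This is an illustration only: the function name `meng_z_test` is hypothetical, the code is not the authors' analysis script, and it does not reproduce the values reported above because the underlying correlations are not listed here.

```python
import math


def meng_z_test(r1, r2, rx, n):
    """Meng, Rosenthal & Rubin (1992) z statistic for two dependent correlations.

    r1, r2: correlations of the shared variable with each of the two others
    rx: correlation between the two non-shared variables
    n: number of subjects
    Returns (z, one-tailed p-value for the hypothesis r1 > r2).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform
    r2_bar = (r1 ** 2 + r2 ** 2) / 2.0       # mean of the squared correlations
    f = min((1.0 - rx) / (2.0 * (1.0 - r2_bar)), 1.0)  # f is capped at 1
    h = (1.0 - f * r2_bar) / (1.0 - r2_bar)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_one_tailed
```

      In the present setting, `r1` and `r2` would play the roles of the blue and red correlations and `rx` the correlation between the two neural sensitivities; a call such as `meng_z_test(0.6, 0.2, 0.3, 30)` uses made-up inputs purely to exercise the function.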

      Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      We thank the reviewer for the overall descriptions of the manuscript.

      Strengths:

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Thank you for these assessments.

      Weaknesses:

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      We appreciate the reviewer’s concern on this issue. The concern was addressed in Massey and Wu (2005), where participants performed a choice task in which they were not asked to provide probability estimates (Study 3 in Massey and Wu, 2005). Instead, participants in Study 3 were asked to predict the color of the ball before seeing a signal, a more intuitive way of indicating their belief about a regime shift. The results from the choice task were identical to those found in the probability estimation task (Study 1 in Massey and Wu). We take this as evidence that the system-neglect behavior the participants showed was less likely to be due to the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. It is true that the system-neglect model is not entirely inconsistent with regression to the mean, regardless of whether the implementation has a hyper-prior or not. In fact, our behavioral measure of sensitivity to transition probability and signal diagnosticity, which we termed the behavioral slope, is based on linear regression analysis. In general, the modeling approach in this paper is to start from a generative model that defines ideal performance and to consider modifying the generative model when systematic deviations of actual performance from the ideal are observed. In this approach, a generative model with a hyper-prior would be more complex to begin with, and the idea of regression to the mean by itself does not generate a priori predictions.
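
      As a rough illustration of the 'attraction to the mean' reading, the three ground-truth and subjective values quoted in the reviewer's comment can be fitted with a straight line. The sketch below is not an analysis from the manuscript; the helper `ols_line` and the variable `anchor` are hypothetical names, and a three-point fit can only show that the numbers are compatible with linear shrinkage, not that a hyper-prior produced them.

```python
# Values quoted in the review: ground-truth vs. subjective shift probabilities.
true_q = [0.01, 0.05, 0.10]
subjective_q = [0.037, 0.052, 0.069]


def ols_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = ols_line(true_q, subjective_q)
# Point where the fitted line crosses the identity line, i.e. the value
# toward which subjective estimates are compressed under linear shrinkage.
anchor = intercept / (1.0 - slope)
```

      A slope below 1 indicates compression of the subjective values, and `anchor` gives the value they are pulled toward; both the system-neglect account and a hyper-prior account predict this qualitative pattern, which is why a dataset with a different mean ground-truth value would be needed to tell them apart.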

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Thank you for raising this point. The modeling principle we adopt is the following. We start from the normative model—the Bayesian model—that defined what normative behavior should look like. We compared participants’ behavior with the Bayesian model and found systematic deviations from it. To explain those systematic deviations, we considered modeling options within the confines of the same modeling framework. In other words, we considered a parameterized version of the Bayesian model, which is the system-neglect model and examined through model comparison the best modeling choice. This modeling approach is not uncommon, and many would agree this is the standard approach in economics and psychology. For example, Kahneman and Tversky adopted this approach when proposing prospect theory, a modification of expected utility theory where expected utility theory can be seen as one specific model for how utility of an option should be computed.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).

      Thank you for raising this concern. Yes, Pt always increases with sample number regardless of evidence (seeing change-consistent or change-inconsistent signals). This is captured by the ‘intertemporal prior’ in the Bayesian model, which we included as a regressor in our GLM analysis (GLM-2), in addition to Pt. In short, GLM-1 had Pt and sample number. GLM-2 had Pt, intertemporal prior, and sample number, among other regressors. And we found that, in both GLM-1 and GLM-2, both vmPFC and ventral striatum correlated with Pt.

      To make this clearer, we updated the main text to further clarify this on p.18:

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. The purpose of Experiment 3 was to control for visual and motor confounds. In other words, if subjects saw a similar visual layout and were just instructed to press numbers, would we observe activity in the vmPFC, ventral striatum, and the frontoparietal network as we did in the main experiment (Experiment 1)?

      The purpose of Experiment 2 was to establish whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with change detection. We used Experiment 2 to examine whether this was the case.

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We received differing feedback from previous reviews on what to include in the Discussion. To address the reviewer’s concern, we will revise the Discussion to better highlight the key contributions of the current study at its beginning.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Many of the figures are too tiny - the writing is very small, as are the pictures of brains. I'd suggest adjusting these so they will be readable without enlarging.

      Thank you. We will enlarge the figures to make them more readable.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      (1) The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      Thank you for recognizing our contribution to the regime-change detection literature and our effort in discussing our findings in relation to the experience-based paradigms.

      (2) The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well.

      Thank you for recognizing the contribution of our Bayesian framework and system-neglect model.

      (3) The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Thank you for recognizing our execution of model-based fMRI analyses and effort in using those analyses to link with behavioral biases.

      Weaknesses:

      My major concern is about the correlational analysis in the section "Under- and overreactions are associated with selectivity and sensitivity of neural responses to system parameters", shown in Figures 5c and d (and similarly in Figure 6). The authors argue that a frontoparietal network selectively represents sensitivity to signal diagnosticity, while the vmPFC selectively represents transition probabilities. This claim is based on separate correlational analyses for red and blue across different brain areas. The authors interpret the finding of a significant correlation in one case (blue) and an insignificant correlation (red) as evidence of a difference in correlations (between blue and red) but don't test this directly. This has been referred to as the "interaction fallacy" (Nieuwenhuis et al., 2011; Makin & Orban de Xivry 2019). Not directly testing the difference in correlations (but only the differences to zero for each case) can lead to wrong conclusions. For example, in Figure 5c, the correlation for red is r = 0.32 (not significantly different from zero) and for blue is r = 0.48 (different from zero). However, the difference between the two is 0.1, and it is likely that this difference itself is not significant. From a statistical perspective, this corresponds to an interaction effect that has to be tested directly. It is my understanding that analyses in Figure 6 follow the same approach.

      Relevant literature on this point is:

      Nieuwenhuis, S, Forstmann, B & Wagenmakers, EJ (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci 14, 1105–1107. https://doi.org/10.1038/nn.2886

      Makin TR, Orban de Xivry, JJ (2019). Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175. https://doi.org/10.7554/eLife.48175

      There is also a blog post on simulation-based comparisons, which the authors could check out: https://garstats.wordpress.com/2017/03/01/comp2dcorr/

      I recommend that the authors carefully consider what approach works best for their purposes. It is sometimes recommended to directly compare correlations based on Monte-Carlo simulations (cf. Makin & Orban de Xivry, 2019). It might also be appropriate to run a regression with brain activity as the dependent variable (Y) and brain area (X) and the model-based term of interest (Z) as predictors. In this case, they could include an interaction term in the model:

      Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot Z + \beta_3 \cdot X \cdot Z

      The interaction term reflects if the relationship between the model term Z and brain activity Y is conditional on the brain area of interest X.

      Thank you for the suggestion. In response, we tested for the difference in correlation both parametrically and nonparametrically. The two tests led to the same conclusions. In the parametric test, we used the Fisher z transformation to transform the difference in correlation coefficients to the z statistic. That is, for two correlation coefficients, 𝑟<sub>1</sub> (with sample size 𝑛<sub>1</sub>) and 𝑟<sub>2</sub> (with sample size 𝑛<sub>2</sub>), the z statistic of the difference in correlation is given by
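The equation itself is not reproduced in this response. As an assumption about the intended form, the standard Fisher z statistic for the difference between two correlations that are treated as independent can be written as follows, using the symbols 𝑟<sub>1</sub>, 𝑟<sub>2</sub>, 𝑛<sub>1</sub>, and 𝑛<sub>2</sub> defined above:

```latex
z = \frac{\tfrac{1}{2}\ln\dfrac{1 + r_1}{1 - r_1} \;-\; \tfrac{1}{2}\ln\dfrac{1 + r_2}{1 - r_2}}
         {\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}
```

The numerator is the difference of the two Fisher-transformed correlations, and the denominator is the standard error of that difference.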

      We referred to the correlation between neural and behavioral sensitivity at change-consistent (blue) signals as 𝑟<sub>𝑏𝑙𝑢𝑒</sub>, and that at change-inconsistent (red) signals as 𝑟<sub>𝑟𝑒𝑑</sub>. For the Fisher z transformation, 𝑟<sub>1</sub> = 𝑟<sub>𝑏𝑙𝑢𝑒</sub> and 𝑟<sub>2</sub> = 𝑟<sub>𝑟𝑒𝑑</sub>. Among the five ROIs in the frontoparietal network, the difference in correlation was significant in two, the left IFG and left IPS (one-tailed z test; left IFG: 𝑧 = 1.8355, 𝑝 = 0.0332; left IPS: 𝑧 = 2.3782, 𝑝 = 0.0087). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: 𝑧 = 0.7594, 𝑝 = 0.2238; right IFG: 𝑧 = 0.9068, 𝑝 = 0.1822; right IPS: 𝑧 = 1.3764, 𝑝 = 0.0843). We chose a one-tailed test because we already knew that the correlation under the blue signals was significantly greater than 0.
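A minimal sketch of such a parametric test is given below. It assumes the textbook Fisher z test for two correlations treated as independent, with a one-tailed p-value taken from the upper tail of the standard normal distribution. The function names and the example inputs are illustrative and are not taken from the manuscript.

```python
import math


def fisher_z(r):
    """Fisher z transform of a correlation coefficient (requires -1 < r < 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def corr_difference_test(r1, n1, r2, n2):
    """One-tailed z test of H1: rho1 > rho2.

    Assumption: the two correlations are treated as independent,
    so the standard error is sqrt(1/(n1-3) + 1/(n2-3)).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Upper-tail probability of the standard normal distribution.
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_tailed


# Illustrative inputs only (hypothetical, not values from the manuscript).
z, p = corr_difference_test(r1=0.5, n1=30, r2=0.0, n2=30)
print(f"z = {z:.4f}, one-tailed p = {p:.4f}")
```

The one-tailed p-values reported above are consistent with this upper-tail convention (for example, 𝑧 = 2.3782 corresponds to 𝑝 = 0.0087).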

      In the nonparametric test, we performed nonparametric bootstrapping to test for the difference in correlation (Efron & Tibshirani, 1994). We resampled the dataset with replacement (subject-wise) and used the resampled dataset to compute the difference in correlation. We repeated this procedure 100,000 times to estimate the distribution of the difference in correlation coefficients, and used this distribution to test for significance and estimate the p-value. Consistent with our parametric tests, the difference in correlation was significant in left IFG and left IPS (left IFG: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.46, 𝑝 = 0.0496; left IPS: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.5306, 𝑝 = 0.0041), but not in dmPFC, right IFG, and right IPS (dmPFC: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.1634, 𝑝 = 0.1919; right IFG: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.2123, 𝑝 = 0.1681; right IPS: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.3434, 𝑝 = 0.0631).
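The subject-wise bootstrap described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the one-tailed p-value is taken as the share of bootstrap differences at or below zero, which is one common convention the response does not spell out, and all variable names and the synthetic data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_difference(behavior, neural_blue, neural_red, n_boot=10000, seed=0):
    """Bootstrap distribution of r_blue - r_red, resampling subjects with replacement.

    Each resample keeps a subject's three measurements together (subject-wise).
    Assumption: one-tailed p = proportion of bootstrap differences <= 0.
    """
    rng = random.Random(seed)
    n = len(behavior)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b = [behavior[i] for i in idx]
        r_blue = pearson_r(b, [neural_blue[i] for i in idx])
        r_red = pearson_r(b, [neural_red[i] for i in idx])
        if r_blue is None or r_red is None:
            continue  # skip degenerate resamples with zero variance
        diffs.append(r_blue - r_red)
    if not diffs:
        raise ValueError("no valid bootstrap resamples")
    p_one_tailed = sum(d <= 0.0 for d in diffs) / len(diffs)
    return diffs, p_one_tailed


# Synthetic data for illustration only (30 hypothetical subjects).
demo_rng = random.Random(42)
behavior = [demo_rng.gauss(0.0, 1.0) for _ in range(30)]
neural_blue = [b + demo_rng.gauss(0.0, 1.0) for b in behavior]
neural_red = [demo_rng.gauss(0.0, 1.0) for _ in behavior]
_, p_demo = bootstrap_corr_difference(behavior, neural_blue, neural_red, n_boot=2000)
print(f"one-tailed bootstrap p = {p_demo:.4f}")
```

Resampling whole subjects, rather than resampling each variable separately, preserves the dependence between the two correlations because both are computed on the same people.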

      In summary, we found that neural sensitivity to signal diagnosticity in the frontoparietal network measured at change-consistent signals significantly correlated with individual subjects’ behavioral sensitivity to signal diagnosticity (𝑟<sub>𝑏𝑙𝑢𝑒</sub>). By contrast, neural sensitivity to signal diagnosticity measured at change-inconsistent signals did not significantly correlate with behavioral sensitivity (𝑟<sub>𝑟𝑒𝑑</sub>). The difference in correlation, 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub>, however, was statistically significant in some (left IPS and left IFG) but not all brain regions within the frontoparietal network.

      To incorporate these updates, we added descriptions of the methods and results in the revised manuscript. In the Results section (p.26-27):

      “We further tested, for each brain region, whether the difference in correlation was significant using both parametric and nonparametric tests (see Parametric and nonparametric tests for difference in correlation coefficients in Methods). The results were identical. In the parametric test, we used the Fisher 𝑧 transformation to transform the difference in correlation coefficients to the 𝑧 statistic. We found that among the five ROIs in the frontoparietal network, two of them, namely the left IFG and left IPS, the difference in correlation was significant (one-tailed z test; left IFG: 𝑧 = 1.8355, 𝑝 = 0.0332; left IPS: 𝑧 = 2.3782, 𝑝 = 0.0087). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: 𝑧 = 0.7594, 𝑝 = 0.2238; right IFG: 𝑧 = 0.9068, 𝑝 = 0.1822; right IPS: 𝑧 = 1.3764, 𝑝 = 0.0843). We chose one-tailed test because we already know the correlation under change-consistent signals was significantly greater than 0. In the nonparametric test, we performed nonparametric bootstrapping to test for the difference in correlation. We referred to the correlation between neural and behavioral sensitivity at change-consistent (blue) signals as 𝑟<sub>𝑏𝑙𝑢𝑒</sub>, and that at change-inconsistent (red) signals as 𝑟<sub>𝑟𝑒𝑑</sub>. Consistent with the parametric tests, we also found that the difference in correlation was significant in left IFG and left IPS (left IFG: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.46, 𝑝 = 0.0496; left IPS: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.5306, 𝑝 = 0.0041), but was not significant in dmPFC, right IFG, and right IPS (dmPFC: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.1634, 𝑝 = 0.1919; right IFG: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.2123, 𝑝 = 0.1681; right IPS: 𝑟<sub>𝑏𝑙𝑢𝑒</sub> − 𝑟<sub>𝑟𝑒𝑑</sub> = 0.3434, 𝑝 = 0.0631).
      In summary, we found that neural sensitivity to signal diagnosticity measured at change-consistent signals significantly correlated with individual subjects’ behavioral sensitivity to signal diagnosticity. By contrast, neural sensitivity to signal diagnosticity measured at change-inconsistent signals did not significantly correlate with behavioral sensitivity. The difference in correlation, however, was statistically significant in some (left IPS and left IFG) but not all brain regions within the frontoparietal network.”

      In the Methods section, we added on p.53:

      “Parametric and nonparametric tests for difference in correlation coefficients. We implemented both parametric and nonparametric tests to examine whether the difference in Pearson correlation coefficients was significant. In the parametric test, we used the Fisher 𝑧 transformation to transform the difference in correlation coefficients to the 𝑧 statistic. That is, for two correlation coefficients, 𝑟<sub>1</sub> (with sample size 𝑛<sub>1</sub>) and 𝑟<sub>2</sub> (with sample size 𝑛<sub>2</sub>), the 𝑧 statistic of the difference in correlation is given by

      We referred to the correlation between neural and behavioral sensitivity at change-consistent (blue balls) signals as 𝑟<sub>𝑏𝑙𝑢𝑒</sub>, and that at change-inconsistent (red balls) signals as 𝑟<sub>𝑟𝑒𝑑</sub>. For the Fisher 𝑧 transformation, 𝑟<sub>1</sub> = 𝑟<sub>𝑏𝑙𝑢𝑒</sub> and 𝑟<sub>2</sub> = 𝑟<sub>𝑟𝑒𝑑</sub>. In the nonparametric test, we performed nonparametric bootstrapping to test for the difference in correlation (Efron & Tibshirani, 1994). That is, we resampled with replacement the dataset (subject-wise) and used the resampled dataset to compute the difference in correlation. We then repeated the above for 100,000 times so as to estimate the distribution of the difference in correlation coefficients, tested for significance and estimated p-value based on this distribution.”

      Another potential concern is that some important details about the parameter estimation for the system-neglect model are missing. In the respective section in the methods, the authors mention a nonlinear regression using Matlab's "fitnlm" function, but it remains unclear how the model was parameterized exactly. In particular, what are the properties of this nonlinear function, and what are the assumptions about the subject's motor noise? I could imagine that by using the built-in function, the assumption was that residuals are Gaussian and homoscedastic, but it is possible that the assumption of homoscedasticity is violated, and residuals are systematically larger around p=0.5 compared to p=0 and p=1. Relatedly, in the parameter recovery analyses, the authors assume different levels of motor noise. Are these values representative of empirical values?

      We thank the reviewer for this excellent point. The reviewer touched on model parameterization, assumption of noise, and parameter recovery analysis. We answered these questions point-by-point below.

      On how our model was parameterized

      We parameterized the model according to the system-neglect model in Eq. (2) and estimated the alpha parameter separately for each level of transition probability and the beta parameter separately for each level of signal diagnosticity. As a result, we had a total of 6 parameters (3 alpha and 3 beta parameters) in the model. The system-neglect model is then called by fitnlm so that these parameters can be estimated. The term ‘nonlinear’ regression in fitnlm refers to the fact that you can specify any model (in our case the system-neglect model) and estimate its parameters when calling this function. In our use of fitnlm, we assume that the noise is Gaussian and homoscedastic (the default option).

      On the assumptions about subject’s motor noise

      We actually never called the noise ‘motor’ because it can be estimation noise as well. In the context of fitnlm, we assume that the noise is Gaussian and homoscedastic.

      On the possibility that homoscedasticity is violated

      We take the reviewer’s point. In response, we separately estimated the residual standard deviation at different probability intervals ([0.0–0.2), [0.2–0.4), [0.4–0.6), [0.6–0.8), and [0.8–1.0]). The result is shown in Fig. S4. The black data points are the average residual standard deviation (across subjects) and the error bars are the standard error of the mean. The residual standard deviation is indeed heteroscedastic: it is smallest at a probability of 0.1, increases with probability, and reaches an asymptote at 0.5.
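The binning step can be sketched as below. The sketch assumes that residuals are defined as observed minus model-predicted probability estimates and that observations are assigned to intervals by the model-predicted probability; the response does not state which quantity was binned, so both choices, along with the function and variable names, are assumptions.

```python
import bisect
import statistics

# Interval edges from the text: [0.0, 0.2), [0.2, 0.4), ..., [0.8, 1.0].
BIN_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def residual_sd_by_bin(predicted, observed, edges=BIN_EDGES):
    """Sample standard deviation of residuals within each probability interval.

    Returns one value per interval; None where an interval has fewer than 2 points.
    Assumption: bins are defined on the predicted probability.
    """
    n_bins = len(edges) - 1
    groups = [[] for _ in range(n_bins)]
    for p_hat, p_obs in zip(predicted, observed):
        # Half-open bins [lo, hi); the last bin also includes the upper edge.
        k = min(max(bisect.bisect_right(edges, p_hat) - 1, 0), n_bins - 1)
        groups[k].append(p_obs - p_hat)
    return [statistics.stdev(g) if len(g) >= 2 else None for g in groups]


# Hypothetical data for one subject.
predicted = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5]
observed = [0.1, 0.1, 0.1, 0.4, 0.5, 0.6]
print(residual_sd_by_bin(predicted, observed))
```

Averaging the per-subject values in each interval, with the standard error across subjects, would give a summary of the kind described for Fig. S4.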

      To examine how this would affect model fitting (parameter estimation), we performed a parameter recovery analysis based on these empirically estimated, probability-dependent residual standard deviations. That is, we simulated subjects’ probability estimates using the system-neglect model, added heteroscedastic noise according to the empirical values, and then re-estimated the parameters of the system-neglect model. The recovered parameter estimates did not seem to be affected by the heteroscedasticity. The parameter recovery results were identical to those obtained when homoscedasticity was assumed. This suggests that although homoscedasticity was violated, the violation did not affect the accuracy of the parameter estimates (Fig. S4).

      We added a section ‘Impact of noise homoscedasticity on parameter estimation’ in the Methods section (p.47-48) and a figure in the supplement (Fig. S4) to describe this.

      On whether the noise levels in parameter recovery analysis are representative of empirical values

      To address the reviewer’s question, we conducted a new analysis using maximum likelihood estimation to simultaneously estimate the parameters of the system-neglect model and the noise level of each individual subject. To estimate each subject’s noise level, we incorporated a noise parameter into the system-neglect model. We assumed that probability estimates are noisy and modeled them with a Gaussian distribution where the noise parameter (𝜎<sub>noise</sub>) is the standard deviation. At each period, a probability estimate of regime shift was computed according to the system-neglect model, where Θ is the set of parameters including the parameters of the system-neglect model and the noise parameter. The likelihood function, 𝐿(Θ), is the probability of observing the subject’s actual probability estimate at period 𝑡, 𝑝<sub>𝑡</sub>, given Θ: 𝐿(Θ) = 𝑃(𝑝<sub>𝑡</sub>|Θ). Since we modeled the noisy probability estimates with a Gaussian distribution, we can express 𝐿(Θ) as 𝐿(Θ) ~ 𝑁(𝑝<sub>𝑡</sub>; 𝑝<sub>𝑡</sub><sup>SN</sup>, 𝜎<sub>noise</sub>), where 𝑝<sub>𝑡</sub><sup>SN</sup> is the probability estimate predicted by the system-neglect (SN) model at period 𝑡. As a reminder, we referred to a ‘period’ as the time when a new signal appeared during a trial (for a given transition probability and signal diagnosticity). To find the maximum likelihood estimate, Θ<sub>MLE</sub>, we summed the negative natural logarithm of the likelihood over all periods and used MATLAB’s fmincon function to minimize it. Across subjects, we found that the mean noise estimate was 0.1735 and ranged from 0.1118 to 0.2704 (Supplementary Figure S3).
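The objective described above, the summed negative log-likelihood under Gaussian noise, can be sketched as follows. This is a simplified illustration rather than the authors' MATLAB code: the model predictions are passed in as fixed numbers instead of being computed from the system-neglect parameters, so only the noise parameter is estimated, and all names and data are hypothetical.

```python
import math


def gaussian_nll(observed, predicted, sigma):
    """Sum over periods of -ln N(p_t; p_t^SN, sigma)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    n = len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, predicted))
    return n * math.log(sigma * math.sqrt(2.0 * math.pi)) + sse / (2.0 * sigma ** 2)


def sigma_mle(observed, predicted):
    """Closed-form ML estimate of sigma when the predictions are held fixed."""
    n = len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, predicted))
    return math.sqrt(sse / n)


# Hypothetical probability estimates and model predictions for three periods.
observed = [0.2, 0.5, 0.9]
predicted = [0.3, 0.4, 0.7]
sigma_hat = sigma_mle(observed, predicted)
print(f"sigma_hat = {sigma_hat:.4f}, NLL = {gaussian_nll(observed, predicted, sigma_hat):.4f}")
```

In the full analysis, the predictions depend on the system-neglect parameters, so a numerical optimizer (fmincon in the authors' case) is needed to minimize the objective over all parameters jointly.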

      Compared with our original parameter recovery analysis, where the maximum noise level was set at 0.1, our data indicated that some subjects’ noise was larger than this value. Therefore, we expanded our parameter recovery analysis to include noise levels beyond 0.1, up to 0.3. The results are now updated in Supplementary Fig. S3.

      We updated the parameter recovery section (p. 47) in Methods.

      The main study is based on N=30 subjects, as are the two control studies. Since this work is about individual differences (in particular with respect to neural representations of noise and transition probabilities in the frontoparietal network and the vmPFC), I'm wondering how robust the results are. Is it likely that the results would replicate with a larger number of subjects? Can the two control studies be leveraged to address this concern to some extent?

      We can address the issue of robustness through looking at the effect size. In particular, with respect to individual differences in neural sensitivity of transition probability and signal diagnosticity, since the significant correlation coefficients between neural and behavioral sensitivity were between 0.4 and 0.58 for signal diagnosticity in frontoparietal network (Fig. 5C), and -0.38 and -0.37 for transition probability in vmPFC (Fig. 5D), the effect size of these correlation coefficients was considered medium to large (Cohen, 1992).

      It would be challenging to use the control studies to address the robustness concern. The two control studies did not allow us to examine individual differences, in particular with respect to neural selectivity of noise and transition probability, and we therefore think they are unlikely to help. Having said that, it is possible to look at neural selectivity of noise (signal diagnosticity) in the first control experiment, where subjects estimated the probability of the blue regime in a task with no regime change (transition probability was 0). However, the absence of regime shifts changed the nature of the task. Instead of always starting in the red regime, as in the main experiment, in the first control experiment we randomly picked the regime to draw the signals from. This also changed the meaning and the dynamics of the signals (red and blue) that would appear. In the main experiment, a blue signal is consistent with change, but in the control experiment this is no longer the case. In the main experiment, the frequency of blue signals is contingent upon both noise and transition probability. In general, blue signals are less frequent than red signals because of small transition probabilities. But in the first control experiment, blue signals may not be less frequent because the regime was blue in half of the trials. Due to these differences, we do not see how analyzing the control experiments could help in establishing robustness, because we do not have a good prediction as to whether and how the neural selectivity would be impacted by these differences.

      It seems that the authors have not counterbalanced the colors and that subjects always reported the probability of the blue regime. If so, I'm wondering why this was not counterbalanced.

      We are aware of the reviewer’s concern. The first reason we did not counterbalance the colors or the reported regime (blue/red) was to avoid confusing the subjects in an already complicated task. Balancing these two variables also comes at the cost of sample size, which was the second reason. Although we could have balanced them at the between-subject level to avoid adding task complexity, doing so could have introduced another confound, namely individual differences in how people respond to these variables. This is the third reason we were hesitant to counterbalance.

      Reviewer #2 (Public review):

      Summary:

      This paper focuses on understanding the behavioral and neural basis of regime shift detection, a common yet hard problem that people encounter in an uncertain world.

      Using a regime-shift task, the authors examined cognitive factors influencing belief updates by manipulating signal diagnosticity and environmental volatility. Behaviorally, they have found that people demonstrate both over and under-reaction to changes given different combinations of task parameters, which can be explained by a unified system-neglect account. Neurally, the authors have found that the vmPFC-striatum network represents current belief as well as belief revision unique to the regime detection task. Meanwhile, the frontoparietal network represents cognitive factors influencing regime detection i.e., the strength of the evidence in support of the regime shift and the intertemporal belief probability. The authors further link behavioral signatures of system neglect with neural signals and have found dissociable patterns, with the frontoparietal network representing sensitivity to signal diagnosticity when the observation is consistent with regime shift and vmPFC representing environmental volatility, respectively. Together, these results shed light on the neural basis of regime shift detection especially the neural correlates of bias in belief update that can be observed behaviorally.

      Strengths:

      (1) The regime-shift detection task offers a solid ground to examine regime-shift detection without the potential confounding impact of learning and reward. Relatedly, the system-neglect modeling framework provides a unified account for both over or under-reacting to environmental changes, allowing researchers to extract a single parameter reflecting people's sensitivity to changes in decision variables and making it desirable for neuroimaging analysis to locate corresponding neural signals.

      Thank you for recognizing our task design and our system-neglect computational framework in understanding change detection.

      (2) The analysis for locating brain regions related to belief revision is solid. Within the current task, the authors look for brain regions whose activation covary with both current belief and belief change. Furthermore, the authors have ruled out the possibility of representing mere current belief or motor signal by comparing the current study results with two other studies. This set of analyses is very convincing.

      Thank you for recognizing our control studies in ruling out potential motor confounds in our neural findings on belief revision.

      (3) The section on using neuroimaging findings (i.e., the frontoparietal network is sensitive to evidence that signals regime shift) to reveal nuances in behavioral data (i.e., belief revision is more sensitive to evidence consistent with change) is very intriguing. I like how the authors structure the flow of the results, offering this as an extra piece of behavioral findings instead of ad-hoc implanting that into the computational modeling.

      Thank you for appreciating how we showed that neural insights can lead to new behavioral findings.

      Weaknesses:

      (1) The authors have presented two sets of neuroimaging results, and it is unclear to me how to reason between these two sets of results, especially for the frontoparietal network. On one hand, the frontoparietal network represents belief revision but not variables influencing belief revision (i.e., signal diagnosticity and environmental volatility). On the other hand, when it comes to understanding individual differences in regime detection, the frontoparietal network is associated with sensitivity to change and consistent evidence strength. I understand that belief revision correlates with sensitivity to signals, but it can probably benefit from formally discussing and connecting these two sets of results in discussion. Relatedly, the whole section on behavioral vs. neural slope results was not sufficiently discussed and connected to the existing literature in the discussion section. For example, the authors could provide more context to reason through the finding that striatum (but not vmPFC) is not sensitive to volatility.

      We thank the reviewer for the valuable suggestions.

      With regard to the first comment, we wish to clarify that we did not find frontoparietal network to represent belief revision. It was the vmPFC and ventral striatum that we found to represent belief revision (delta Pt in Fig. 3). For the frontoparietal network, we identified its involvement in our task through finding that its activity correlated with strength of change evidence (Fig. 4) and individual subjects’ sensitivity to signal diagnosticity (Fig. 5). Conceptually, these two findings reflect how individuals interpret the signals (signals consistent or inconsistent with change) in light of signal diagnosticity. This is because (1) strength of change evidence is defined as signals (+1 for signal consistent with change, and -1 for signal inconsistent with change) multiplied by signal diagnosticity and (2) sensitivity to signal diagnosticity reflects how individuals subjectively evaluate signal diagnosticity. At the theoretical level, these two findings can be interpreted through our computational framework in that both the strength of change evidence and sensitivity to signal diagnosticity contribute to estimating the likelihood of change (Eqs. 1 and 2). We added a paragraph in Discussion to talk about this.

      We added on p. 36:

      “For the frontoparietal network, we identified its involvement in our task through finding that its activity correlated with strength of change evidence (Fig. 4) and individual subjects’ sensitivity to signal diagnosticity (Fig. 5). Conceptually, these two findings reflect how individuals interpret the signals (signals consistent or inconsistent with change) in light of signal diagnosticity. This is because (1) strength of change evidence is defined as signals (+1 for signal consistent with change, and −1 for signal inconsistent with change) multiplied by signal diagnosticity and (2) sensitivity to signal diagnosticity reflects how individuals subjectively evaluate signal diagnosticity. At the theoretical level, these two findings can be interpreted through our computational framework in that both the strength of change evidence and sensitivity to signal diagnosticity contribute to estimating the likelihood of change (Equations 1 and 2 in Methods).”

      With regard to the second comment, we added a discussion on the behavioral and neural slope comparison. We pointed out previous papers conducting similar analyses (Vilares et al., 2012; Ting et al., 2015a; Yang & Wu, 2020), their findings, and how they relate to our results. Vilares et al. found that sensitivity to prior information (uncertainty in prior distribution) in the orbitofrontal cortex (OFC) and putamen correlated with a behavioral measure of sensitivity to the prior. In the current study, transition probability acts as the prior in the system-neglect framework (Eq. 1) and we found that ventromedial prefrontal cortex represents subjects’ sensitivity to transition probability. Together, these results suggest that OFC (with vmPFC being part of OFC, see Wallis, 2011) is involved in the subjective evaluation of prior information in both static (Vilares et al., 2012) and dynamic environments (current study).

      We added on p. 37-38:

      “In the current study, our psychometric-neurometric analysis focused on comparing behavioral sensitivity with neural sensitivity to the system parameters (transition probability and signal diagnosticity). We measured sensitivity by estimating the slope of behavioral data (behavioral slope) and neural data (neural slope) in response to the system parameters. Previous studies had adopted a similar approach (Ting et al., 2015a; Vilares et al., 2012; Yang & Wu, 2020). For example, Vilares et al. (2012) found that sensitivity to prior information (uncertainty in prior distribution) in the orbitofrontal cortex (OFC) and putamen correlated with behavioral measure of sensitivity to the prior.

      In the current study, transition probability acts as prior in the system-neglect framework (Eq. 2 in Methods) and we found that ventromedial prefrontal cortex represents subjects’ sensitivity to transition probability. Together, these results suggest that OFC (with vmPFC being part of OFC, see Wallis, 2011) is involved in the subjective evaluation of prior information in both static (Vilares et al., 2012) and dynamic environments (current study). In addition, distinct from vmPFC in representing sensitivity to transition probability or prior, we found through the behavioral-neural slope comparison that the frontoparietal network represents how sensitive individual decision makers are to the diagnosticity of signals in revealing the true state (regime) of the environment.”

      (2) More details are needed for behavioral modeling under the system-neglect framework, particularly results on model comparison. I understand that this model has been validated in previous publications, but it is unclear to me whether it provides a superior model fit in the current dataset compared to other models (e.g., a model without \alpha or \beta). Relatedly, I wonder whether the final result section can be incorporated into modeling as well - i.e., the authors could test a variant of the model with two \betas depending on whether the observation is consistent with a regime shift and conduct model comparison.

      Thank you for the great suggestion. We rewrote the final Results section to specifically focus on model comparison. Following the reviewer’s suggestion, we estimated beta parameters separately for change-consistent and change-inconsistent signals and indeed found that these models were better than the original system-neglect model.

      To incorporate these new findings, we rewrote the entire final result section “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (p.28-30).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Use line numbers for the next round of reviews.

      We added line numbers in the revised manuscript.

      (2) Figure 2b: Can the empirical results be reproduced by the system-neglect model? This would complement the analyses presented in Figure S4.

      Yes. We now add Figure S6 based on system-neglect model fits. For each subject, we first computed period-by-period probability estimates based on the parameter estimates of the system-neglect model. Second, we computed the index of overreaction (IO) for each combination of transition probability and signal diagnosticity. Third, we plotted the IO as we did for the empirical results in Fig. 2b. We found that the empirical results in Fig. 2b are similar to the system-neglect model results shown in Figure S6, indicating that the empirical results can be reproduced by the model.

      (3) Page 14: Instead of referring to the "Methods" in general, you could be more specific about where the relevant information can be found.

      Fixed. We changed “See Methods” to “See System-neglect model in Methods”.

      (4) Page 18: Consider avoiding the term "more significantly". Consider effect sizes if interested in comparing effects to each other.

      Fixed. On page 19, we changed that to

      “In the second analysis, we found that for both vmPFC and ventral striatum, the regression coefficient of 𝑃<sub>𝑡</sub> was significantly different between Experiment 1 and Experiment 2 (Fig. 3C) and between Experiment 1 and Experiment 3 (Fig. 3D; also see Tables S5 and S6 in SI).”

      (5) Page 30: Cite key studies using reversal-learning paradigms. Currently, readers less familiar with the literature might have difficulties with this.

      We now cite key studies using reversal-learning paradigms on p.32:

      “Our work is closely related to the reversal-learning paradigm—the standard paradigm in neuroscience and psychology to study change detection (Fellows & Farah, 2003; Izquierdo et al., 2017; O'Doherty et al., 2001; Schoenbaum et al., 2000; Walton et al., 2010). In a typical reversal-learning task, human or animal subjects choose between two options that differ in the reward magnitude or probability of receiving a reward. Through reward feedback the participants gradually learn the reward contingencies associated with the options and have to update knowledge about reward contingencies when contingencies are switched in order to maximize rewards.”

      Reviewer #2 (Recommendations for the authors):

      (1) Some literature on change detection seems missing. For example, the author should also cite Muller, T. H., Mars, R. B., Behrens, T. E., & O'Reilly, J. X. (2019). Control of entropy in neural models of environmental state. elife, 8, e39404. This paper suggests that medial PFC is correlated with the entropy of the current state, which is closely related to regime change and environmental volatility.

      Thank you for pointing to this paper. We have now added it and other related papers in the Introduction and Discussion.

      In Introduction, we added on p.5-6:

      “Different behavioral paradigms, most notably reversal learning, and computational models were developed to investigate its neurocomputational substrates (Behrens et al., 2007; Izquierdo et al., 2017; Payzan-LeNestour et al., 2011, 2013; Nassar et al., 2010; McGuire et al., 2014; Muller et al., 2019). Key findings on the neural implementations for such learning include identifying brain areas and networks that track volatility in the environment (rate of change) (Behrens et al., 2007), the uncertainty or entropy of the current state of the environment (Muller et al., 2019), participants’ beliefs about change (Payzan-LeNestour et al., 2011; McGuire et al., 2014; Kao et al., 2020), and their uncertainty about whether a change had occurred (McGuire et al., 2014; Kao et al., 2020).”

      In Discussion (p.35), we added a new paragraph:

      “Related to OFC function in decision making and reinforcement learning, Wilson et al. (2014) proposed that OFC is involved in inferring the current state of the environment. For example, medial OFC had been shown to represent probability distribution on possible states of the environment (Chan et al., 2016), the current task state (Schuck et al., 2016) and uncertainty or entropy associated with the state of the environment (Muller et al., 2019). In the context of regime-shift detection, regimes can be regarded as states of the environment and therefore a change in regime indicates a change in the state of the environment. Muller et al. (2019) found that in dynamic environments where changes in the state of the environment happen regularly, medial OFC represented the level of uncertainty in the current state of the environment. Our finding that vmPFC represented individual participants’ probability estimates of regime shifts suggests that vmPFC and/or OFC are involved in inferring the current state of the environment through estimating whether the state has changed. Our finding that vmPFC represented individual participants’ sensitivity to transition probability further suggests that vmPFC and/or OFC contribute to individual participants’ biases in state inference (over- and underreactions to change) in how these brain areas respond to the volatility of the environment.”

      (2) The language used when describing the selective relationship between frontoparietal network activation and change-consistent signal can be clearer. When describing separating those two signals, the authors refer to them as when the 'blue' signal shows up and when the 'red' signal shows up, assuming that the current belief state is blue. This is a little confusing because it is hard to keep in mind which color is the default in this example. It would be more intuitive if the authors used language such as the 'change-consistent' signal.

      Thank you for the suggestion. We have changed the wording according to your suggestion. That is, we say ‘change-consistent (blue) signals’ and ‘change-inconsistent (red) signals’ throughout pages 22-28.

      (3) Figure 4B highlights dmPFC. However, in the associated text, it says p = .10 so it is not significant. To avoid misleading readers, I would recommend pointing this out explicitly beyond saying 'most brain regions in the frontoparietal network also correlated with the intertemporal prior'.

      Thank you for pointing this out. We now say on p.20:

      “With independent (leave-one-subject-out, LOSO) ROI analysis, we examined whether brain regions in the frontoparietal network (shown to represent strength of change evidence) correlated with intertemporal prior and found that all brain regions, with the exception of dmPFC, in the frontoparietal network correlated with the intertemporal prior.”

      (4) There is a full paragraph in the discussion talking about the central opercular cortex, but this terminology has not shown up in the main body of the paper. If this is an important brain region to the authors, I would recommend mentioning it more often in the result section.

      Thank you for this suggestion. We have now added central opercular cortex in the Results section (p.18):

      “For 𝑃<sub>𝑡</sub>, we found that the ventromedial prefrontal cortex (vmPFC) and ventral striatum correlated with this behavioral measure of subjects’ belief about change. In addition, many other brain regions, including the motor cortex, central opercular cortex, insula, occipital cortex, and the cerebellum also significantly correlated with 𝑃<sub>𝑡</sub>.”

      (5) The authors have claimed that people make more extreme estimates under high diagnosticity (Supplementary Figure 1). This is an interesting point because it seems to be different from what is shown in the main graph where it seems that people are not extreme enough compared to an ideal Bayesian observer. I understand that these are effects being investigated under different circumstances. It would be helpful if for Supplementary Figure 1 the authors could overlay, or generate a different figure showing what an ideal Bayesian observer would do in this situation.

      We thank the reviewer for pointing this out. We wish to clarify that when we said “more extreme estimates under high diagnosticity” we meant compared with low diagnosticity and not with the ideal Bayesian observer. We clarified this point by rephrasing our sentence on p.11:

      “We also found that subjects tended to give more extreme 𝑃<sub>𝑡</sub> under high signal diagnosticity than low diagnosticity (Fig. S1 in Supplementary Information, SI).”

      When it comes to comparing subjects’ probability estimates with the normative Bayesian, subjects tended to “underreact” under high diagnosticity. This can be seen in Fig. 4B, which shows a trend of increasing underreaction (or decreasing overreaction) as diagnosticity increased (row-wise comparison for a given transition probability).

      We see the reviewer’s point about overlaying the Bayesian on Fig. S1 and have updated the figure by adding the normative Bayesian estimates in orange.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Silbaugh, Koster, and Hansel investigated how the cerebellar climbing fiber (CF) signals influence neuronal activity and plasticity in mouse primary somatosensory (S1) cortex. They found that optogenetic activation of CFs in the cerebellum modulates responses of cortical neurons to whisker stimulation in a cell-type-specific manner and suppresses potentiation of layer 2/3 pyramidal neurons induced by repeated whisker stimulation. This suppression of plasticity by CF activation is mediated through modulation of VIP- and SST-positive interneurons. Using transsynaptic tracing and chemogenetic approaches, the authors identified a pathway from the cerebellum through the zona incerta and the thalamic posterior medial (POm) nucleus to the S1 cortex, which underlies this functional modulation.

      Strengths:

      This study employed a combination of modern neuroscientific techniques, including two-photon imaging, opto- and chemo-genetic approaches, and transsynaptic tracing. The experiments were thoroughly conducted, and the results were clearly and systematically described. The interplay between the cerebellum and other brain regions - and its functional implications - is one of the major topics in this field. This study provides solid evidence for an instructive role of the cerebellum in experience-dependent plasticity in the S1 cortex.

      Weaknesses:

      There may be some methodological limitations, and the physiological relevance of the CF-induced plasticity modulation in the S1 cortex remains unclear. In particular, it has not been elucidated how CF activity influences the firing patterns of downstream neurons along the pathway to the S1 cortex during stimulation.

      Our study addresses the important question of whether CF signaling can influence the activity and plasticity of neurons outside the olivocerebellar system, and further identifies the mechanism through which this indeed occurs. We provide a detailed description of the involvement of specific neuron subtypes and how they are modulated by climbing fiber activation to impact S1 plasticity. We also identify at least one critical pathway from the cerebellar output to the S1 circuit. It is indeed correct that we did not investigate how the specific firing patterns of all of these downstream neurons are affected, or the natural behaviors in which this mechanism is involved. Now that it is established that CF signaling can impact activity and plasticity outside the olivocerebellar system -- and even in the primary somatosensory cortex -- these questions will be important to further investigate in future studies.

      (1) Optogenetic stimulation may have activated a large population of CFs synchronously, potentially leading to strong suppression followed by massive activation in numerous cerebellar nuclear (CN) neurons. Given that there is no quantitative estimation of the stimulated area or number of activated CFs, observed effects are difficult to interpret directly. The authors should at least provide the basic stimulation parameters (coordinates of stim location, power density, spot size, estimated number of Purkinje cells included, etc.).

      As discussed in the paper, we indeed expect that synchronous CF activation is needed to allow for an effect on S1 circuits under natural or optogenetic activation conditions. The basic optogenetic stimulation parameters (also stated in the methods) are as follows: 470 nm LED; Ø200 µm core, 0.39 NA rotary joint patch cable; absolute power output of 2.5 mW; spot size at the surface of the cortex 0.6 mm; estimated power density 8 mW/mm². A reliable estimate of the number of Purkinje cells that are activated is difficult to provide, in particular as ‘activation’ would refer to climbing fiber inputs, not Purkinje cells directly.
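
      As a rough consistency check on the parameters quoted above, the sketch below computes the mean power density for an idealized, uniformly illuminated circular spot. It assumes that the 0.6 mm spot size is a diameter and that the full 2.5 mW reaches the surface; neither is stated in the response, and the function name is hypothetical. Under these assumptions the value is close to 9 mW/mm², of the same order as the stated estimate of 8 mW/mm².

```python
import math


def mean_power_density(power_mw: float, spot_diameter_mm: float) -> float:
    """Mean power density (mW/mm^2) over a uniformly lit circular spot."""
    if spot_diameter_mm <= 0:
        raise ValueError("spot diameter must be positive")
    area_mm2 = math.pi * (spot_diameter_mm / 2.0) ** 2
    return power_mw / area_mm2


# Values quoted in the response: 2.5 mW output, 0.6 mm spot.
# Treating 0.6 mm as a diameter is an assumption of this sketch.
density = mean_power_density(2.5, 0.6)
print(f"{density:.1f} mW/mm^2")
```

      Losses in the patch cable, a non-uniform beam profile, or a different definition of spot size would all shift this number, so it is only an order-of-magnitude check.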

      (2) There are CF collaterals directly innervating CN (PMID:10982464). Therefore, antidromic spikes induced by optogenetic stimulation may directly activate CN neurons. On the other hand, a previous study reported that CN neurons exhibit only weak responses to CF collateral inputs (PMID: 27047344). The authors should discuss these possibilities and the potential influence of CF collaterals on the interpretation of the results.

      A direct activation of CN neurons by antidromic spikes in CF collaterals cannot be ruled out. However, we believe that this effect will not be substantial. The activation of the multi-synaptic pathway that we describe in this study is more likely to require a strong nudge as resulting from synchronized Purkinje cell input and subsequent rebound activation in CN neurons (PMID: 22198670), rather than small-amplitude input provided by CF collaterals (PMID: 27047344). A requirement for CF/PC synchronization would also set a threshold for activation of this suppressive pathway.

      (3) The rationale behind the plasticity induction protocol for RWS+CF (50 ms light pulses at 1 Hz during 5 min of RWS, with a 45 ms delay relative to the onset of whisker stimulation) is unclear.

      a) The authors state that 1 Hz was chosen to match the spontaneous CF firing rate (line 107); however, they also introduced a delay to mimic the CF response to whisker stimulation (line 108). This is confusing, and requires further clarification, specifically, whether the protocol was designed to reproduce spontaneous or sensory-evoked CF activity.

      This protocol was designed to mimic sensory-evoked CF activity as reported in Bosman et al (J. Physiol. 588, 2010; PMID: 20724365).

      b) Was the timing of delivering light pulses constant or random? Given the stochastic nature of CF firing, randomly timed light pulses with an average rate of 1Hz would be more physiologically relevant. At the very least, the authors should provide a clear explanation of how the stimulation timing was implemented.

      Light pulses were delivered at a constant 1 Hz. Our goal was to isolate synchrony as the variable distinguishing sensory-evoked from spontaneous CF activity; additionally varying stochasticity, rate, or amplitude would have confounded this. Future studies could explore how these additional parameters shape S1 responses.

      (4) CF activation modulates inhibitory interneurons in the S1 cortex (Figure 2): responses of interneurons in S1 to whisker stimulation were enhanced upon CF coactivation (Figure 2C), and these neurons were predominantly SST- and PV-positive interneurons (Figure 2H, I). In contrast, VIP-positive neurons were suppressed only in the late time window of 650-850 ms (Figure 2G). If the authors' hypothesis (that the activity of VIP neurons regulates SST- and PV-neuron activity during RWS+CF) is correct, then the activity of SST- and PV-neurons should also be increased during this late time window. The authors should clarify whether such temporal dynamics were observed or could be inferred from their data.

      Yes, we see a significant activity increase in PV neurons in this late time window (see updates to Data S2). Activity was also increased in SST neurons, though this did not reach statistical significance (Data S2). One reason might be that, given the small overall effect size, such an effect would only be seen in paired recordings. Chemogenetic activity modulation in VIP neurons, which provides a cruder test, shows, however, that SST- and PV-positive interneurons are indeed regulated via inhibition from VIP-positive interneurons (Fig. 5).

      (5) Transsynaptic tracing from CN nicely identified zona incerta (ZI) neurons and their axon terminals in both POm and S1 (Figure 6 and Figure S7).

      a) Which part of the CN (medial, interposed, or lateral) is involved in this pathway is unclear.

      We used a dual-injection transsynaptic tracing approach to specifically label the outputs of ZI neurons that receive input from the deep cerebellar nuclei. The anterograde viral vector injected into the CN is unlabeled (no fluorophore) and therefore, it is not possible to reliably assess the extent of viral spread in those experiments as performed. However, we have previously performed similar injections into the deep cerebellar nuclei and post hoc histology suggests all three nuclei will have at least some viral expression (Koster and Sherman, 2024). Due to size and injection location, we will mostly have reached the lateral (dentate) nucleus, but cannot exclude partial transsynaptic tracing from the interposed and medial nuclei.

      b) Were the electrophysiological properties of these ZI neurons consistent with those of PV neurons?

      Although most recorded cells demonstrated electrophysiological properties consistent with PV+ interneurons in other brain regions (i.e. fast spiking, narrow spike width, non-adapting; see Tremblay et al., 2016), interneuron subtypes in the ZI have been incompletely characterized, with SST+ cells showing similar features to those typically associated with PV+ cells (if interested, compare Fig. 4 in DOI: 10.1126/sciadv.abf6709 vs. Fig. S10 in https://doi.org/10.1016/j.neuron.2020.04.027). Therefore, we did not attempt to delineate cell identity based on these characteristics.

      c) There appears to be a considerable number of axons of these ZI neurons projecting to the S1 cortex (Figure S7C). Would it be possible to estimate the relative density of axons projecting to the POm versus those projecting to S1? In addition, the authors should discuss the potential functional role of this direct pathway from the ZI to the S1 cortex.

      An absolute quantification is difficult to provide based on the images that we obtained. However, any crude estimate would indicate that the relative density of projections to POm is higher than the density of projections to S1 (this is apparent from the images themselves). While the anatomical and functional connections from POm to S1 have been described in detail (Audette et al., 2018), this is not the case for the direct projections from ZI to S1. A direct ZI to S1 projection would potentially involve a different recruitment of neurons in the S1 circuit. Any discussion of the specific consequences of the activation of this direct pathway would be purely speculative.

      Reviewer #2 (Public review):

      Summary:

      The authors examined long-distance influence of climbing fiber (CF) signaling in the somatosensory cortex by manipulating whiskers through stimulation. Also, they examined CF signaling using two-photon imaging and mapped projections from the cerebellum to the somatosensory cortex using transsynaptic tracing. As a final manipulation, they used chemogenetics to perturb parvalbumin-positive neurons in the zona incerta and recorded from climbing fibers.

      Strengths:

      There are several strengths to this paper. The recordings were carefully performed, and AAVs used were selective and specific for the cell types and pathways being analyzed. In addition, the authors used multiple approaches that support climbing fiber pathways to distal regions of the brain. This work will impact the field and describes nice methods to target difficult-to-reach brain regions, such as the inferior olive.

      Weaknesses:

      There are some details in the methods that could be explained further. The discussion was very short and could connect the findings in a broader way.

      In the revised manuscript, we provide more methodological details, as requested. We kept the explanations in the discussion as simple as possible, so as not to bias further investigations into this novel phenomenon. In particular, we avoid an extended discussion of the gating effect of CF activity on S1 plasticity. While this is the effect on plasticity specifically observed here, we believe that the consequences of CF signaling on S1 activity may entirely depend on the contexts in which CF signals are naturally recruited, the ongoing activity of other brain regions, and behavioral state. Our key finding is that such modulation of neocortical plasticity can occur. How CF signaling controls plasticity of the neocortex in all contexts remains unknown, but needs to be thoughtfully tested in the future.

      Reviewer #3 (Public review):

      Summary:

      The authors developed an interesting novel paradigm to probe the effects of cerebellar climbing fiber activation on short-term adaptation of somatosensory neocortical activity during repetitive whisker stimulation. Normally, RWS potentiated whisker responses in pyramidal cells and weakly suppressed them in interneurons, lasting for at least 1 h. Crus II optogenetic climbing fiber activation during RWS reduced or inverted these adaptive changes. This effect was generally mimicked or blocked with chemogenetic SST or VIP activation/suppression as predicted based on their "sign" in the circuit.

      Strengths:

      The central finding about CF modulation of S1 response adaptation is interesting, important, and convincing, and provides a jumping-off point for the field to start to think carefully about cerebellar modulation of neocortical plasticity.

      Weaknesses:

      The SST and VIP results appeared slightly weaker statistically, but I do not personally think this detracts from the importance of the initial finding (if there are multiple underlying mechanisms, modulating one may reproduce only a fraction of the effect size). I found the suggestion that zona incerta may be responsible for the cerebellar effects on S1 to be a more speculative result (it is not so easy with existing technology to effectively modulate this type of polysynaptic pathway), but this may be an interesting topic for the authors to follow up on in more detail in the future.

      Our interpretation of the anatomical and physiological findings is that a pathway via the ZI is indeed critical for the observed effects. This pathway also represents perhaps the most direct pathway (i.e. the fewest synapses connecting the cerebellar nuclei to S1). However, several other direct and indirect pathways are plausible as well and we expect distinct activation requirements and consequences for neurons in the S1 circuit. These are indeed interesting topics for future investigation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 77: "CF transients" is not a standard or widely recognized term. Please use a more precise expression, such as "CF-induced calcium transients."

      We now avoid the use of the term “CF transients” and replaced it with “CF-induced calcium transients.”

      (2) Titer of AAVs injected should be provided.

      AAV titers have been included in an additional data table (Data S9).

      (3) Several citations to the figures are incorrect (for example, "Supplementary Data 2a (Line 398)" does not exist).

      We apologize for the mistakes in this version of the article. Incorrect citations to the figures have been corrected.

      (4) Line 627-628: "The tip of the patch cable was centered over Crus II in all optogenetic stimulation experiments." The stereotaxic coordinate of the tip position should be provided.

      The stereotaxic coordinate of the tip position has been provided in the methods.

      (5) Line 629: "Blue light pulses were delivered with a 470 nm Fiber-Coupled LED (Thorlabs catalog: M470F3)." The size of the light stim and estimated power density (W/mm^2) at the surface of the cortex should be provided.

      The spot size and estimated power density at the surface of the cortex have been provided in the methods.

      (6) Line 702-706: References for DCZ should be cited.

      We now cite Nagai et al, Nat. Neurosci. 23 (2020) as the original reference.

      (7) Two-photon image processing (Line 807-809): The rationale for normalizing ∆F/F traces to a pre-stimulus baseline is unclear because ∆F/F is, by definition, already normalized to baseline fluorescence: (Ft-F0)/F0. The authors should clarify why this additional normalization step was necessary and how it affected the interpretation of the data.

      A single baseline fluorescence value (F₀) was computed for each neuron across the entire recording session, which lasted ~120 minutes. However, some S1 neurons exhibit fluctuations in baseline fluorescence over time (often related to locomotor activity or spontaneous network oscillations), which can obscure stimulus-evoked changes. To isolate fluorescence changes specifically attributable to whisker stimulation, we normalized each ∆F/F trace to the prestimulus baseline for that trial. This additional normalization allowed us to quantify potentiation or depression of sensory responses themselves, independently of spontaneous oscillations or locomotion-related changes in the ongoing neural activity.
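
      The per-trial step described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the response does not state whether the prestimulus baseline was subtracted or divided out, so subtraction is assumed here, and the function name, sample values, and number of prestimulus samples are all hypothetical.

```python
from statistics import fmean


def baseline_correct(trace, n_pre):
    """Subtract the mean of the first n_pre (pre-stimulus) samples from a trial."""
    if not 0 < n_pre <= len(trace):
        raise ValueError("n_pre must be between 1 and len(trace)")
    baseline = fmean(trace[:n_pre])
    return [sample - baseline for sample in trace]


# Two hypothetical dF/F trials with the same evoked change (+0.5) but a
# different ongoing baseline, e.g. because of slow drift within the session.
trial_a = [0.0, 0.0, 0.5, 0.5]
trial_b = [0.25, 0.25, 0.75, 0.75]

corrected_a = baseline_correct(trial_a, n_pre=2)
corrected_b = baseline_correct(trial_b, n_pre=2)
print(corrected_a, corrected_b)
```

      After correction the two trials are identical, which is the point of the extra step: a session-wide F₀ leaves the slow offset in place, whereas a per-trial baseline removes it before responses are compared.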

      Reviewer #2 (Recommendations for the authors):

      (1) Did the climbing fiber stimulation for Figure 1 result in any changes to motor activity? Can you make any additional comments on other behaviors that were observed during these manipulations?

      Acute CF stimulation did not cause any changes in locomotor or whisking activity. The CF stimulation also did not influence the overall level of locomotion or whisking during plasticity induction.

      (2) Figure 3B and F- it is very difficult to see the SST+ neurons. Can this be enhanced?

      We linearly adjusted the brightness and contrast for the bottom images in Figure 3B and F to improve visualization of SST+ neurons. Note the expression of both hM3D(Gq) and hM4D(Gi) in SST+ neurons is sparse, which was necessary to avoid off-target effects.

      (3) Can you be more specific about the subregions of cerebellar nuclei and cell types that are targeted in the tracing studies? Discussions of the cerebellar nuclei subregions are missing and would be interesting, as others have shown discrete pathways between cerebellar nuclei subregions and long-distance projections.

      See our response to comment 5a from Reviewer 1 (copied again here): we used a dual-injection transsynaptic tracing approach to specifically label the outputs of ZI neurons that receive input from the deep cerebellar nuclei. The anterograde viral vector injected into the CN is unlabeled (no fluorophore) and therefore, it is not possible to reliably assess the extent of viral spread in those experiments as performed. However, we have previously performed similar injections into the deep cerebellar nuclei and post hoc histology suggests all three nuclei will have at least some viral expression (Koster and Sherman, 2024). Due to size and injection location, we will mostly have reached the lateral (dentate) nucleus, but cannot exclude partial transsynaptic tracing from the interposed and medial nuclei.

      It would indeed be interesting to further investigate the effect of CFs residing in different cerebellar lobules, which preferentially target different cerebellar nuclei, on targets of these nuclei.

      (4) Did you see any connection to the ventral tegmental area? Can you comment on whether dopamine pathways are influenced by CF and in your manipulations?

      We did not specifically look at these pathways and thus are not able to comment on this.

      (5) These are intensive surgeries, do you think glia could have influenced any results?

      This was not tested and seems unlikely, but we cannot exclude this possibility.

      (6) It is unclear in the methods how long animals were recorded for in each experiment. Can you add more detail?

      Additional detail was added to the methods. Recordings for all experimental configurations did not last more than 120 minutes in total. All data were analyzed across identical time windows for each experiment.

      (7) In the methods it was mentioned that recording length can differ between animals. Can this influence the results, and if so, how was that controlled for?

      Recording length varied within experimental groups, but there was no systematic difference between groups.

      (8) I do not see any mention of animal sex throughout this manuscript. If animals were mixed groups, were sex differences considered? Would it be expected that CF activity would be different in male and female mice?

      As mentioned in the Methods (Animals), mice of either sex were used. No sex-dependent differences were observed.

      (9) Transsynaptic tracing results of the zona incerta are very interesting. The zona incerta is highly understudied, but has been linked to feeding, locomotion, arousal, and novelty seeking. Do you think this pathway would explain some of the behavioral results found through other studies of cerebellar lobule perturbations? Some discussion of how this brain region would be important as a cerebellar connection in animal behavior would be interesting.

      Since the multi-synaptic pathway from the cerebellum to S1 involves several brain regions with their own inputs and modulatory influences, it seems plausible to assume that behaviors controlled by these regions or affecting signaling pathways that regulate them would show some level of interaction. Our study does not address these interactions, but this will be an interesting question to be addressed in future work.

      Reviewer #3 (Recommendations for the authors):

      General comments on the data presentation:

      I'm not a huge fan of taking areas under curves ('AUC' throughout the study) when the integral of the quantity has no physical meaning - 'normalizing' the AUC (1I,L etc) is even stranger, because of course if you instead normalize the AUC by the # of data points, you literally just get the mean (which is probably what should be used instead).

      Indeed, AUC is equal to the average response in the time window used, multiplied by the window duration (thus, AUC is directly proportional to the mean). We chose to report AUC, a descriptive statistic, rather than the mean within this window. In 1I and L, we normalize the AUC across animals, essentially removing the variability across animals in the ‘Pre’ condition for visualization. Note that the significance of these comparisons is consistent whether or not we normalize to the ‘Pre’ condition (non-normalized RWS data in I shows a significant increase in PN activity, p = 0.0068, signrank test; non-normalized RWS+CF data in I shows a significant decrease in PN activity, p = 0.0135, paired t-test; non-normalized RWS data in L shows a significant decrease in IN activity, p < 0.001, paired t-test; non-normalized RWS+CF data in L shows no significant change in IN activity, p = 0.7789, paired t-test).
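
      The proportionality between AUC and the window mean can be checked numerically. The sketch below assumes evenly spaced samples and a rectangle-rule integral, for which the identity is exact; with trapezoidal integration the two differ slightly at the window edges. The sample values, sampling interval, and function names are hypothetical.

```python
def auc_rectangle(samples, dt):
    """Rectangle-rule area under evenly spaced samples (sample units x seconds)."""
    return sum(samples) * dt


def window_mean(samples):
    """Arithmetic mean of the samples in the window."""
    return sum(samples) / len(samples)


# Hypothetical response samples within an analysis window, 0.25 s apart.
samples = [0.5, 1.0, 1.5, 1.0]
dt = 0.25
duration = len(samples) * dt

auc = auc_rectangle(samples, dt)
mean = window_mean(samples)
print(auc, mean * duration)
```

      Because the window duration is the same constant for every cell and condition, tests on AUC and on the window mean differ only by that scale factor.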

      I think unadorned bar charts are generally excluded from most journals now. Consider replacing these with something that shows the raw datapoints if not too many, or the distribution across points.

      We have replaced bar charts with box plots and violin plots. We have avoided plotting individual data points due to the quantity of points.

      In various places, the statistics produce various questionable outcomes that will draw unwanted reader scrutiny. Many of the examples below involve tiny differences in means with overlapping error bars that are "significant" or a few cases of nonoverlapping error bars that are "not significant." I think replacing the bar charts may help to resolve things here if we can see the whole distribution or the raw data points. As importantly, I think a big problem is that the statistical tests all seem to be nonparametric (they are ambiguously described in Table S3 as "Wilcoxon," which should be clarified, since there is an unpaired Wilcoxon test [rank sum] and a paired Wilcoxon test [sign rank]), and thus based on differences in the *median* whereas the bar charts are based on the *mean* (and SEM rather than MAD or IQR or other median-appropriate measure of spread). This should be fixed (either change the test or change the plots), which will hopefully allay many of the items below.

      We thank the reviewer for this important point. As mentioned in the Statistics and quantification section, Wilcoxon signed rank tests were used for non-normal data. We have replaced the bar charts with box plots which show the IQR and median, which indeed allays many of the items below.
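
      To illustrate why the choice of summary matters for non-normal data, the sketch below compares mean and SEM with median and IQR on a small, deliberately skewed sample. The data are invented for illustration and are not taken from the study; the quartile method (`inclusive`) is one of several conventions and is an assumption of this sketch.

```python
from math import sqrt
from statistics import fmean, median, quantiles, stdev

# Hypothetical skewed sample: one large value dominates the mean.
values = [1, 2, 3, 4, 100]

mean_value = fmean(values)
sem_value = stdev(values) / sqrt(len(values))

median_value = median(values)
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr_value = q3 - q1

print(f"mean = {mean_value:.1f}, SEM = {sem_value:.1f}")
print(f"median = {median_value}, IQR = {iqr_value}")
```

      A rank-based test responds to the bulk of the distribution summarized by the median and IQR, so plotting mean and SEM next to such a test can make the displayed difference and the reported significance appear to disagree.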

      Here are some specific points on the statistics presentation:

      (1) 1G, the test says that following RWS+CF, the decrease in PN response is not significant. In 1I, the same data, but now over time, shows a highly significant decrease. This probably means that either the first test should be reconsidered (was this a paired comparison, which would "build in" the normalization subsequently used automatically?) or the second test should be reconsidered. It's especially strange because the n value in G, if based on cells, would seem to be ~50-times higher than that in I if based on mice.

      In Figure 1G, the analysis tests whether individual pyramidal neurons significantly changed their responses before vs. after RWS+CF stimulation. This is a paired comparison at the single-cell level, and here indicates that the average per-neuron response did not reliably decrease after RWS+CF when comparing each cell’s pre- and post-values directly. In contrast, Figure 1I examines the same dataset analyzed across time bins using a two-way ANOVA, which tests for effects of time, group (RWS vs. RWS+CF), and their interaction. The analysis showed a significant group effect (p < 0.001), indicating that the overall level of activity across all time points differed between RWS and RWS+CF conditions. The difference in significance between these two analyses arises because the first test (Fig. 1G) assesses within-neuron changes (paired), whereas the second test (Fig. 1I) assesses overall population-level differences between groups over time (independent groups). Thus, the tests address related but distinct questions—one about per-cell response changes, the other about how activity differs across experimental conditions.

      (2) 1J RWS+CF then shows a much smaller difference with overlapping error bars than the ns difference with nonoverlapping errors in 1G, but J gets three asterisks (same n-values).

      Bar graphs have been replaced with box plots.

      (3) 1K, it is very unclear what under the asterisk could possibly be significant here, since the black and white dots overlap and trade places multiple times.

      See response to point 1. A significant group effect will exist if the aggregate difference across all time bins exceeds within-group variability. The asterisk therefore reflects a statistically significant main group effect (RWS versus RWS+CF) rather than differences at any single time point. Note, however, the very small effect size here.

      (4) 2B, 2G, 2H, 2I, 3G, 3H, 5C etc, again, significance with overlapping error bars, see suggestions above.

      Bar graphs have been replaced with box plots.

      (5) Time windows: e.g., L149-153 / 2B - this section reads weirdly. I think it would be less offputting to show a time-varying significance, if you want to make this point (there are various approaches to this floating around), or a decay rate, or something else.

      Here, we wanted to understand the overall direction of influence of CFs on VIP activity. We find that CFs exert a suppressive effect on VIP activity, which is statistically significant in this later time window. The specific effect of CF modulation on the activity of S1 neurons across multiple time points will be described in more detail in future investigations.

      (6) 4G, 6I, these asterisks again seem impossible (as currently presented).

      Bar graphs have been replaced with box plots.

      The writing is generally in ok shape, but needs tightening/clarifying:

      (1) L45 "mechanistic capacity" not clear.

      We have simplified this term to “capacity.” We use the term here to express that the central question we pose is whether CF signals are able to impact S1 circuits. We demonstrate CF signals indeed influence S1 circuits and further describe the mechanism through which this occurs, but we do not yet know all of the natural conditions in which this may occur. We feel that “capacity” describes the question we pose -- and our findings -- very well.

      (2) L48-58 there's a lot of material here, not clear how much is essential to the present study.

      We would like to give an overview of the literature on instructive CF signaling within the cerebellum. Here, we feel it is important to describe how CFs supervise learning in the cerebellum via coincident activation of parallel fiber inputs and CF inputs. Our results demonstrate CFs have the capacity to supervise learning in the neocortex in a similar manner, as coincident CF activation with sensory input modulates plasticity of S1 neurons.

      (3) L59 "has the capacity to" maybe just "can".

      This has been adopted. We agree that “can” is a more straightforward way of saying “has the capacity to” here. In this sentence, “can” and “has the capacity to” both mean a general ability to do something, without explicit knowledge about the conditions of use.

      (4) L61-62 some of this is circular "observation that CF regulates plasticity in S1..has consequences for plasticity in S1".

      We have now changed this to read “…consequences for input processing in S1.”

      (5) L91 "already existing whisker input" although I get it, strictly speaking, not clear what this means.

      This sentence has been reworded for clarity.

      (6) L94 "this form of plasticity" what form?

      Edited to read “sensory-evoked plasticity.”

      (7) L119 should say "to test the".

      This has been corrected.

      (8) L120 should say "well-suited to measure receptive fields".

      We agree; this wording has been adopted.

      (9) L130 should say "optical imaging demonstrated that receptive field".

      This has been adopted.

      (10) L138, the disclaimer is helpful, but wouldn't it be less confusing to just pick a different set of terms? Response potentiation etc.

      Perhaps, but we want to stress that components of LTP and LTD (traditionally tested using electrophysiological methods to specifically measure synaptic gain changes) can be optically measured as long as it is specified what is recorded.

      (11) L140, this whole section is not very clear. What was the experiment? What was done and how?

      The text in this section has been updated.

      (12) L154, 156, 158, 160, 960, what is a "basic response"? Is this supposed to contrast with RWS? If so, I would just say "we measured the response to whisker stimulation without first performing RWS, and compared this to the whisker stimulation with simultaneous CF activation."

      What we meant by “basic response” was the acute response of S1 neurons to a single 100 ms air puff. Here, we indeed measured the acute responses of S1 neurons to whisker stimulation (100 ms air puff) and compared them to whisker stimulation with simultaneous CF activation (100 ms air puff with a 50 ms light pulse; the light pulse was delayed 45 ms with respect to the air puff). This paragraph has been reworded for clarity.

      (13) L156 "comprised of a majority" unclear. You mean most of the nonspecific IN group is either PV or SST?

      Yes, that was meant here. This paragraph has been reworded for clarity.

      (14) L165 tense. "are activated" "we tested" prob should be "were activated."

      This sentence was reworded.

      (15) L173 Not requesting additional experiments, but demonstrating that the effect is mimicked by directly activating SST or suppressing VIP questions the specificity of CF activation per se, versus presumably many other pathways upstream of the same mechanisms, which might be worth acknowledging in the text.

      We indeed observe that directly activating SST or suppressing VIP neurons in S1 is sufficient to mediate the effect of CF activation on S1 pyramidal neurons, implicating SST and VIP neurons as the local effectors of CF signaling. In the text, we wrote “...the notion of sufficiency does not exclude potential effects of plasticity processes elsewhere that might well modulate effector activation in this context and others not yet tested.” Here, we mean that CFs are certainly not the only modulators of the inhibitory network in S1. One example we highlight in the discussion is that projections from M1 are known to modulate this disinhibitory VIP-to-SST-to-PN microcircuit in S1. We conclude from our chemogenetic manipulation experiments that CFs ultimately have the capacity to modulate S1 interneurons, which must occur indirectly (either through the thalamus or “upstream” regions as this reviewer points out). The fact that many other brain regions may also modulate the interneuron network in S1 -- or be modulated by CF activity themselves -- only expands the capacity of CFs to exert a variety of effects on S1 neurons in different contexts.

      (16) L247 "induced ChR2" awkward.

      We changed this to read “we expressed ChR2.”

      (17) 6C, what are the three colors supposed to represent?

      We apologize for the missing labels in this version of the manuscript. Figure 6C and the figure legend have been updated.

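    1. Tier 1: must practice, and in depth (★★★★★). These topics map directly onto the core algorithmic logic of PNC (planning and control); they are certain to come up in interviews and are used constantly at work.

      1. Graph Theory. Role: the soul of PNC.

      Why practice it: global routing depends entirely on graph search.

      Key problem types:

      BFS / DFS (breadth-first / depth-first search): the foundation of all search.

      Shortest path (Dijkstra / Floyd): must be known inside out.

      Topological sort: occasionally used when handling task dependencies.

      (Note: LeetCode has very few problems that ask for A* directly, but you should use Dijkstra problems to practice writing A*.)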
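      The note about practicing A* on Dijkstra problems can be sketched as follows. This is a minimal illustration in Python rather than the C++ the guide targets; the function name `shortest_path_cost`, the adjacency-list layout, and the optional `heuristic` parameter are all assumptions made for the example. With no heuristic the search is plain Dijkstra; passing an admissible heuristic turns the same loop into A*.

```python
import heapq


def shortest_path_cost(graph, start, goal, heuristic=None):
    """Return the minimum cost from start to goal, or None if unreachable.

    graph: dict mapping node -> list of (neighbor, edge_cost) pairs,
           with non-negative edge costs.
    heuristic: optional function node -> estimated remaining cost.
               None gives Dijkstra; an admissible estimate gives A*.
    """
    h = heuristic or (lambda node: 0)
    best = {start: 0}                 # cheapest known cost to reach each node
    open_list = [(h(start), start)]   # min-heap ordered by cost so far + estimate
    while open_list:
        _, node = heapq.heappop(open_list)
        if node == goal:
            return best[node]
        for nxt, cost in graph.get(node, []):
            new_cost = best[node] + cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(open_list, (new_cost + h(nxt), nxt))
    return None
```

      The only difference between the two algorithms in this sketch is the priority pushed onto the heap, which is why Dijkstra problems are reasonable practice material for A*.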

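      2. Array. Role: the most basic of the basics.

      Why practice it: autonomous driving works with matrices, grid maps, and point clouds.

      Key problem types:

      2D matrix operations: for example “Rotate Image”, “Number of Islands” (essentially a search problem), and “Search a 2D Matrix”.

      Prefix sum: quickly compute the accumulated cost of a trajectory segment.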
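      As a small illustration of the prefix-sum idea, the sketch below precomputes running totals once so that the accumulated cost of any trajectory segment becomes a single subtraction. The names `build_prefix` and `segment_cost` and the per-point cost list are hypothetical, chosen only for this example.

```python
from itertools import accumulate


def build_prefix(costs):
    """Return prefix sums where prefix[i] is the sum of costs[0:i] (prefix[0] == 0)."""
    return [0] + list(accumulate(costs))


def segment_cost(prefix, i, j):
    """Return the sum of costs[i:j] (half-open interval) in O(1)."""
    return prefix[j] - prefix[i]
```

      Building the table is O(n); every later segment query is O(1), which matters when many candidate segments of the same trajectory are scored.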

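      3. Stack & Queue -> specifically Priority Queue. Role: the accelerator of path planning.

      Why practice it: the image may have filed “heap” under this category. You need to master std::priority_queue (min-heap / max-heap).

      Key problem types: Top K problems and Merge k Sorted Lists (similar to a k-way merge). These correspond directly to maintaining the OpenList in the A* algorithm.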
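      A Top K problem can be sketched with a bounded min-heap. The example uses Python's `heapq`, which is a min-heap, whereas C++ `std::priority_queue` is a max-heap by default; the function name `top_k_largest` is an assumption for illustration.

```python
import heapq


def top_k_largest(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # heap[0] is the smallest of the current top k; swap it out.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

      Keeping the heap at size k gives O(n log k) time, and the same push/pop discipline is what an A* OpenList relies on to fetch the cheapest candidate node.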

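      Tier 2: optional, understand the ideas (★★★). These topics help solve specific subproblems or optimize performance.

      1. Dynamic Programming. PNC perspective: in PNC, DP is often used for speed planning. For example, finding a minimum-cost speed profile on an S-T graph (distance-time graph) is essentially a DP problem of finding an optimal path through a grid.

      Practice strategy: there is no need to grind obscure, mathematically heavy DP problems; focus on the “grid path” family and the “House Robber” family (adjacency-constraint problems).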
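      The “grid path” family can be illustrated with the classic minimum path sum problem. This is a LeetCode-style sketch, not an actual S-T speed planner: the function name `min_path_cost` and the rule that moves go only right or down are assumptions of the exercise.

```python
def min_path_cost(grid):
    """Minimum total cost from the top-left to the bottom-right cell,
    moving only right or down. grid is a non-empty list of equal-length rows."""
    rows, cols = len(grid), len(grid[0])
    dp = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                best_prev = 0
            elif r == 0:
                best_prev = dp[r][c - 1]          # first row: only from the left
            elif c == 0:
                best_prev = dp[r - 1][c]          # first column: only from above
            else:
                best_prev = min(dp[r - 1][c], dp[r][c - 1])
            dp[r][c] = grid[r][c] + best_prev
    return dp[-1][-1]
```

      Each cell stores the cheapest way to reach it, so the whole table is filled in O(rows × cols); a speed planner would apply the same recurrence with its own cell costs and transition rules.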

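      2. Binary Tree. PNC perspective: plain binary trees are rarely used, but space-partitioning trees (KD-Tree, Octree) are used a lot.

      Practice strategy: focus on tree traversal (recursive and non-recursive) and computing tree depth. The goal is to understand how to look up data quickly in a hierarchical structure.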
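      A non-recursive traversal that also computes tree depth might look like the sketch below. The `Node` class and `max_depth` function are hypothetical names for this example; the traversal is a level-order (BFS) walk using a queue instead of recursion.

```python
from collections import deque


class Node:
    """Minimal binary tree node used only for this illustration."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def max_depth(root):
    """Return the number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    depth = 0
    queue = deque([root])
    while queue:
        depth += 1
        # Process exactly the nodes that belong to the current level.
        for _ in range(len(queue)):
            node = queue.popleft()
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
    return depth
```

      The iterative form avoids recursion-depth limits on very unbalanced trees, which is one reason the non-recursive variants are worth practicing.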

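      3. Sliding Window / Two Pointers. PNC perspective: trajectory smoothing and processing.

      Scenario: suppose you need to check whether a long trajectory contains a run of consecutive points with excessive curvature. That is a sliding-window problem.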
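      The curvature scenario can be sketched with a fixed-size sliding window. The function name `has_sharp_segment`, the `limit` threshold, and the `window` length are assumptions for illustration; the window keeps a running count of over-limit points and updates it in O(1) as it slides.

```python
def has_sharp_segment(curvatures, limit, window):
    """Return True if some run of `window` consecutive points all have
    absolute curvature above `limit`."""
    if window <= 0 or window > len(curvatures):
        return False
    over = [abs(k) > limit for k in curvatures]
    count = sum(over[:window])            # over-limit points in the first window
    if count == window:
        return True
    for i in range(window, len(over)):
        # Slide by one: add the entering point, drop the leaving point.
        count += over[i] - over[i - window]
        if count == window:
            return True
    return False
```

      The whole trajectory is scanned once, so the check stays O(n) regardless of the window length.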

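      4. Greedy. PNC perspective: behavior planning sometimes uses greedy strategies for decisions (change lanes first, or accelerate first?). Doing a few basic problems to stay sharp is enough.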

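      Tier 3: skip or skim (★). These topics have very low payoff in PNC; unless you are preparing for general-purpose software interviews, do not spend time on them.

      1. String. Reason: autonomous driving deals with coordinates $(x, y, z, v, a)$, not text. Apart from simple log parsing, you will almost never meet problems like “palindromes” or “bracket matching”.

      2. Linked List. Reason: as noted earlier, linked lists are not contiguous in memory and are cache-unfriendly; in performance-critical C++ PNC code they are almost entirely replaced by std::vector. Hand-coding linked lists in interviews is usually meant to test pointer manipulation, not because engineering really works that way. Being able to reverse a linked list is enough; do not dig deeper.

      3. Monotonic Stack / Backtracking. Reason: Backtracking is essentially brute-force enumeration. Autonomous driving requires a result within 10-100 ms, and backtracking usually has exponential time complexity, which is unacceptable in engineering (unless the solution space is tiny). Monotonic stack is too specific to particular problems and not general enough.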

    1. https://youtube.com/watch?v=TAQ7yBLRZ3U&feature=shared

      Summary and key insights from the YouTube talk “Use.GPU - Declarative/Reactive 3D Graphics by Steven Wittens #LambdaConf2024”:


      Overview

      Steven Wittens introduces Use.GPU, a TypeScript library for driving WebGPU with a declarative and reactive programming model. The talk explores the motivation, design, and technical underpinnings of Use.GPU, emphasizing productivity, maintainability, and the bridging of web and graphics paradigms.


      Key Topics Covered

      1. The Problem with Traditional 3D Graphics Development

      • High Complexity & Maintenance Cost: Building custom 3D graphics (e.g., configurators, data visualizations, CAD apps) is often slow, expensive, and results in code that’s hard for teams to maintain.
      • Specialization Barrier: The field is so specialized that many companies avoid using advanced GPU graphics due to the expertise required.

      2. The Permutation Problem

      • Example: A 3D house configurator requires manually assembling assets and coding every possible combination of options, leading to exponential complexity.
      • Customization Pain: Existing visualization libraries (like Deck.gl) are hard to deeply customize without forking and maintaining complex codebases.

      3. The Web vs. Graphics Divide

      • Graphics World: Driven by games/CAD, large teams, offline delivery, monolithic codebases, and focus on rendering performance.
      • Web World: Driven by SaaS, small teams, continuous delivery, focus on compatibility, composition, and reuse.
      • Different Priorities: These differences make it hard to bring GPU graphics into mainstream web development.

      4. Live: A React-like Runtime

      • What is Live? A React-inspired, incremental, and reactive runtime that allows for declarative UI and graphics code.
      • Key Features:
        • Incremental updates: Only re-executes code in response to changes.
        • Implicit, one-way data flow.
        • Declarative side effects: Auto-mounting and disposal.
        • Enables features like undo/redo and multiplayer state management.
      • Unique Twist: Live allows data to flow back from child to parent components—something not possible in React—which is crucial for certain graphics/data workflows.

      5. Use.GPU: Declarative WebGPU

      • Goal: Make GPU graphics as easy to use and maintain as modern web UIs.
      • Approach: Use familiar JSX-like syntax and React-style components to describe 3D scenes and behaviors.
      • Incremental Rendering: The system is designed as if rendering one frame, and only reruns necessary parts for interactivity/animation.
      • Bridging the Gap: By combining Live’s reactive model with WebGPU, Use.GPU makes advanced graphics accessible to web developers.

      6. Technical Insights

      • Immediate vs. Retained Mode:
        • Immediate mode (e.g., Canvas): Easy but doesn’t scale for complex interactivity.
        • Retained mode (e.g., GPU): More efficient but much harder to program and maintain.
      • GPU as a Pure Function Applicator: The challenge is efficiently feeding unique data to millions of parallel shader invocations, with memory bandwidth as a key constraint.
      • Use.GPU’s Innovation: Abstracts away much of the boilerplate and complexity, letting developers focus on high-level structure and reactivity.

      Why This Matters

      • Productivity: Use.GPU aims to democratize GPU programming for web developers, reducing the need for deep graphics expertise.
      • Maintainability: Declarative, reactive patterns make complex interactive graphics more maintainable and composable.
      • New Possibilities: Opens the door for more sophisticated, interactive, and visually rich web applications.


      TL;DR

      Use.GPU is a new TypeScript/WebGPU library that brings React-style declarative, reactive programming to 3D graphics in the browser. Built on the “Live” runtime, it enables maintainable, high-performance graphics apps with familiar web development patterns—potentially revolutionizing how interactive graphics are built on the web.


      Citations: [1] https://www.youtube.com/watch?v=TAQ7yBLRZ3U

  3. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. 21.6. Bibliography

      [u1] Plato. Phaedrus: Translated by Benjamin Jowett. January 2013. Page Version ID: 1189255462.
      [u2] Luddite. December 2023. Page Version ID: 1189255462. URL: https://en.wikipedia.org/w/index.php?title=Luddite&oldid=1189255462 (visited on 2023-12-10).
      [u3] Ted Chiang. Will A.I. Become the New McKinsey? The New Yorker, May 2023. URL: https://www.newyorker.com/science/annals-of-artificial-intelligence/will-ai-become-the-new-mckinsey (visited on 2023-12-10).
      [u4] xkcd comics. The Pace of Modern Life. June 2013. URL: https://xkcd.com/1227/ (visited on 2023-12-10).
      [u5] xkcd comics. 1227: The Pace of Modern Life - explain xkcd. June 2013. URL: https://www.explainxkcd.com/wiki/index.php/1227:_The_Pace_of_Modern_Life (visited on 2023-12-10).
      [u6] Steven Spielberg. Jurassic Park. June 1993. URL: https://www.imdb.com/title/tt0107290/.
      [u7] Alex Blechman [@AlexBlechman]. Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus. November 2021. URL: https://twitter.com/AlexBlechman/status/1457842724128833538 (visited on 2023-12-10).
      [u8] Silicon Valley. April 2014. URL: https://www.imdb.com/title/tt2575988/.
      [u9] Eli Whitney. December 2023. Page Version ID: 1189351897. URL: https://en.wikipedia.org/w/index.php?title=Eli_Whitney&oldid=1189351897 (visited on 2023-12-10).
      [u10] Alfred Nobel. December 2023. Page Version ID: 1189282550. URL: https://en.wikipedia.org/w/index.php?title=Alfred_Nobel&oldid=1189282550 (visited on 2023-12-10).
      [u11] Einstein and the Manhattan Project. URL: https://www.amnh.org/exhibitions/einstein/peace-and-war/the-manhattan-project (visited on 2023-12-10).
      [u12] Steve Krenzel [@stevekrenzel]. With Twitter's change in ownership last week, I'm probably in the clear to talk about the most unethical thing I was asked to build while working at Twitter. 🧵. November 2022. URL: https://twitter.com/stevekrenzel/status/1589700721121058817 (visited on 2023-12-10).
      [u13] Britney Nguyen. Ex-Twitter engineer says he quit years ago after refusing to help sell identifiable user data, worries Elon Musk will 'do far worse things with data'. November 2022. URL: https://www.businessinsider.com/former-twitter-engineer-worried-how-elon-musk-treat-user-data-2022-11 (visited on 2023-12-10).
      [u14] Alphabet Workers Union-Communications Workers of America Local 9009. Our People: Workers are coming together to build power across Alphabet. URL: https://www.alphabetworkersunion.org/our-people (visited on 2023-12-10).
      [u15] Jason Parham. A People’s History of Black Twitter, Part I. Wired, July 2021. URL: https://www.wired.com/story/black-twitter-oral-history-part-i-coming-together/ (visited on 2023-12-10).
      [u16] Jason Parham. There Is No Replacement for Black Twitter. Wired, November 2022. URL: https://www.wired.com/story/black-twitter-elon-musk/ (visited on 2023-12-10).
      [u17] Catherine Buni. Media, company, behemoth: What, exactly, is Facebook? November 2016. URL: https://www.theverge.com/2016/11/16/13655102/facebook-journalism-ethics-media-company-algorithm-tax (visited on 2023-12-10).
      [u18] Rafi Letzter. A teenager on TikTok disrupted thousands of scientific studies with a single video. September 2021. URL: https://www.theverge.com/2021/9/24/22688278/tiktok-science-study-survey-prolific (visited on 2023-12-10).
      [u19] Catherine D'Ignazio and Lauren F. Klein. Data Feminism. Strong Ideas. MIT Libraries Experimental Collections Fund, Cambridge, 1 edition, 2020. ISBN 978-0-262-04400-4. URL: https://direct.mit.edu/books/oa-monograph/4660/Data-Feminism, doi:10.7551/mitpress/11805.001.0001.
      [u20] Janet Abbate. Recoding Gender: Women's Changing Participation in Computing. MIT Press, Cambridge, UNITED STATES, 2012. ISBN 978-0-262-30546-4. URL: http://ebookcentral.proquest.com/lib/washington/detail.action?docID=3339524 (visited on 2023-12-10).
      [u21] Mar Hicks. Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing. MIT Press, Cambridge, UNITED STATES, 2017. ISBN 978-0-262-34294-0. URL: http://ebookcentral.proquest.com/lib/washington/detail.action?docID=6246618 (visited on 2023-12-10).
      [u22] Charlton D. McIlwain. Black software: the internet and racial justice, from the AfroNet to Black Lives Matter. 2020. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162262159401452.
      [u23] Simone Browne. Dark Matters: On the Surveillance of Blackness. Duke University Press, September 2015. ISBN 978-0-8223-7530-2. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99161921055701452 (visited on 2023-12-10), doi:10.1215/9780822375302.
      [u24] Safiya Umoja Noble. Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press, New York, UNITED STATES, 2018. ISBN 978-1-4798-3364-1. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162068349301452 (visited on 2023-12-10).
      [u25] Shalini Kantayya. Coded Bias. November 2020. URL: https://www.netflix.com/title/81328723 (visited on 2023-12-10).
      [u26] Tarleton Gillespie. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press, New Haven, UNITED STATES, 2018. ISBN 978-0-300-23502-9. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162362661601452 (visited on 2023-12-10).
      [u27] Sarah T. Roberts. Behind the screen: content moderation in the shadows of social media. 2019. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162217744201452.
      [u28] Jean Burgess, Alice Marwick, and Thomas Poell. The SAGE Handbook of Social Media. SAGE Publications, 55 City Road, London, 2018. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162105658401452 (visited on 2023-12-10), doi:10.4135/9781473984066.
      [u29] Yuri Takhteyev. Coding Places: Software Practice in a South American City. The MIT Press, September 2012. ISBN 978-0-262-30559-4. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99161981926801452 (visited on 2023-12-10), doi:10.7551/mitpress/9109.001.0001.
      [u30] Virginia Eubanks. Automating inequality: how high-tech tools profile, police, and punish the poor. 2018. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162064355601452.
      [u31] Mary L. Gray and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton Mifflin Harcourt Publishing Company, Boston, United States, 2019. ISBN 978-1-328-56628-7. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162207131801452 (visited on 2023-12-10).
      [u32] Shoshana Zuboff. The age of surveillance capitalism: the fight for a human future at the new frontier of power. 2019. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162177355601452.
      [u33] Cathy O'Neil. Weapons of math destruction: how big data increases inequality and threatens democracy. 2016. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99161951137601452.
      [u34] Sasha Costanza-Chock. Design justice: community-led practices to build the worlds we need. Information policy series. The MIT Press, Cambridge, Massachusetts, 2020. ISBN 978-0-262-35686-2. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162363060401452.
      [u35] Thomas S. Mullaney, Benjamin Peters, Mar Hicks, and Kavita Philip. Your computer is on fire. The MIT Press, Cambridge, Massachusetts, 2021. ISBN 978-0-262-36077-7. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99162423945901452, doi:10.7551/mitpress/10993.001.0001.
      [u36] Sara Wachter-Boettcher. Technically wrong: sexist apps, biased algorithms, and other threats of toxic tech. October 2018. URL: https://orbiscascade-washington.primo.exlibrisgroup.com/permalink/01ALLIANCE_UW/8iqusu/alma99329653362401451.
      [u37] Joe Saunders and Carl Fox, editors. Media Ethics, Free Speech, and the Requirements of Democracy. Routledge, New York, December 2018. ISBN 978-0-203-70244-4. URL: https://www.taylorfrancis.com/books/edit/10.4324/9780203702444/media-ethics-free-speech-requirements-democracy-carl-fox-joe-saunders, doi:10.4324/9780203702444.
      [u38] Ruha Benjamin. Viral Justice: How We Grow the World We Want. Princeton University Press, October 2022. ISBN 978-0-691-22288-2. URL: https://press.princeton.edu/books/hardcover/9780691222882/viral-justice (visited on 2023-12-10).
      [u39] Meta for Developers. 2023. URL: https://developers.facebook.com/ (visited on 2023-12-10).
      [u40] API Reference — Facebook SDK for Python 4.0.0-pre documentation. 2015. URL: https://facebook-sdk.readthedocs.io/en/latest/api.html (visited on 2023-12-10).
      [u41] TikTok for Developers. 2023. URL: https://developers.tiktok.com/ (visited on 2023-12-10).
      [u42] Getting started with Official Account Developer Mode. January 2013. URL: https://developers.weixin.qq.com/doc/offiaccount/en/Getting_Started/Getting_Started_Guide.html (visited on 2023-12-10).

      After checking out Coded Bias, I was honestly surprised how much everyday technology relies on algorithms that were never tested on diverse groups of people. The documentary shows how facial recognition failed on darker-skinned women, which made me think about how “neutral” tech isn’t neutral at all. What really got me is how the developers didn’t seem to think about these consequences until people called them out. It connects perfectly to the chapter’s theme that innovation often ignores ethics until harm already happens. It also made me wonder how many other systems we use every day have hidden biases we just haven’t noticed yet.

  4. bafybeig7nrhxx3nyb5rfmuj7cfy5xbl4ldtwr57ol6lykibww625qkxnke.ipfs.dweb.link bafybeig7nrhxx3nyb5rfmuj7cfy5xbl4ldtwr57ol6lykibww625qkxnke.ipfs.dweb.link
    1. Origo Folder for my hyperpost Peergos Account

      No Groan Zome

      but

      Not just Converge but UpVerge in an autopoietic emergent upward spiral

      Beyond all expectations

      Imagined a whole new way what that leads to is beyond prior imaginings

    1. Children's right to adapted justice: summary of the 2025 report of the Défenseur des droits

      Executive Summary

      The 2025 report of the Défenseur des droits (the French Defender of Rights), titled « Le droit des enfants à une justice adaptée » (“Children's right to adapted justice”), offers a critical assessment of juvenile criminal justice in France. Drawing on a broad consultation of more than 1,600 young people, the report reaffirms the fundamental principle that a child is not an adult, which justifies a specialized justice system that puts education ahead of punishment.

      The key findings are as follows:

      A fundamental principle under threat:

      The distinctive character of juvenile justice, founded on reduced criminal responsibility and the pursuit of educational rehabilitation, is being weakened by public discourse and legislative reforms that call for harsher penalties, in disregard of the best interests of the child and of France's international commitments.

      Delinquency as a symptom of vulnerability:

      Far from being an isolated phenomenon, juvenile delinquency is intrinsically tied to multiple vulnerability factors: 55% of juvenile offenders are followed by child protection services, often after having been victims of abuse.

      Poverty, school failure, mental health disorders, and exposure to violence are major determinants.

      A criminal-justice path riddled with failures:

      From arrest to incarceration, the report highlights systemic failures to respect children's rights.

      Discriminatory identity checks, violence during arrests, unsuitable police custody conditions, and violations of dignity in detention fuel deep distrust of institutions among young people.

      An under-resourced and inconsistent judicial response:

      Despite the efforts of professionals, the system suffers from a glaring lack of resources.

      Educational measures are not always carried out for lack of staff, and the conditions of incarceration, which should be the last resort, seriously compromise the chances of reintegration because of insufficient access to education, health care, and activities.

      Young people's voices, a call for more humane justice:

      The consultation reveals a widespread lack of awareness of rights and a negative perception of the justice system among young people who have been through it.

      They call for a justice system that is fairer, understandable, preventive, and caring, one that takes their lived experience into account and offers them a real second chance.

      In conclusion, the report warns of the risk of a justice system that, by favoring an exclusively punitive approach, would reproduce the exclusion it claims to fight.

      It makes 25 recommendations aimed at safeguarding the principles of adapted justice, strengthening prevention by tackling vulnerabilities, and guaranteeing respect for children's rights at every stage of the criminal process.

      --------------------------------------------------------------------------------

      I. The Foundations of a Specific Justice System for Minors

      The report recalls that the need for a separate criminal justice system for minors rests on solid legal, constitutional, and scientific principles, even though these are regularly challenged in public debate.

      1. The Fundamental Principle: A Child Is Not an Adult

      Discernment, meaning the capacity to understand and intend one's act, develops gradually.

      Neuroscience confirms that the prefrontal cortex, responsible for reasoning and emotional regulation, reaches full maturity only around age 24-25.

      Adolescents are therefore physiologically more prone to impulsivity, to peer influence, and to misjudging the consequences of their actions.

      “We're not mature enough, we're not aware of what we're doing.” - Young person consulted

      The Code de la justice pénale des mineurs (CJPM, the Juvenile Criminal Justice Code) of 2021 introduced a rebuttable presumption of non-discernment for children under 13.

      The Défenseur des droits considers this measure insufficient and recommends enshrining in law a principle of absolute criminal non-responsibility below that age (Recommendation 1).

      2. The Legal Framework: Primacy of Education over Punishment

      Juvenile justice in France, heir to the ordinance of 2 February 1945, rests on principles of constitutional value:

      Reduced criminal responsibility according to age.

      The primacy of education over punishment, aiming at the “educational and moral rehabilitation” of the child.

      The specialization of courts (juvenile judge, juvenile court) and of professionals.

      These principles are consistent with France's international commitments, notably the Convention on the Rights of the Child (CIDE in French).

      The report is concerned about recent attempts to erode them, such as the law of 23 June 2025, which initially sought to introduce immediate-appearance proceedings (comparution immédiate) for minors over 16, a measure largely struck down by the Conseil constitutionnel.

      3. Young People's Voices: A Contrasting Perception of Justice

      The national consultation « J’ai des droits, entends-moi ! » (“I have rights, hear me!”) reveals a deep divide:

      • Young people who have never dealt with the justice system have a fairly positive view of its protective role.

      • Those who have been through it describe an experience marked by a lack of information, the feeling of not being listened to, and discriminatory practices, particularly for young people from priority neighborhoods (quartiers prioritaires) or perceived as being of foreign origin.

      “In the justice system there's an injustice: when it's whites or Arabs it's different, it's not the same treatment.” - Young person consulted

      Overall, young people aspire to a justice system that is “understandable, educational, preventive, structuring yet caring, supportive”, one that repairs and offers a second chance.

      “Adapted justice is not only about judging, it is about helping young people in their suffering. (...) Locking us up (...) is probably not the best solution. We want to be educated and to get a second chance.” - Collective letter from incarcerated minors

      II. Prevention: Acting on the Roots of Delinquency

      The report stresses that fighting juvenile delinquency depends above all on massive investment in prevention and in protecting children from vulnerability factors.

      1. Identified Risk Factors

      Delinquency is often the consequence of life paths marked by disruptions and fragilities.

      | Vulnerability factor | Data and findings from the report |
      | --- | --- |
      | Family and social situation | 55% of juvenile offenders are followed by child protection services. 46% of those in a Centre Éducatif Fermé (CEF, closed educational center) have an absent father. Socio-economic insecurity is cited by young people as the leading cause of offending. |
      | School dropout | The risk of delinquency is multiplied by eight in cases of school absenteeism. 72% of young people followed by the PJJ in Marseille are or have been out of school. |
      | Mental health and disability | 90% of young people in CEFs have at least one psychiatric disorder. The lack of care facilities and suitable support worsens their fragility. |
      | Exposure to violence | Exposure to violence (family, school, online, sexual) fosters the reproduction of violent behavior. The report notes a 77% increase in minors implicated in sexual violence between 2017 and 2024. |
      | Exploitation by networks | Minors, particularly unaccompanied minors (MNA), are victims of human trafficking for the purpose of forced criminality (drug trafficking, prostitution). They are often treated as perpetrators rather than victims. |

      2. Levers of Prevention

      To counter these factors, the report recommends strengthening several mechanisms.

      Specialized prevention: “street educators”, who reach out to marginalized young people, play a crucial role. However, the sector suffers from uneven deployment across the country and a shortage of professionals.

      Parenting support: the report favors support for families in difficulty over a purely punitive approach, and questions the effectiveness of financial penalties against parents who are often already in precarious circumstances.

      Child protection: coordination between the Aide Sociale à l'Enfance (ASE, child welfare services) and the Protection Judiciaire de la Jeunesse (PJJ, judicial youth protection service) is deemed indispensable but deficient, which hinders comprehensive care for young people.

      III. Le Parcours Pénal : Une Garantie des Droits Défaillante

      Le rapport détaille, étape par étape, comment les droits spécifiques des mineurs sont mis à mal tout au long de la procédure pénale.

      1. Premier Contact : Contrôles d'Identité et Interpellations

      Contrôles d'identité : Le rapport dénonce l'existence de pratiques discriminatoires, s'appuyant sur ses propres enquêtes qui montrent que les jeunes hommes perçus comme noirs ou arabes ont 12 fois plus de risques de subir un contrôle "poussé".

      Ces pratiques, reconnues par la justice française (Cour de cassation, Conseil d'État) et européenne (CEDH), nourrissent un sentiment d'injustice et de défiance.

      Interpellations : Les témoignages de jeunes font état d'un usage disproportionné de la force, d'humiliations et de propos racistes, transformant l'interpellation en une expérience traumatisante.

      « Ils cherchent à provoquer les jeunes lors des contrôles, pour que cela dérape et qu’ils puissent les embarquer. » - Jeune consulté

      2. Investigation: Questioning, Detention (Retenue), and Police Custody (Garde à Vue)

      Although the CJPM (Code de la justice pénale des mineurs) provides strong safeguards (right to a lawyer with no exceptions, audiovisual recording, notification of parents), their application is deficient.

      Questioning: Minors are questioned without being notified of their rights or in unsuitable conditions.

      Police custody: Described as a traumatic experience, with material conditions that are often poor, a lack of information, and anxiety-inducing isolation. The situation of minors with disabilities is of particular concern.

      3. Judgment and Sanctions

      The CJPM reform reduced the time to judgment (from 23 to 9.4 months on average) but created new difficulties.

      Educational probation (mise à l'épreuve éducative): This period between the hearing on guilt and the hearing on the sanction is often not effective for lack of resources, which empties the reform of its meaning.

      Use of the single hearing: Intended as an exception, this procedure, which rules on guilt and sanction in one sitting, is becoming the norm, at the expense of educational assessment.

      Comprehension: Young people complain of inaccessible legal language and of feeling that judges do not listen to them.

      4. Incarceration: A Last Resort with Harmful Effects

      The incarceration of minors, possible from age 13, must remain exceptional. The report warns of its dramatic consequences.

      "Prison shock" and suicides: Confinement is a major trauma. Five adolescents died by suicide in detention between October 2023 and August 2024.

      Conditions of detention:

      Education: Access to schooling is highly inadequate (well below the 12 to 20 hours per week provided for) and hampered by security constraints.

      Health: Continuity of care, particularly psychiatric care, is broken.

      Coordination: Collaboration between the Administration Pénitentiaire (AP, prison administration) and the PJJ is difficult, with sometimes contradictory approaches (security versus education).

      Dignity: Young people denounce the quality and quantity of the food, the high cost of communicating with family, and full-body search practices considered humiliating and abusive.

      "Putting several 'disruptive' young people together only brings together ideas for even bigger disruptions." - Incarcerated young person

      IV. Reintegration and Prevention of Reoffending

      Reintegration is not merely a step that follows the sanction; it is a process that must begin at the very start of the criminal justice process.

      Preparing for release: The end of a placement or a period of detention is a moment of high risk of reoffending.

      The report stresses the crucial need to anticipate these transitions by coordinating the action of all stakeholders (PJJ, ASE, education, etc.).

      The right to be forgotten: Erasing convictions from the criminal record is essential to allow young people to rebuild their lives without being stigmatized.

      This right remains largely unknown to those most concerned.

      The young people themselves emphasize the importance of guidance, of support for their projects, and of the chance to meet peers who have successfully reintegrated and who embody a source of hope.

      "We must have the chance to redeem ourselves without being stigmatized for life." - Young person consulted

      V. Selection of Key Recommendations

      Among the report's 25 recommendations, several stand out for their structural scope.

      Fundamental principles:

      Recommendation 1: Enshrine in law the principle that minors under 13 bear no criminal responsibility, without exception.

      Recommendation 4: Create a children's code (code de l'enfance) to unify and clarify all civil and criminal provisions.

      Prevention:

      Recommendation 5: Increase the resources allocated to preventing school dropout (more psychologists, social workers, etc.).

      Recommendation 9: Put specialized prevention back at the heart of public policy, with secure and increased funding.

      Criminal justice process:

      Recommendation 12: Ensure the traceability of identity checks in order to combat discrimination.

      Recommendation 18: Make justice understandable for children by training professionals to use plain, clear language.

      Detention and reintegration:

      Recommendation 21: Guarantee effective access to education, health care, and the maintenance of family ties in detention.

      Recommendation 24: Systematically plan ahead for the end of a placement or incarceration in order to support reintegration.

      Recommendation 25: Systematically inform minors about the procedures for erasing their criminal record, so that the right to be forgotten becomes effective.

    1. Proposal for a Reform of Children's Time: Strategic Summary of the Citizens' Convention Report

      1.0 Introduction: A National Imperative and a Democratic Opportunity

      Reforming the organization of children's time has become a national imperative.

      The current model, fragmented and ill-suited to the fundamental developmental, health, and learning needs of millions of children, weakens our social cohesion and mortgages our collective future.

      Student exhaustion, growing inequalities, and the constant pressure on families are no longer weak signals but symptoms of a systemic crisis that calls for courageous and structured political action.

      As the Prime Minister's referral letter emphasized, the current system is a layering of "family time, school time, and extracurricular time" that are not "designed in a coordinated and comprehensive way."

      Faced with this fragmentation, the Citizens' Convention on Children's Time (Convention Citoyenne sur les temps de l'enfant) was mandated to produce a coherent overall vision capable of realigning public policy with the best interests of the child.

      This democratic approach is unprecedented.

      By entrusting this reflection to 133 randomly selected citizens, who deliberated for 21 days, the public authorities enabled the emergence of an authentic voice, free of any political divide and of any corporatist interest.

      The legitimacy of the resulting recommendations is therefore particularly strong, because they are the product of collective, informed work that is representative of the diversity of French society.

      This summary note aims to present the conclusions of this exceptional work in a strategic manner.

      It first sets out the alarming diagnosis made by the Convention, then the guiding vision behind its work, before detailing the concrete areas of reform and the conditions essential to their success.

      A detailed understanding of the diagnosis is indeed the foundation of the need to act.

      2.0 The Diagnosis: Ill-Suited Rhythms and Growing Inequalities

      The Convention's proposals are not isolated opinions; they rest on a rigorous, shared analysis of the deep dysfunctions of the current system.

      This diagnosis reveals a systemic crisis in which problems do not merely add up but aggravate one another, creating a vicious circle that penalizes the most vulnerable first.

      Five central findings form the basis of this analysis.

      An imposed organization: Children's time is dictated by the constraints of adults and institutions (working hours, transport, logistics) rather than by the child's physiological, psychological, and cognitive needs.

      Counterproductive rhythms: The school schedule is deeply out of step with children's biological rhythms, which harms their concentration, impairs their learning, and causes a chronic sleep deficit for 20 to 30% of them.

      Constant pressure: The density of school curricula, the pervasiveness of assessments, and competition generate anxiety and stress in a society that overvalues performance and productivity.

      The erosion of free time: Free time, essential to development, is becoming scarcer and is dominated by overexposure to screens, which reaches nearly 4 hours 48 minutes per day on average among 11- to 14-year-olds, with major consequences for health and learning.

      Chronic underinvestment: The lack of financial and human resources weakens the entire educational and social chain, straining professionals (teachers, activity leaders, AESH) and degrading the quality of support.

      These findings are compounded by four cross-cutting issues, which show that problems of rhythm have deep social consequences: the rise of violence and bullying, the lack of inclusion of children with special needs, the deterioration of physical and mental health, and above all the worsening of inequalities.

      On this last point, the report recalls a damning reality: the French school system remains one of the most unequal in the OECD, with social background still massively determining academic success, as shown by the fact that 71% of children from low-income families are not enrolled in a club or association.

      Faced with this severe and multidimensional diagnosis, the Convention strategically rejected the path of marginal adjustments in order to develop a coherent and desirable vision for the future.

      3.0 A Coherent Vision: Placing the Child at the Heart of Society's Project

      The Convention rightly understood that correcting systemic failures requires a convincing alternative vision, not makeshift solutions.

      To be effective, a reform cannot be a mere technical adjustment; it must be carried by a comprehensive, humanist project that repositions the child from a mere subject of public policy to its central purpose.

      The Convention's vision rests on three fundamental pillars.

      A broadened common core for learning differently

      This pillar responds directly to the diagnosis of a system that generates "constant pressure" by overvaluing a narrow set of academic skills.

      The Convention proposes giving equal value to theoretical learning, concentrated in the morning when attention is at its peak, and to practical, artistic, cultural, and sports learning, developed in the afternoon.

      This approach, which includes concrete everyday-life workshops (DIY, cooking, sewing, budget management), aims to recognize all forms of intelligence, to restore meaning and enjoyment to learning, and to allow every child to fulfill their potential.

      Balanced governance

      To tackle the "growing inequalities" identified in the diagnosis, the Convention recommends a two-tier governance model.

      Strong national steering must set a clear course, guarantee the common framework, and ensure equal opportunity across the whole country.

      In parallel, autonomous local implementation must allow each area to adapt policies to its own characteristics, to mobilize its own resources (associations, cultural organizations, natural environment), and to build a relevant, shared educational project rather than a uniform mandate.

      Quality time in daily life

      In response to "the erosion of free time" and the pressure on families, this pillar aims to give children back "truly free time," essential to their personal development and creativity, notably by lightening the homework load. At the same time, the Convention calls for supported parenting, which allows parents to regain time and peace of mind in their relationship with their children, free of the logistical overload and anxiety tied to the current system.

      On the basis of this vision, the Convention structured its 20 proposals for action, designed as a coherent, interdependent whole for systemic transformation.

      4.0 Strategic Areas of Reform: Recommendations for Action

      This section is the operational core of the proposal.

      The 20 recommendations adopted by the Convention are not a mere list of measures; they are logically organized into three complementary areas of intervention.

      Together, they aim at a systemic transformation of the organization of children's time and of the surrounding ecosystem.

      4.1. Area 1: Restructure children's time for harmonious development

      This area groups the proposals (1 to 11) that strategically target the root causes of fatigue and stress identified in the diagnosis.

      By realigning the rhythms of daily life with children's biological and psychological needs, it aims to transform school from a source of pressure into a structured environment for healthy development.

      The school day rethought

      This overhaul of the school day directly addresses the chronobiological mismatch and chronic fatigue highlighted in the diagnosis. The key measures include:

      • Classes starting at 9 a.m. in collège and lycée (middle and high school) (Prop. 2), to match adolescents' sleep rhythm.

      • Lessons reduced to 45 effective minutes in secondary school (Prop. 4), to maintain optimal attention.

      • A lunch break of at least 1 hour 30 minutes (Prop. 6 & 7), guaranteeing a calm mealtime and genuine free time.

      • Homework done mainly at school (Prop. 8), to lighten the workload at home and reduce inequalities.

      The week and the year rebalanced

      To guarantee regularity and rest, in line with chronobiologists' recommendations, the Convention proposes:

      • Moving to a 5-day week for all levels, Monday to Friday (Prop. 9), to spread learning more evenly.

      • Adopting a stable annual rhythm of 7 weeks of classes followed by 2 weeks of vacation (Prop. 11), which implies reorganizing the vacation zones.

      4.2. Area 2: Coordinate stakeholders, adapt spaces, and facilitate mobility

      This area focuses on the proposals (12 to 17) aimed at building a coherent educational environment and living spaces suited to the new pedagogical ambitions, thereby responding to the diagnosis of a fragmented system and unsuitable infrastructure.

      Unified and decentralized governance

      The Convention proposes a two-tier overhaul of governance: a powerful Ministry for Children at the national level (Prop. 12), to correct the systemic inequalities identified in the diagnosis, and mandatory "new generation" Projets Éducatifs de Territoire (PEdT, territorial educational projects) (Prop. 13), to guarantee implementation adapted to the local context rather than a uniform mandate.

      Suitable living spaces

      The vision includes transforming schools into "youth campuses" through a building plan spanning 20 to 30 years (Prop. 14), with flexible, modular spaces that open onto the outdoors (Prop. 15 & 16).

      This ambition aims to create environments of well-being suited to new teaching methods and to climate change.

      Easier and safer mobility

      The "youth mobility plan" (Prop. 17) directly addresses one of the "constraints of adults" identified in the diagnosis.

      It aims to cap travel times at 45 minutes and to actively promote walking, cycling, and other soft mobility, thereby reducing a major source of fatigue and stress.

      4.3. Area 3: Guarantee quality time and support parenting

      This area responds to the modern challenges of education and family life (Proposals 18 to 20) by directly addressing the new sources of pressure and the erosion of free time identified in the diagnosis.

      Regulating screen use

      Given the omnipresence of digital technology, a two-pronged approach is proposed.

      It consists, on the one hand, of informing, raising awareness among, and supporting children and parents (Prop. 18) and, on the other hand, of applying and strengthening existing legislation (Prop. 19), notably the effective ban on social networks before age 15 and default phone settings that protect children.

      Supporting parenting

      To better reconcile family and working life and ease the pressure on families, it is proposed to strengthen the legal framework for parenting support (Prop. 20), recognizing the essential role of parents and their need for support in order to free themselves from logistical overload and anxiety.

      However, the Convention clearly recognizes that these ambitious reforms depend on a set of non-negotiable structural prerequisites, which must be addressed with the same determination.

      5.0 Prerequisites for Success: The Conditions for Effective Implementation

      The Convention clearly identified that the proposed reforms, however relevant, cannot bear fruit without indispensable structural levers being put in place.

      These prerequisites turn the vision into a realistic action plan for the public authorities by making success conditional on clear commitments.

      Investment and stability: It is imperative to break with chronic underinvestment.

      This requires substantial, lasting financial investment, ring-fenced by a multi-year programming law.

      In addition, it is crucial to "think long term" in order to guarantee the stability of education policy and escape the short political cycles that paralyze fundamental reforms.

      Valuing human capital: No reform will succeed without the professionals who implement it.

      It is therefore imperative to significantly reduce class sizes and to undertake a comprehensive upgrading of all education professions (teachers, activity leaders, AESH, etc.), covering salaries, continuing training, and recognition of their status.

      Pedagogical modernization: The change of rhythm must be accompanied by a change in content.

      School curricula need to be rethought in order to lighten them and align them with the new structure of the day.

      In addition, children, young people, and professionals must be systematically included in the decision-making processes that concern them.

      Adapting infrastructure: Material conditions are a prerequisite for well-being.

      The success of the reform depends directly on adapting school buildings (renovation, greening, modularity) and on effectively reducing travel times, which are not secondary objectives but fundamental conditions for children's fulfillment.

      Implementing these proposals, subject to these prerequisites, constitutes an ambitious societal project that calls for unwavering political will.

      6.0 Conclusion: An Investment in the Nation's Future

      The proposal of the Citizens' Convention on Children's Time follows an inescapable logic: a severe diagnosis of the state of our educational and social system calls for an ambitious vision for our children's future.

      This vision is translated into a set of concrete, interdependent reforms whose conditions for success are clearly identified.

      We now have a roadmap that is coherent, legitimate, and a source of hope.

      As the citizens forcefully state in their manifesto, the time is no longer for findings but for action:

      Our report must not be just one more report; we will be vigilant about the follow-up given to our work. We now expect our political decision-makers to take their responsibilities.

      Implementing these reforms must not be seen as an expense, but as the most strategic investment in the Nation's future.

      It is about shaping fulfilled citizens, in better physical and mental health, able to adapt to the challenges of tomorrow.

      It is about reducing social and territorial divides at the root by offering every child, wherever they live, the same chances of reaching their full potential.

      It is now up to political decision-makers to rise to this ambition and this democratic imperative.

    1. Reviewer #3 (Public review):

      Summary:

      The authors recorded brain responses while participants viewed images and captions. The images and captions were taken from the COCO dataset, so each image has a corresponding caption and each caption has a corresponding image. This enabled the authors to extract features from either the presented stimulus or the corresponding stimulus in the other modality. The authors trained linear decoders to take brain responses and predict stimulus features. "Modality-specific" decoders were trained on brain responses to either images or captions while "modality-agnostic" decoders were trained on brain responses to both stimulus modalities. The decoders were evaluated on brain responses while the participants viewed and imagined new stimuli, and prediction performance was quantified using pairwise accuracy. The authors reported the following results:

      (1) Decoders trained on brain responses to both images and captions can predict new brain responses to either modality.

      (2) Decoders trained on brain responses to both images and captions outperform decoders trained on brain responses to a single modality.

      (3) Many cortical regions represent the same concepts in vision and language.

      (4) Decoders trained on brain responses to both images and captions can decode brain responses to imagined scenes.
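      For readers unfamiliar with the metric, the pairwise accuracy mentioned in the summary can be sketched as follows. This is one common formulation and not necessarily the authors' exact procedure: the use of cosine similarity, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_accuracy(predicted, target):
    """Fraction of ordered pairs (i, j), i != j, for which the decoded
    features of stimulus i are more similar to the true features of i
    than to the true features of j. Chance level is 0.5."""
    n = len(predicted)
    correct = 0
    total = 0
    for i, j in permutations(range(n), 2):
        if cosine(predicted[i], target[i]) > cosine(predicted[i], target[j]):
            correct += 1
        total += 1
    return correct / total


# Toy example (hypothetical values): three stimuli, 2-D feature space.
true_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoded = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]
score = pairwise_accuracy(decoded, true_features)
```

      Because every comparison involves only two candidates, a decoder that carries no stimulus information is expected to score about 0.5 under this formulation.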

      Strengths:

      This is an interesting study that addresses important questions about modality-agnostic representations. Previous work has shown that decoders trained on brain responses to one modality can be used to decode brain responses to another modality. The authors build on these findings by collecting a new multimodal dataset and training decoders on brain responses to both modalities.

      To my knowledge, SemReps-8K is the first dataset of brain responses to vision and language where each stimulus item has a corresponding stimulus item in the other modality. This means that brain responses to a stimulus item can be modeled using visual features of the image, linguistic features of the caption, or multimodal features derived from both the image and the caption. The authors also employed a multimodal one-back matching task which forces the participants to activate modality-agnostic representations. Overall, SemReps-8K is a valuable resource that will help researchers answer more questions about modality-agnostic representations.

      The analyses are also very comprehensive. The authors trained decoders on brain responses to images, captions, and both modalities, and they tested the decoders on brain responses to images, captions, and imagined scenes. They extracted stimulus features using a range of visual, linguistic, and multimodal models. The modeling framework appears rigorous and the results offer new insights into the relationship between vision, language, and imagery. In particular, the authors found that decoders trained on brain responses to both images and captions were more effective at decoding brain responses to imagined scenes than decoders trained on brain responses to either modality in isolation. The authors also found that imagined scenes can be decoded from a broad network of cortical regions.

      Weaknesses:

      The characterization of "modality-agnostic" and "modality-specific" decoders seems a bit contradictory. There are three major choices when fitting a decoder: the modality of the training stimuli, the modality of the testing stimuli, and the model used to extract stimulus features. However, the authors characterize their decoders based on only the first choice: "modality-specific" decoders were trained on brain responses to either images or captions while "modality-agnostic" decoders were trained on brain responses to both stimulus modalities. I think that this leads to some instances where the conclusions are inconsistent with the methods and results.

      First, the authors suggest that "modality-specific decoders are not explicitly encouraged to pick up on modality-agnostic features during training" (line 137) while "modality-agnostic decoders may be more likely to leverage representations that are modality-agnostic" (line 140). However, whether a decoder is required to learn modality-agnostic representations depends on both the training responses and the stimulus features. Consider the case where the stimuli are represented using linguistic features of the captions. When you train a "modality-specific" decoder on image responses, the decoder is forced to rely on modality-agnostic information that is shared between the image responses and the caption features. On the other hand, when you train a "modality-agnostic" decoder on both image responses and caption responses, the decoder has access to the modality-specific information that is shared by the caption responses and the caption features, so it is not explicitly required to learn modality-agnostic features. As a result, while the authors show that "modality-agnostic" decoders outperform "modality-specific" decoders in most conditions, I am not convinced that this is because they are forced to learn more modality-agnostic features.

      Second, the authors claim that "modality-specific decoders can be applied only in the modality that they were trained on" while "modality-agnostic decoders can be applied to decode stimuli from multiple modalities, even without knowing a priori the modality the stimulus was presented in" (line 47). While "modality-agnostic" decoders do outperform "modality-specific" decoders in the cross-modality conditions, it is important to note that "modality-specific" decoders still perform better than expected by chance (figure 5). It is also important to note that knowing about the input modality still improves decoding performance even for "modality-agnostic" decoders, since it determines the optimal feature space: it is better to decode brain responses to images using decoders trained on image features, and it is better to decode brain responses to captions using decoders trained on caption features.

      Comments on revised version:

      The revised version benefits from clearer claims and more precise terminology (i.e. classifying the decoders as "modality-agnostic" or "modality-specific" while classifying the representations as "modality-invariant" or "modality-dependent").

      While the modality-agnostic decoders outperform the modality-specific decoders, I am still not convinced that this is because they are "explicitly trained to leverage the shared information in modality-invariant patterns of the brain activity". On one hand, the high-level feature spaces may each contain some amount of modality-invariant information, so even modality-specific decoders can capture some modality-invariant information. On the other hand, I do not see how training the modality-agnostic decoders on responses to both modalities necessitates that they learn modality-invariant representations beyond those that are learned by the modality-specific decoders.

    2. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank all reviewers for their constructive and in-depth reviews. Thanks to your feedback, we realized that the main objective of the paper was not presented clearly enough, and that our use of the same “modality-agnostic” terminology for both decoders and representations caused confusion. We addressed these two major points as outlined in the following. 

      In the revised manuscript, we highlight that the main contribution of this paper is to introduce modality-agnostic decoders. Apart from introducing this new decoder type, we put forward their advantages in comparison to modality-specific decoders in terms of decoding performance and analyze the modality-invariant representations (cf. updated terminology in the following paragraph) that these decoders rely on. The dataset that these analyses are based on is released as part of this paper, in the spirit of open science (but this dataset is only a secondary contribution for our paper). 

      Regarding the terminology, we clearly define modality-agnostic decoders as decoders that are trained on brain imaging data from subjects exposed to stimuli in multiple modalities. The decoder is not given any information on which modality a stimulus was presented in, and is therefore trained to operate in a modality-agnostic way. In contrast, modality-specific decoders are trained only on data from a single stimulus modality. These terms are explained in Figure 2. While these terms describe different ways in which decoders can be trained, there are also different ways to evaluate them afterwards (see also Figure 3); but obviously, this test-time evaluation does not change the nature of the decoder, i.e., there is no contradiction in applying a modality-specific decoder to brain data from a different modality.
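      The training-time distinction described above can be illustrated with a minimal sketch. It assumes closed-form ridge regression as the linear decoder and uses purely synthetic data; the regularization strength, array sizes, and all variable names are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def fit_ridge(brain, features, alpha=1.0):
    """Closed-form ridge regression mapping brain responses to stimulus
    features: W = (X^T X + alpha * I)^(-1) X^T Y."""
    n_voxels = brain.shape[1]
    gram = brain.T @ brain + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, brain.T @ features)


# Synthetic stand-ins for real data (assumption): 200 trials per modality,
# 30 voxels, 5-dimensional stimulus features.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_features = 200, 30, 5
mixing = rng.standard_normal((n_features, n_voxels))

features_img = rng.standard_normal((n_trials, n_features))
features_cap = rng.standard_normal((n_trials, n_features))
brain_img = features_img @ mixing + 0.1 * rng.standard_normal((n_trials, n_voxels))
brain_cap = features_cap @ mixing + 0.1 * rng.standard_normal((n_trials, n_voxels))

# Modality-specific decoders: each sees responses to one modality only.
W_image = fit_ridge(brain_img, features_img)
W_caption = fit_ridge(brain_cap, features_cap)

# Modality-agnostic decoder: responses to both modalities are pooled,
# with no label telling the decoder which modality a trial came from.
W_agnostic = fit_ridge(
    np.vstack([brain_img, brain_cap]),
    np.vstack([features_img, features_cap]),
)

# Test-time choice is independent of training: a modality-specific
# decoder can still be applied to responses from the other modality.
cross_modal_prediction = brain_cap @ W_image
```

      In this sketch the only difference between the decoder types is which rows enter the fit, which mirrors the definition given in the response; how well each decoder transfers across modalities is an empirical question that the toy data cannot answer.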

      Further, we identify representations that are relevant for modality-agnostic decoders using the searchlight analysis. We realized that our choice of using the same “modality-agnostic” term to describe these brain representations created unnecessary debate and confusion. In order to not conflate the terminology, in the updated manuscript we call these representations modality-invariant (and the opposite modality-dependent). Our methodology does not allow us to distinguish whether certain representations merely share representational structure to a certain degree, or are truly representations that abstract away from any modality-dependent information. However, in order to be useful for modality-agnostic decoding, a significant degree of shared representational structure is sufficient, and it is this property of brain representations that we now define as “modality-invariant”. 

      We updated the manuscript in line with this new terminology and focus: in particular, the first Related Work section on Modality-invariant brain representations, as well as the Introduction and Discussion.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce a densely sampled dataset where 6 participants viewed images and sentence descriptions derived from the MS COCO database over the course of 10 scanning sessions. The authors further showcase how image and sentence decoders can be used to predict which images or descriptions were seen, using pairwise decoding across a set of 120 test images. The authors find decodable information widely distributed across the brain, with a left-lateralized focus. The results further showed that modality-agnostic models generally outperformed modality-specific models, and that data based on captions was not explained better by caption-based models but by modality-agnostic models. Finally, the authors decoded imagined scenes.

      Strengths:

      (1) The dataset presents a potentially very valuable resource for investigating visual and semantic representations and their interplay.

      (2) The introduction and discussion are very well written in the context of trying to understand the nature of multimodal representations and present a comprehensive and very useful review of the current literature on the topic.

      Weaknesses:

      (1) The paper is framed as presenting a dataset, yet most of it revolves around the presentation of findings in relation to what the authors call modality-agnostic representations, and in part around mental imagery. This makes it very difficult to assess the manuscript, whether the authors have achieved their aims, and whether the results support the conclusions.

      Thanks for this insightful remark. The dataset release is only a secondary contribution of our study; this was not clear enough in the previous version. We updated the manuscript to make the main objective of the paper clearer, as outlined in our general response to the reviews (see above).

      (2) While the authors have presented a potential use case for such a dataset, there is currently far too little detail regarding data quality metrics expected from the introduction of similar datasets, including the absence of head-motion estimates, quality of intersession alignment, or noise ceilings of all individuals.

      As already mentioned in the general response, the main focus of the paper is to introduce modality-agnostic decoders. The dataset is released as an additional contribution, which is why we did not focus on reporting extensive quality metrics in the original manuscript. To respond to your request, we updated the appendix of the manuscript to include a range of data quality metrics.

      The updated appendix includes head motion estimates in the form of realignment parameters and framewise displacement, as well as a metric to assess the quality of intersession alignment. More detailed descriptions can be found in Appendix 1 of the updated manuscript.
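
      As an illustration of how framewise displacement can be derived from realignment parameters, here is a minimal sketch. It assumes a widely used formulation (sum of absolute volume-to-volume changes in the six parameters, with rotations converted to millimetres on a 50 mm sphere); the function name, the column order, and the radius are assumptions, and the exact computation reported in Appendix 1 may differ.

```python
import numpy as np

def framewise_displacement(motion_params, head_radius_mm=50.0):
    """Framewise displacement per volume from six realignment parameters.

    motion_params: array of shape (n_volumes, 6). Columns 0-2 are assumed to
    be translations in mm and columns 3-5 rotations in radians.
    """
    params = np.asarray(motion_params, dtype=float).copy()
    # Convert rotations to arc length (mm) on a sphere of the given radius.
    params[:, 3:] *= head_radius_mm
    # Absolute volume-to-volume change, summed over the six parameters.
    diffs = np.abs(np.diff(params, axis=0)).sum(axis=1)
    # The first volume has no predecessor, so its displacement is set to 0.
    return np.concatenate([[0.0], diffs])
```

      Summary values such as the mean or maximum displacement per run could then be reported for each subject and session.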

      Estimating noise ceilings based on repeated presentations of stimuli (as for example done in Allen et al. (2022)) requires multiple betas for each stimulus. All training stimuli were only presented once, so this could only be done for the test stimuli which were presented repeatedly. However, during our preprocessing procedure we directly calculated stimulus-specific betas based on data from all sessions using one single GLM, which means that we did not obtain separate betas for repeated presentations of the same stimulus. We will however share the raw data publicly, so that such noise ceilings can be calculated using an adapted preprocessing procedure if required.

      Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Prince, J. S., Dowdle, L. T., Nau, M., Caron, B., Pestilli, F., Charest, I., Hutchinson, J. B., Naselaris, T., & Kay, K. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1), 116–126. https://doi.org/10.1038/s41593-021-00962-x

      (3) The exact methods and statistical analyses used are still opaque, making it hard for a reader to understand how the authors achieved their results. More detail in the manuscript would be helpful, specifically regarding the exact statistical procedures, what tests were performed across, or how data were pooled across participants.

      In the updated manuscript, we improved the level of detail for the descriptions of statistical analyses wherever possible (see also our response to your “Recommendations for the authors”, Point 6).

      Regarding data pooling across participants: 

      Figure 8 shows averaged results across all subjects (as indicated in the caption).

      Regarding data pooling for the estimation of the significance threshold of the searchlight analysis for modality-invariant regions: We updated the manuscript to clarify that we performed a permutation test, combined with a bootstrapping procedure to estimate a group-level null distribution: “For each subject, we evaluated the decoders 100 times with shuffled labels to create per-subject chance-level results. Then, we randomly selected one of the 100 chance-level results for each of the 6 subjects and calculated group-level statistics (TFCE values) the exact same way as described in the preceding paragraph. We repeated this procedure 10,000 times resulting in 10,000 permuted group-level results.”
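
      The resampling scheme quoted above can be sketched as follows. This is a minimal illustration, not the analysis code: the function name is hypothetical, the per-subject chance-level maps are assumed to be stored as one array, and a plain mean across subjects stands in for the TFCE-based group statistic, which is not reimplemented here.

```python
import numpy as np

def group_null_distribution(subject_nulls, n_draws=10_000, group_stat=None, seed=0):
    """Build a group-level null distribution from per-subject permutation results.

    subject_nulls: array of shape (n_subjects, n_perms, n_locations) holding
    chance-level scores from decoders evaluated with shuffled labels.
    group_stat: callable mapping (n_subjects, n_locations) -> (n_locations,).
    """
    subject_nulls = np.asarray(subject_nulls, dtype=float)
    n_subjects, n_perms, n_locations = subject_nulls.shape
    if group_stat is None:
        # Placeholder statistic (assumption): mean across subjects, not TFCE.
        group_stat = lambda maps: maps.mean(axis=0)
    rng = np.random.default_rng(seed)
    null = np.empty((n_draws, n_locations))
    for d in range(n_draws):
        # Randomly pick one of the permuted results for each subject.
        picks = rng.integers(0, n_perms, size=n_subjects)
        maps = subject_nulls[np.arange(n_subjects), picks]
        # Compute the group statistic exactly as for the unpermuted data.
        null[d] = group_stat(maps)
    return null
```

      With 6 subjects and 100 shuffles per subject, `subject_nulls` would have shape `(6, 100, n_locations)`, and the default `n_draws` matches the 10,000 repetitions described in the quoted passage.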

      Additionally, we indicated that the same permutation testing methods were applied to assess the significance threshold for the imagery decoding searchlight maps (Figure 10). 

      (4) Many findings (e.g., Figure 6) are still qualitative but could be supported by quantitative measures.

      Figures 6 and 7 intentionally present qualitative results to support the quantitative decoding results presented in Figures 4 and 5 (see also Reviewer 2, Comment 2).

      Figures 4 and 5 show pairwise decoding accuracy as a quantitative measure for evaluation of the decoders. This metric is the main metric we used to compare different decoder types and features. Based on the finding that modality-agnostic decoders using ImageBind features achieve the best score on this metric, we performed the additional qualitative analysis presented in Figures 6 and 7. (Note that we expanded the candidate set for the qualitative analysis in order to have a larger and more diverse set of images.)
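
      For readers unfamiliar with the metric, a pairwise accuracy score can be sketched as below. This is one common formulation based on cosine similarity and is an assumption: the manuscript's exact definition is not restated in this response, and the function name is hypothetical.

```python
import numpy as np

def pairwise_accuracy(predicted, targets):
    """Fraction of (item, distractor) pairs in which a predicted feature vector
    is closer to its own target than to the distractor's target.

    predicted, targets: arrays of shape (n_items, n_features), row-aligned.
    """
    predicted = np.asarray(predicted, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Row-normalize so that dot products are cosine similarities.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sim = p @ t.T  # sim[i, j]: prediction i versus candidate j
    # A pair counts as correct if the true match beats the distractor.
    correct = sim.diagonal()[:, None] > sim
    off_diagonal = ~np.eye(sim.shape[0], dtype=bool)
    return correct[off_diagonal].mean()
```

      Under this formulation, predictions unrelated to the targets give a score near 0.5, which is the chance level against which decoders are compared.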

      (5) Results are significant in regions that typically lack responses to visual stimuli, indicating potential bias in the classifier. This is relevant for the interpretation of the findings. A classification approach less sensitive to outliers (e.g., 70-way classification) could avoid this issue. Given the extreme collinearity of the experimental design, regressors in close temporal proximity will be highly similar, which could lead to leakage effects.

      It is true that our searchlight analysis revealed significant activity in regions outside of the visual cortex. However, it is assumed that the processing of visual information does not stop at the border of the visual cortex. The integration of information such as the semantics of the image is progressively processed in other higher-level regions of the brain. Recent studies have shown that activity in large areas of the cortex (including many outside of the visual cortex) can be related to visual stimulation (Solomon et al. 2024; Raugel et al. 2025). Our work confirms this finding and we therefore do not see reason to believe that this is due to a bias in our decoders.

      Further, you are suggesting that we could replace our regression approach with a 70-way classification. However, this is difficult using our fMRI data as we do not see a straightforward way to assign the training and testing stimuli with class labels (the two datasets consist of non-overlapping sets of naturalistic images).

      To address your concerns regarding the collinearity of the experimental design and possible leakage effects, we trained and evaluated a decoder for one subject after running a “null-hypothesis” adapted preprocessing. More specifically, for all sessions, we shifted the functional data of all runs by one run (moving the data of the last run to the very front), while leaving the design matrices in place. Thereby, we destroyed the relationship between stimuli and brain activity but kept the original data and design with its collinearity (and possible biases). We preprocessed this adapted data for subject 1, ran whole-brain decoding using ImageBind features, and verified that the decoding performance was at chance level: Pairwise accuracy (captions): 0.43 | Pairwise accuracy (images): 0.47 | Pairwise accuracy (imagery): 0.50. This result provides evidence against the notion that potential collinearity or biases in our experimental design or evaluation procedure could have led to inflated results.
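
      The run shift used for this control can be sketched as follows. The function names are hypothetical, and the sketch assumes that each session is available as an ordered list of runs of equal length, so that every design matrix still fits the run it is paired with after the shift.

```python
def shift_runs(functional_runs):
    """Rotate the run order by one: the last run moves to the very front."""
    runs = list(functional_runs)
    if len(runs) < 2:
        return runs
    return runs[-1:] + runs[:-1]

def make_null_session(functional_runs, design_matrices):
    """Pair the shifted functional data with the unchanged design matrices."""
    shifted = shift_runs(functional_runs)
    return list(zip(shifted, design_matrices))
```

      For a session with runs `r1, r2, r3` and designs `d1, d2, d3`, this yields the pairs `(r3, d1), (r1, d2), (r2, d3)`: the temporal structure of the design is preserved while its link to the recorded activity is broken.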

      Raugel, J., Szafraniec, M., Vo, H.V., Couprie, C., Labatut, P., Bojanowski, P., Wyart, V. and King, J.R. (2025). Disentangling the Factors of Convergence between Brains and Computer Vision Models. arXiv preprint arXiv:2508.18226.

      Solomon, S. H., Kay, K., & Schapiro, A. C. (2024). Semantic plasticity across timescales in the human brain. bioRxiv, 2024-02.

      (6) The manuscript currently lacks a limitations section, specifically regarding the design of the experiment. This involves the use of the overly homogenous dataset Coco, which invites overfitting, the mixing of sentence descriptions and visual images, which invites imagery of previously seen content, and the use of a 1-back task, which can lead to carry-over effects to the subsequent trial.

      Regarding the COCO dataset: We agree that COCO is somewhat homogeneous; it is, however, much more diverse and naturalistic than the smaller datasets used in previous fMRI experiments with multimodal stimuli. Additionally, COCO has been widely adopted as a benchmark dataset in the machine learning community and features rich annotations for each image (e.g. object labels, segmentations, additional captions, people’s keypoints), facilitating many more future analyses based on our data.

      Regarding the mixing of sentence descriptions and images: Subjects were not asked to visualize sentences, and they may have used different strategies to solve the one-back task. Generally, we do not see it as problematic if subjects are performing visual imagery to some degree while reading sentences, and this might even be the case during normal reading as well. A more targeted experiment comparing reading with and without interleaved visual stimulation in the form of images and a one-back task would be required to assess this, but this was not the focus of our study. For now, it is true that we cannot be sure that our results generalize to cases in which subjects are just reading and are less incentivized to perform mental imagery.

      Regarding the use of a 1-back task: It was necessary to make some design choices in order to realize this large-scale data collection with approximately 10 hours of recording per subject. Specifically, the 1-back task was included in the experimental setup in order to assure continuous engagement of the participant during the rather long sessions of 1 hour. The subjects did indeed need to remember the previous stimulus to succeed at the 1-back task, which means that some brain activity during the presentation of a stimulus is likely to be related to the previous stimulus. We aimed to account for this confound during the preprocessing stage when fitting the GLM, which was fit to capture only the response to the presented image/caption, not the preceding one. Still, it might have picked up on some of the activity from preceding stimuli, causing some decrease of the final decoding performance.

      We added a limitations section to the updated manuscript to discuss these important issues.

      (7) I would urge the authors to clarify whether the primary aim is the introduction of a dataset and showing the use of it, or whether it is the set of results presented. This includes the title of this manuscript. While the decoding approach is very interesting and potentially very valuable, I believe that the results in the current form are rather descriptive, and I'm wondering what specifically they add beyond what is known from other related work. This includes imagery-related results. This is completely fine! It just highlights that a stronger framing as a dataset is probably advantageous for improving the significance of this work.

      Thanks a lot for pointing this out. Based on this comment and feedback from the other reviewers, we restructured the abstract, introduction, and discussion sections of the paper to better reflect the primary aim (cf. general response above).

      You further mention that it is not clear what our results add beyond what is known from related work. We list the main contributions here:

      A single modality-agnostic decoder can decode the semantics of visual and linguistic stimuli irrespective of the presentation modality with a performance that is not lagging behind modality-specific decoders.

      Modality-agnostic decoders outperform modality-specific decoders for decoding captions and mental imagery.

      Modality-invariant representations are widespread across the cortex (a range of previous work has suggested they were much more localized (Bright et al. 2004; Jung et al. 2018; Man et al. 2012; Simanova et al. 2014)).

      Regions that are useful for imagery are largely overlapping with modality-invariant regions.

      Bright, P., Moss, H., & Tyler, L. K. (2004). Unitary vs multiple semantics: PET studies of word and picture processing. Brain and language, 89(3), 417-432.

      Jung, Y., Larsen, B., & Walther, D. B. (2018). Modality-Independent Coding of Scene Categories in Prefrontal Cortex. Journal of Neuroscience, 38(26), 5969–5981.

      Liuzzi, A. G., Bruffaerts, R., Peeters, R., Adamczuk, K., Keuleers, E., De Deyne, S., Storms, G., Dupont, P., & Vandenberghe, R. (2017). Cross-modal representation of spoken and written word meaning in left pars triangularis. NeuroImage, 150, 292–307. https://doi.org/10.1016/j.neuroimage.2017.02.032

      Man, K., Kaplan, J. T., Damasio, A., & Meyer, K. (2012). Sight and Sound Converge to Form Modality-Invariant Representations in Temporoparietal Cortex. Journal of Neuroscience, 32(47), 16629–16636.

      Simanova, I., Hagoort, P., Oostenveld, R., & van Gerven, M. A. J. (2014). Modality-Independent Decoding of Semantic Information from the Human Brain. Cerebral Cortex, 24(2), 426–434.

      Reviewer #2 (Public review):

      Summary:

      This study introduces SemReps-8K, a large multimodal fMRI dataset collected while subjects viewed natural images and matched captions, and performed mental imagery based on textual cues. The authors aim to train modality-agnostic decoders (models that can predict neural representations independently of the input modality) and use these models to identify brain regions containing modality-agnostic information. They find that such decoders perform comparably or better than modality-specific decoders and generalize to imagery trials.

      Strengths:

      (1) The dataset is a substantial and well-controlled contribution, with >8,000 image-caption trials per subject and careful matching of stimuli across modalities - an essential resource for testing theories of abstract and amodal representation.

      (2) The authors systematically compare unimodal, multimodal, and cross-modal decoders using a wide range of deep learning models, demonstrating thoughtful experimental design and thorough benchmarking.

      (3) Their decoding pipeline is rigorous, with informative performance metrics and whole-brain searchlight analyses, offering valuable insights into the cortical distribution of shared representations.

      (4) Extension to mental imagery decoding is a strong addition, aligning with theoretical predictions about the overlap between perception and imagery.

      Weaknesses:

      While the decoding results are robust, several critical limitations prevent the current findings from conclusively demonstrating truly modality-agnostic representations:

      (1) Shared decoding ≠ abstraction: Successful decoding across modalities does not necessarily imply abstraction or modality-agnostic coding. Participants may engage in modality-specific processes (e.g., visual imagery when reading, inner speech when viewing images) that produce overlapping neural patterns. The analyses do not clearly disambiguate shared representational structure from genuinely modality-independent representations. Furthermore, in Figure 5, the modality-agnostic encoder did not perform better than the modality-specific decoder trained on images (in decoding images), but outperformed the modality-specific decoder trained on captions (in decoding captions). This asymmetry contradicts the premise of a truly "modality-agnostic" encoder. Additionally, given the similar performance between modality-agnostic decoders based on multimodal versus unimodal features, it remains unclear why neural representations did not preferentially align with multimodal features if they were truly modality-independent.

      We agree that successful modality-agnostic and cross-modal decoding does not necessarily imply that abstract patterns were decoded. In the updated manuscript, we therefore refer to these representations as modality-invariant (see also the updated terminology explained in the general response above).

      If participants are performing mental imagery when reading, and this is allowing us to perform cross-decoding, then this means that modality-invariant representations are formed during this mental imagery process, i.e. that the representations formed during this form of mental imagery are compatible with representations during visual perception (or, in your words, produce overlapping neural patterns). While we can not know to what extent people were performing mental imagery while reading (or having inner speech while viewing images), our results demonstrate that their brain activity allows for decoding across modalities, which implies that modality-invariant representations are present.

      It is true that our current analyses can not disambiguate modality-invariant representations (or, in your words, shared representational structure) from abstract representations (in your words, genuinely modality-independent representations). As the main goal of the paper was to build modality-agnostic decoders, and these only require what we call “modality-invariant” representations (see our updated terminology in the general reviewer response above), we leave this question open for future work. We do however discuss this important limitation in the Discussion section of the updated manuscript.

      Regarding the asymmetry of decoding results when comparing modality-agnostic decoders with the two respective modality-specific decoders for captions and images: We do not believe that this asymmetry contradicts the premise of a modality-agnostic decoder. Multiple explanations for this result are possible: (1) The modality-specific decoder for images might benefit from the more readily decodable lower-level modality-dependent neural activity patterns in response to images, which are less useful for the modality-agnostic decoder because they are not useful for decoding caption trials. The modality-specific decoders for captions might not be able to pick up on low-level modality-dependent neural activity patterns as these might be less easily decodable. 

      (2) The signal-to-noise ratio for caption trials might be lower than for image trials (cf. generally lower caption decoding performance); therefore, the addition of training data (even if it is from another modality) improves the decoding performance for captions, but not for images (which might be at ceiling already).

      Regarding the similar performance between modality-agnostic decoders based on multimodal versus unimodal features: Unimodal features are based on rather high-level features of the respective modality (e.g. last-layer features of a model trained for semantic image classification), which can be already modality-invariant to some degree. Additionally, as already mentioned before, in the updated manuscript we only require representations to be modality-invariant and not necessarily abstract.

      (2) The current analysis cannot definitively conclude that the decoder itself is modality-agnostic, making "Qualitative Decoding Results" difficult to interpret in this context. This section currently provides illustrative examples, but lacks systematic quantitative analyses.

      The qualitative decoding results in Figures 6 and 7 present exemplary qualitative results for the quantitative results presented in Figures 4 and 5 (see also Reviewer 1 Comment 4).

      Figures 4 and 5 show pairwise decoding accuracy as a quantitative measure for evaluation of the decoders. This metric is the main metric we used to compare different decoder types and features. Based on the finding that modality-agnostic decoders using ImageBind features achieve the best score on this metric, we performed the additional qualitative analysis presented in Figures 6 and 7. (Note that we expanded the candidate set for the qualitative analysis in order to have a larger and more diverse set of images.)

      (3) The use of mental imagery as evidence for modality-agnostic decoding is problematic.

      Imagery involves subjective, variable experiences and likely draws on semantic and perceptual networks in flexible ways. Strong decoding in imagery trials could reflect semantic overlap or task strategies rather than evidence of abstraction.

      It is true that mental imagery does not necessarily rely on modality-agnostic representations. In the updated manuscript we revised our terminology and refer to the analyzed representations as modality-invariant, which we define as “representations that significantly overlap between modalities”. 

      The manuscript presents a methodologically sophisticated and timely investigation into shared neural representations across modalities. However, the current evidence does not clearly distinguish between shared semantics, overlapping unimodal processes, and true modality-independent representations. A more cautious interpretation is warranted.

      Nonetheless, the dataset and methodological framework represent a valuable resource for the field.

      We fully agree with these observations, and updated our terminology as outlined in the general response.

      Reviewer #3 (Public review):

      Summary:

      The authors recorded brain responses while participants viewed images and captions. The images and captions were taken from the COCO dataset, so each image has a corresponding caption, and each caption has a corresponding image. This enabled the authors to extract features from either the presented stimulus or the corresponding stimulus in the other modality.

      The authors trained linear decoders to take brain responses and predict stimulus features.

      "Modality-specific" decoders were trained on brain responses to either images or captions, while "modality-agnostic" decoders were trained on brain responses to both stimulus modalities. The decoders were evaluated on brain responses while the participants viewed and imagined new stimuli, and prediction performance was quantified using pairwise accuracy. The authors reported the following results:

      (1) Decoders trained on brain responses to both images and captions can predict new brain responses to either modality.

      (2) Decoders trained on brain responses to both images and captions outperform decoders trained on brain responses to a single modality.

      (3) Many cortical regions represent the same concepts in vision and language.

      (4) Decoders trained on brain responses to both images and captions can decode brain responses to imagined scenes.

      Strengths:

      This is an interesting study that addresses important questions about modality-agnostic representations. Previous work has shown that decoders trained on brain responses to one modality can be used to decode brain responses to another modality. The authors build on these findings by collecting a new multimodal dataset and training decoders on brain responses to both modalities.

      To my knowledge, SemReps-8K is the first dataset of brain responses to vision and language where each stimulus item has a corresponding stimulus item in the other modality. This means that brain responses to a stimulus item can be modeled using visual features of the image, linguistic features of the caption, or multimodal features derived from both the image and the caption. The authors also employed a multimodal one-back matching task, which forces the participants to activate modality-agnostic representations. Overall, SemReps-8K is a valuable resource that will help researchers answer more questions about modality-agnostic representations.

      The analyses are also very comprehensive. The authors trained decoders on brain responses to images, captions, and both modalities, and they tested the decoders on brain responses to images, captions, and imagined scenes. They extracted stimulus features using a range of visual, linguistic, and multimodal models. The modeling framework appears rigorous, and the results offer new insights into the relationship between vision, language, and imagery. In particular, the authors found that decoders trained on brain responses to both images and captions were more effective at decoding brain responses to imagined scenes than decoders trained on brain responses to either modality in isolation. The authors also found that imagined scenes can be decoded from a broad network of cortical regions.

      Weaknesses:

      The characterization of "modality-agnostic" and "modality-specific" decoders seems a bit contradictory. There are three major choices when fitting a decoder: the modality of the training stimuli, the modality of the testing stimuli, and the model used to extract stimulus features. However, the authors characterize their decoders based on only the first choice: "modality-specific" decoders were trained on brain responses to either images or captions, while "modality-agnostic" decoders were trained on brain responses to both stimulus modalities. I think that this leads to some instances where the conclusions are inconsistent with the methods and results.

      In our analysis setup, a decoder is entirely determined by two factors: (1) the modality of the stimuli that the subject was exposed to, and (2) the machine learning model used to extract stimulus features.

      The modality of the testing stimuli defines whether we are evaluating the decoder in a within-modality or cross-modality setting, but is not an inherent characteristic of a trained decoder.
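
      The distinction can be made concrete with a small sketch in which the only difference between decoder types is the set of training trials. Closed-form ridge regression is used here as a stand-in for the linear decoders (an assumption; the regularization scheme is not specified in this response), and all function names and the `trials` data layout are hypothetical.

```python
import numpy as np

def fit_ridge(brain, features, alpha=1.0):
    """Closed-form ridge regression from brain responses to stimulus features."""
    X = np.asarray(brain, dtype=float)
    Y = np.asarray(features, dtype=float)
    n_voxels = X.shape[1]
    # Solve (X^T X + alpha * I) W = X^T Y for the weight matrix W.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

def fit_decoder(trials, train_modalities, alpha=1.0):
    """Train one decoder on the trials of the listed modalities.

    trials: dict mapping a modality name to a (brain, features) pair.
    train_modalities: ("image",) or ("caption",) gives a modality-specific
    decoder; ("image", "caption") gives a modality-agnostic decoder.
    """
    brain = np.vstack([trials[m][0] for m in train_modalities])
    feats = np.vstack([trials[m][1] for m in train_modalities])
    return fit_ridge(brain, feats, alpha)

def decode(weights, brain):
    """Predict stimulus features for test trials of any modality."""
    return np.asarray(brain, dtype=float) @ weights
```

      In this sketch the test modality never enters `fit_decoder`: it only determines which trials are passed to `decode`, and therefore whether the evaluation is within-modality or cross-modality.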

      First, the authors suggest that "modality-specific decoders are not explicitly encouraged to pick up on modality-agnostic features during training" (line 137) while "modality-agnostic decoders may be more likely to leverage representations that are modality-agnostic" (line 140). However, whether a decoder is required to learn modality-agnostic representations depends on both the training responses and the stimulus features. Consider the case where the stimuli are represented using linguistic features of the captions. When you train a "modality-specific" decoder on image responses, the decoder is forced to rely on modality-agnostic information that is shared between the image responses and the caption features. On the other hand, when you train a "modality-agnostic" decoder on both image responses and caption responses, the decoder has access to the modality-specific information that is shared by the caption responses and the caption features, so it is not explicitly required to learn modality-agnostic features. As a result, while the authors show that "modality-agnostic" decoders outperform "modality-specific" decoders in most conditions, I am not convinced that this is because they are forced to learn more modality-agnostic features.

      It is true that, for example, a modality-specific decoder trained on fMRI data from images with stimulus features extracted from captions might also rely on modality-invariant features. We still call this decoder modality-specific, as it has been trained to decode brain activity recorded from a specific stimulus modality. In the updated manuscript we corrected the statement that “modality-specific decoders are not explicitly encouraged to pick up on modality-invariant features during training” to include the case of decoders trained on features from the other modality which might also rely on modality-invariant features.

      It is true that a modality-agnostic decoder can also have access to modality-dependent information for captions and images. However, as it is trained jointly with both modalities and the modality-dependent features are not compatible, it is encouraged to rely on modality-invariant features. The result that modality-agnostic decoders outperform modality-specific decoders trained on captions for decoding captions confirms this, because if the decoder was only relying on modality-dependent features, the addition of training data from another stimulus modality could not increase the performance. (Also, the lack of a performance drop compared to modality-specific decoders trained on images is only possible thanks to the reliance on modality-invariant features. If the decoder only relied on modality-dependent features, the addition of data from another modality would equal an addition of noise to the training data, which must result in a performance drop at test time.) We cannot exclude the possibility that modality-agnostic decoders are also relying on modality-dependent features, but our results suggest that they are relying at least to some degree on modality-invariant features.

      Second, the authors claim that "modality-specific decoders can be applied only in the modality that they were trained on", while "modality-agnostic decoders can be applied to decode stimuli from multiple modalities, even without knowing a priori the modality the stimulus was presented in" (line 47). While "modality-agnostic" decoders do outperform "modality-specific" decoders in the cross-modality conditions, it is important to note that "modality-specific" decoders still perform better than expected by chance (figure 5). It is also important to note that knowing about the input modality still improves decoding performance even for "modality-agnostic" decoders, since it determines the optimal feature space: it is better to decode brain responses to images using decoders trained on image features, and it is better to decode brain responses to captions using decoders trained on caption features.

      Thanks for this important remark. We corrected this statement and now say that “modality-specific decoders that are trained to be applied only in the modality that they were trained on”, highlighting that their training process optimizes them for decoding in a specific modality. They can indeed be applied to the other modality at test time; this, however, results in a substantial performance drop.

      It is true that knowing the input modality can improve performance even for modality-agnostic decoders. This can most likely be explained by the fact that in that case the decoder can leverage both modality-invariant and modality-dependent features. However, we do not focus further on this result, as the main motivation to build modality-agnostic decoders is to be able to decode stimuli without knowing the stimulus modality a priori.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I will list additional recommendations below in no specific order:

      (1) I find the term "modality agnostic" quite unusual, and I believe I haven't seen it used outside of the ML community. I would urge the authors to change the terminology to be more common, or at least very early explain why the term is much better suited than the range of existing terms. A modality agnostic representation implies that it is not committed to a specific modality, but it seems that a representation cannot be committed to something.

      In the updated manuscript we now refer to the identified brain patterns as modality-invariant, a term that has previously been used in the literature (Man et al. 2012; Devereux et al. 2013; Patterson et al. 2016; Deniz et al. 2019; Nakai et al. 2021) (see also the general response above and the Introduction and Related Work sections of the updated manuscript).

      We continue to refer to the decoders as modality-agnostic, as this is a new type of decoder and the term describes the fact that they are trained in a way that abstracts away from the modality of the stimuli. We chose this term because we are not aware of any work in which brain decoders were trained jointly on multiple stimulus modalities, and in order not to risk contradictions or confusion with other definitions.

      Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. (2019). The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality. Journal of Neuroscience, 39(39), 7722–7736. https://doi.org/10.1523/JNEUROSCI.0675-19.2019

      Devereux, B. J., Clarke, A., Marouchos, A., & Tyler, L. K. (2013). Representational Similarity Analysis Reveals Commonalities and Differences in the Semantic Processing of Words and Objects. The Journal of Neuroscience, 33(48).

      Nakai, T., Yamaguchi, H. Q., & Nishimoto, S. (2021). Convergence of Modality Invariance and Attention Selectivity in the Cortical Semantic Circuit. Cerebral Cortex, 31(10), 4825–4839. https://doi.org/10.1093/cercor/bhab125

      Man, K., Kaplan, J. T., Damasio, A., & Meyer, K. (2012). Sight and Sound Converge to Form Modality-Invariant Representations in Temporoparietal Cortex. Journal of Neuroscience, 32(47), 16629–16636.

      Patterson, K., & Lambon Ralph, M. A. (2016). The Hub-and-Spoke Hypothesis of Semantic Memory. In Neurobiology of Language (pp. 765–775). Elsevier. https://doi.org/10.1016/B978-0-12-407794-2.00061-4

      (2) The table in Figure 1B would benefit from also highlighting the number of stimuli that have overlapping captions and images.

      The number of overlapping stimuli is rather small (153-211 stimuli depending on the subject). We added this information to Table 1B. 

      (3) The authors wrote that training stimuli were presented only once, yet they used a one-back task. Did the authors also exclude the first presentation of these stimuli?

      Thanks for pointing this out. It is indeed true that some training stimuli were presented more than once, but only in the case of one-back target trials. In these cases the second presentation of the stimulus was excluded, but not the first. As the subject cannot know that the upcoming presentation is going to be a one-back target, the first presentation cannot be affected by the subsequent repeated presentation. We updated the manuscript to clarify this issue.

      (4) Coco has roughly 80-90 categories, so many image captions will be extremely similar (e.g., "a giraffe walking", "a surfer on a wave", etc.). How can people keep these apart?

      It is true that some captions and images are highly similar even though they are not matching in the dataset. This might result in several false button presses because the subjects identified an image-caption pair as matching when in fact it was not intended as a match. However, as there was no feedback given on the task performance, this issue should not have had a major influence on the brain activity of the participants.

      (5) Footnotes for statistics are quite unusual - could the authors integrate statistics into the text?

      Thanks for this remark, in the updated manuscript all statistics are part of the main text.

      (6) It may be difficult to achieve the assumptions of a permutation test - exchangeability, which may bias statistical results. It is not uncommon for densely sampled datasets to use bootstrap sampling on the predictions of the test data to identify if a given percentile of that distribution crosses 0. The lowest p-value is given by the number of bootstrap samples (e.g., if all 10,000 bootstrap samples are above chance, then p < 0.0001). This may turn out to be more effective.

      Thanks for this comment. Our statistical procedure did in fact involve a bootstrapping procedure to generate a null distribution on the group level. We updated the manuscript to describe this method in more detail. Here is the updated paragraph: “To estimate the statistical significance of the resulting clusters we performed a permutation test, combined with a bootstrapping procedure to estimate a group-level null distribution (see also Stelzer et al., 2013). For each subject, we evaluated the decoders 100 times with shuffled labels to create per-subject chance-level results. Then, we randomly selected one of the 100 chance-level results for each of the 6 subjects and calculated group-level statistics (TFCE values) the exact same way as described in the preceding paragraph. We repeated this procedure 10,000 times resulting in 10,000 permuted group-level results. We ensured that every permutation was unique, i.e. no two permutations were based on the same combination of selected chance-level results. Based on this null distribution, we calculated p-values for each vertex by calculating the proportion of sampled permutations where the TFCE value was greater than the observed TFCE value. To control for multiple comparisons across space, we always considered the maximum TFCE score across vertices for each group-level permutation (Smith and Nichols, 2009).”
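
      The quoted procedure can be illustrated with a minimal sketch. It is not the authors' code: the function names, the data layout, and the use of a plain across-subject mean as the group statistic (in place of TFCE) are all assumptions made for illustration. It draws one shuffled-label result per subject, keeps only unique combinations, records the maximum group statistic across vertices for each draw, and reports per-vertex p-values as the proportion of null maxima that exceed the observed value.

```python
import random
from statistics import mean


def group_stat(per_subject_maps):
    """Stand-in group statistic: across-subject mean at each vertex.

    Assumption: the manuscript uses TFCE values here; a plain mean keeps
    the sketch short and self-contained.
    """
    n_vertices = len(per_subject_maps[0])
    return [mean(m[v] for m in per_subject_maps) for v in range(n_vertices)]


def max_stat_p_values(observed_maps, chance_maps, n_perm=1000, seed=0):
    """Per-vertex p-values from a group-level max-statistic null distribution.

    observed_maps[s][v]  : observed score of subject s at vertex v
    chance_maps[s][k][v] : k-th shuffled-label score of subject s at vertex v
    """
    rng = random.Random(seed)
    n_subjects = len(chance_maps)
    n_chance = len(chance_maps[0])
    if n_perm > n_chance ** n_subjects:
        raise ValueError("n_perm exceeds the number of unique combinations")

    observed = group_stat(observed_maps)
    seen = set()      # combinations already used, so every permutation is unique
    null_max = []     # maximum group statistic across vertices, one per permutation
    while len(null_max) < n_perm:
        combo = tuple(rng.randrange(n_chance) for _ in range(n_subjects))
        if combo in seen:
            continue
        seen.add(combo)
        permuted = group_stat([chance_maps[s][k] for s, k in enumerate(combo)])
        null_max.append(max(permuted))

    # Proportion of null maxima strictly greater than the observed value.
    return [sum(m > obs for m in null_max) / n_perm for obs in observed]
```

      Comparing every vertex against the null distribution of maxima is what controls for multiple comparisons across space: a vertex is only significant if it beats the best vertex obtained under shuffled labels.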

      (7) The authors present no statistical evidence for some of their claims (e.g., lines 335-337). It would be good if they could complement this in their description. Further, the visualization in Figure 4 is rather opaque. It would help if the authors could add a separate bar for the average modality-specific and modality-agnostic decoders or present results in a scatter plot, showing modality-specific on the x-axis and modality-agnostic on the y-axis and color-code the modality (i.e., making it two scatter colors, one for images, one for captions). All points will end up above the diagonal.

      We updated the manuscript and added statistical evidence for the claims made:

      We now report results for the claim that when considering the average decoding performance for images and captions, modality-agnostic decoders perform better than modality-specific decoders, irrespective of the features that the decoders were trained on.

      Additionally, we report the average modality-agnostic and modality-specific decoding accuracies corresponding to Figure 4. For modality-agnostic decoders the average value is 81.86%, for modality-specific decoders trained on images 78.15%, and for modality-specific decoders trained on captions 72.52%. We did not add a separate bar to Figure 4 as this would add additional information to a figure which is already very dense in its information content (cf. Reviewer 2’s recommendations for the authors). We therefore believe it is more useful to report the average values in the text and provide results for a statistical test comparing the decoder types. A scatter plot would make it difficult to include detailed information on the features, which we believe is crucial.

      We further provide statistical evidence for the observation regarding the directionality of cross-modal decoding.

      Reviewer #2 (Recommendations for the authors):

      For achieving more evidence to support modality-agnostic representations in the brain, I suggest more thorough analyses, for example:

      (1) Traditional searchlight RSA using different deep learning models. Through this approach, it might identify different brain areas that are sensitive to different formats of information (visual, text, multimodal); subsequently, compare the decoding performance using these ROIs.

      (2) Build more dissociable decoders for information of different modality formats, if possible. While I do not have a concrete proposal, more targeted decoder designs might better dissociate representational formats (i.e., unimodal vs. modality-agnostic).

      (3) A more detailed exploration of the "qualitative decoding results"--for example, quantitatively examining error types produced by modality-agnostic versus modality-specific decoders--would be informative for clarifying what specific content the decoder captures, potentially providing stronger evidence for modality-agnostic representations.

      Thanks for these suggestions. As the main goal of the paper is to introduce modality-agnostic decoders (which should be more clear from the updated manuscript, see also the general response to reviews), we did not include alternative methods for identifying modality-invariant regions. Nonetheless, we agree that in order to obtain more in-depth insight into the nature of representations that were recorded, performing analyses with additional methods such as RSA, comparisons with more targeted decoder designs in terms of their target features will be indispensable, as well as more in-depth error type analyses. We leave these analyses as promising directions for future work.

      The writing could be further improved in the introduction and, accordingly, the discussion. The authors listed a series of theories about conceptual representations; however, they did not systematically explain the relationships and controversies between them, and it seems that they did not aim to address the issues raised by these theories anyway. Thus, the extraction of core ideas is suggested. The difference between "modality-agnostic" and terms like "modality-independent," "modality-invariant," "abstract," "amodal," or "supramodal," and the necessity for a novel term should be articulated.

      The updated manuscript includes an improved introduction and discussion section that highlight the main focus and contributions of the study.

      We believe that a systematic comparison of theories on conceptual representations involving their relationships and controversies would require a dedicated review paper. Here, we focused on the aspects that are relevant for the study at hand (modality-invariant representations), for which we find that none of the considered theories can be rejected based on our results.

      Regarding the terminology (modality-agnostic vs. modality-invariant, ..) please refer to the general response.

      The figures also have room to improve. For example, Figures 4 and 5 present dense bar plots comparing multiple decoding settings (e.g., modality-specific vs. modality-agnostic decoders, feature space, within-modal vs. cross-modal, etc.); while comprehensive, they would benefit from clearer labels or separated subplots to aid interpretation. All figures are recommended to be optimized for greater clarity and directness in future revisions.

      Thanks for this remark. We agree that the figures are quite dense in information. However, splitting them up into subplots (e.g. separate subplots for different decoder types) would make it much less straightforward to compare the accuracy scores between conditions. As the main goal of these figures is to compare features and decoder types, we believe that it is useful to keep all information in the same plot. 

      The reviewer also suggests improving the clarity of the labels. It is true that the top left legend of Figures 4 and 5 was mixing information about decoder type and broad classes of features (vision/language/multimodal). To improve clarity, we updated the figures and clearly separated information on decoder type (the hue of different bars) and features (x-axis labels). The broad classes of features (vision/language/multimodal) are distinguished by alternating light gray background colors and additional labels at the very bottom of the plots.

      The new plots allow for easy performance comparison of the different decoder types and additionally provide information on confidence intervals for the performance of modality-specific decoders, which was not available in the previous figures.

      Reviewer #3 (Recommendations for the authors):

      (1) As discussed in the Public Review, I think the paper would greatly benefit from clearer terminology. Instead of describing the decoders as "modality-agnostic" and "modality-specific", perhaps the authors could describe the decoding conditions based on the train and test modalities (e.g., "image-to-image", "caption-to-image", "multimodal-to-image") or using the terminology from Figure 3 (e.g., "within-modality", "cross-modality", "modality-agnostic").

      We updated our terminology to be clearer and more accurate, as outlined in the general response. The terms modality-agnostic and modality-specific refer to the training conditions, and the test conditions are described in Figure 3 and are used throughout the paper.

      (2) Line 244: I think the multimodal one-back task is an important aspect of the dataset that is worth highlighting. It seems to be a relatively novel paradigm, and it might help ensure that the participants are activating modality-agnostic representations.

      It is true that the multimodal one-back task could play an important role for the activation of modality-invariant representations. Future work could investigate to what degree the presence of widespread modality-invariant representations is dependent on such a paradigm.

      (3) Line 253: Could the authors elaborate on why they chose a random set of training stimuli for each participant? Is it to make the searchlight analyses more robust?

      A random set of training stimuli was chosen in order to maximize the diversity of the training sets, i.e. to avoid bias based on a specific subsample of the COCO dataset. Between-subject comparisons can still be made based on the test set, which was shared by all subjects, with the limitation that performance differences due to individual differences cannot be disentangled from those due to the different training sets. However, the main goal of the data collection was not to make between-subject comparisons based on common training sets, but rather to make group-level analyses based on a large and maximally diverse dataset.

      (4) Figure 4: Could the authors comment more on the patterns of decoding performance in Figure 5? For instance, it is interesting that ResNet is a better target than ViT, and BERT-base is a better target than BERT-large.

      A multitude of factors influence the decoding performance, such as feature dimensionality, model architecture, training data, and training objective(s) (Conwell et al. 2023; Raugel et al. 2025). BERT-base might be better than BERT-large because the extracted features are of lower dimension. ResNet might be better than ViT because of its architecture (CNN vs. Transformer). To dive deeper into these differences, further controlled analyses would be necessary, but this is not the focus of this paper. The main objective of the feature comparison was to provide a broad overview of visual/linguistic/multimodal feature spaces and to identify the most suitable features for modality-agnostic decoding.

      Conwell, C., Prince, J. S., Kay, K. N., Alvarez, G. A., & Konkle, T. (2023). What can 1.8 billion regressions tell us about the pressures shaping high-level visual representation in brains and machines? (p. 2022.03.28.485868). bioRxiv. https://doi.org/10.1101/2022.03.28.485868

      Raugel, J., Szafraniec, M., Vo, H.V., Couprie, C., Labatut, P., Bojanowski, P., Wyart, V. and King, J.R. (2025). Disentangling the Factors of Convergence between Brains and Computer Vision Models. arXiv preprint arXiv:2508.18226.

      (5) Figure 7: It is interesting that the modality-agnostic decoder predictions mostly appear traffic-related. Is there a possibility that the model always produces traffic-related predictions, making it trivially correct for the presented stimuli that are actually traffic-related? It could be helpful to include some examples where the decoder produces other types of predictions to dispel this concern.

      The presented qualitative examples were randomly selected. To make sure that the decoder is not always predicting traffic-related content, we included 5 additional randomly selected examples in Figures 6 and 7 of the updated manuscript. In only one of the 5 new examples the decoder was predicting traffic-related content, and in this case the stimulus had actually been traffic-related (a bus).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-03174

      Corresponding author(s): Cristina Tocchini and Susan Mango

      1. General Statements

      We thank the reviewers for their thoughtful and constructive comments. We were pleased that the reviewers found our study “rigorous”, “well presented”, “technically strong”, and “novel”. We are also grateful for their recognition that our work identifies a function for a HOT region in gene regulation and provides new insights into the role of the uHOT in controlling dlg-1 expression.

      Point-by-point description of the revisions

      We have addressed the reviewers’ concerns by clarifying and refining the text, particularly regarding the intron 1 results, improving the quantitation and statistical analyses, and making adjustments and additions to text and figures.

      Specific responses to each point are provided below in blue.

      Reviewer #1

        • The results fully support the authors' conclusions regarding the significant role of the upstream HOT region ("uHOT") with strong fluorescence activity and substantial phenotypic effects (i.e., the animals have very low brood sizes and rarely progress through hatching). This data is well presented and technically well done.

      Thank you.

      1. In my view, their conclusions regarding the intronic HOT region are speculative and unconvincing. See below for main criticisms.

      We agree, and have made changes throughout the manuscript to make this point clearer. Specifically, we contextualize the role of intron 1 as a putative enhancer in reporter assays, but not in endogenous, physiological conditions. Some examples are:

      Abstract: “(…) In contrast, the intronic region displays weak enhancer-like activity when tested in transcriptional reporter assays but is dispensable in transcriptional control when studied at the endogenous locus. Our findings reveal how HOT regions contribute to gene regulation during animal development and illustrate how regulatory potential identified in isolated contexts can be selectively deployed or buffered within the native genomic architecture.”

      Background: “(…) The HOT region in the first intron possesses weak transcriptional capabilities that are restricted to epidermal cells as observed in transcriptional reporters, but seem to not be employed in physiological contexts.” As will become clear from reading this updated version of the manuscript, we cannot at present exclude a functional role during non-physiological conditions (e.g., stress).

      Results and discussion: “(…) This is in contrast with what the reporter experiments showed, where intron 1 alone was permissive for transcription and slightly enhanced the FL transgene expression levels (Figure 1F,G and S4). (…)”

      Other changes can be found highlighted in yellow in the manuscript.

      • Furthermore, their conclusions about interactions between the two tested regions are speculative and they show no strong evidence for this claim.

      We thank the reviewer for raising this concern. To avoid overstating our conclusions, we now frame the potential interaction between the two studied HOT regions strictly in the context of previously published ARC-C data (Huang et al., 2022). We clarify in the revised text that these interactions have been observed in earlier work during larval stages (Huang et al., 2022), but remain to be validated during embryogenesis, and we present them solely as contextual information rather than as a central conclusion.

      In the Results and discussion section we wrote: “(…) Although the presence of a fountain at this locus remains to be confirmed during embryogenesis, Accessible Region Conformation Capture (ARC-C), a method that maps chromatin contacts anchored at accessible regulatory elements, showed that the putative HOT region interacts with other DNA sequences, including the first intron of dlg-1 (1). (…)”

      * The authors claim that not all the phenotypic effects seen from deleting the uHOT region are specific to the dlg-1 gene. This is an interesting model, but the authors show essentially no data to support this or any explanation of what other gene might be regulated.*

      We appreciate the reviewer’s comment and have revised the manuscript to ensure that the possibility of additional regulatory effects from the uHOT region is presented as a hypothesis rather than a claim. Our study was designed to investigate HOT-region–based transcriptional regulation rather than chromatin interactions, and we now make this scope more explicit in the text. The revised discussion highlights that, although ARC-C data suggest the uHOT region may contact other loci, the idea that these interactions contribute to the observed phenotypes remains speculative and will require dedicated future work.

      In the Results and discussion section we wrote: “(…) Because, as previously shown, the upstream HOT region exhibits chromatin interactions with other genomic loci (1), its depletion might affect gene expression beyond dlg-1 alone. An intriguing hypothesis is that these phenotypes do not arise only from the reduction in dlg-1 mRNA and DLG-1 protein levels, but also from synergistic, partial loss-of-function phenotypes involving other genes (24). (…)”

      * Finally, some of the hypotheses in the text could be more accurately framed by the authors. They claim HOT regions are often considered non-functional (lines 189-191). Also, they claim that correct expression levels and patterning are usually regulated by elements within a few hundred basepairs of the CDS (lines 78-80). These claims are not generally accepted in the field, despite a relatively compact genome. Notably, both claims were tested and disproven by Chen et al (2014), Genome Research, where the authors specifically showed strong transcriptional activity from 10 out of 10 HOT regions located up to 4.7 kb upstream of their nearest gene. Chen et al. 2014 is cited by Tocchini et al. and it is, therefore, surprisingly inconsistent with the claims in this manuscript.*

      We thank the reviewer for this comment and have revised the text to clarify our intended meaning and avoid framing discussion points as absolute claims. We changed “often” to “frequently” in both sentences so that they better reflect general trends rather than universal rules.

      The revised text now reads: “Controversially, C. elegans sequences that dictate correct expression levels and patterning are frequently located within a few hundred base-pairs (bp) (maximum around 1,000–1,500 bp) from a gene’s CDS (3,13–15),”;

      And: “HOT regions in C. elegans, as well as other systems, have been predominantly associated with promoters and were frequently considered non-functional or simply reflective of accessible chromatin (25).”

      Regarding the comparison to Chen et al., 2014, we note that their reporters did not include a reference baseline for “strong” transcriptional activity, and only five of the ten tested HOT regions were located more than 1.5 kb from the nearest TSS. Therefore, our phrasing is consistent with their findings while describing general trends observed in the C. elegans genome rather than absolute rules. We have also ensured that these sentences are presented as discussion points rather than definitive claims. We hope these revisions make the framing and context clearer to the reader.

      The fluorescence expression from the intronic HOT region is not visible by eye and the quantification shows very little expression, suggestive of background fluorescence. Although the authors show statistical significance in Figure 1G, I would argue this is possibly based on inappropriate comparisons and/or a wrong choice of statistical test. The fluorescence levels should be compared to a non-transgenic animal and/or to a transgenic animal with the tested region shuffled but in an equivalent

      We understand the reviewer’s concern regarding the low fluorescence levels observed for the intronic HOT reporter. To address this, we have now included a Figure S4 with higher-exposure versions of the embryos shown in Figure 1. These panels confirm that the nuclear signal is genuine: embryos without a functional transcriptional transgene do not display any comparable fluorescence, aside from the characteristic cytoplasmic granules associated with embryonic autofluorescence. Similar reference images have also been added to Figure S3 to clarify the appearance of autofluorescence under the same imaging conditions.

      Regarding the quantitation analyses, as suggested by the reviewers, we now consistently quantify fluorescence by calculating the mean intensity for each embryo (biological replicates) and performing statistical analyses on these values. This approach ensures that the statistical tests are applied to independent biological measurements.

      * I would suggest the authors remove their claims about the intronic enhancer and the interaction between the two regions. And I would suggest softening the claims about the uHOT regulation of another putative gene.*

      We have revised the manuscript to avoid definitive claims regarding the presence of an interaction between the two studied HOT regions. These points are now presented strictly as hypotheses within the discussion, suggested by previously published ARC-C data rather than by our own experimental evidence. Likewise, we have softened our statements regarding the possibility that the uHOT region may regulate additional gene(s). This idea is now framed as a speculative model that will require dedicated future studies, rather than as a conclusion of the present work. Quotes can be found in the previous points (#3 and #4) raised by Reviewer 1.

      * The authors would need to demonstrate several things to support their current claims. The major experiments necessary are:*

        • Insert single-copy transgene with a minimal promoter and the intronic sequence scrambled to generate a proper baseline control. It is very possible that the intronic sequence does drive some expression, but the current control is not appropriate for statistical comparison (e.g., only the transgene with intron 1 contains the minimal promoter from pes-10, which may have baseline transcriptional activity even without the intron placed in front of the transgene).

      We thank the reviewer for this suggestion. We agree that a scrambled-sequence control can be informative in some contexts; however, in this case we believe the existing data already address the concern. In our dataset, all uHOT reporter constructs, each containing the same minimal promoter, show consistent background levels in the absence of regulatory input, providing an internal baseline for comparison. For this reason, we consider the current controls sufficient to interpret the effects of the intronic region in reporter assays.

      In general, the minimal Δpes-10 promoter is specifically designed to have negligible basal transcriptional activity on its own, and this property has been extensively validated in previous studies (reference included in the revised manuscript).

      * It is not very clear why the authors did not test intron 1 within the H2B of the transgene and just the minimal promoter in front of the transgene, but only in the context of the full-length promoter. The authors show a minor difference in expression levels for the full-length (FL) and full-length with intron 1 (FL-INT1) but show a large statistical difference. The authors use an inappropriate statistical test (T-test) for this experiment and treat many datapoints from the same embryo as independent, which is clearly not the case. Even minor differences in staging, transgene silencing in early development, or variability would potentially bias their data collection.*

      We thank the reviewer for this comment. Our goal was to assess the potential contribution of intron 1 in two complementary contexts: (i) on its own, upstream of a minimal promoter, to test whether it can in principle support transcription, and (ii) within the full-length promoter construct, which more closely reflects the endogenous configuration. For this reason, we did not generate an additional construct placing intron 1 within the H2B reporter driven only by the minimal promoter, as we considered this redundant with the information provided by the existing INT1 and FL-INT1 reporters.

      Regarding the statistical analysis, we agree that treating multiple measurements from the same embryo as independent is not appropriate. In the revised manuscript, we now use the mean fluorescence intensity per embryo as a single biological replicate and perform all statistical tests on these independent values. This approach avoids pseudo-replication and ensures that the analysis is robust to variability in staging or transgene behavior. The conclusions remain the same.

      * The authors claim, based on ARC-C data previously published by their lab (Huang et al. 2022) that the dlg-1 HOT region interacts with "other" genomic regions. This is potentially interesting but the evidence for this should be included in the manuscript itself, perhaps by re-analyzing data from the 2022 manuscript?*

      We thank the reviewer for this suggestion. The chromatin-interaction data referred to in the manuscript originate from the work of Huang et al., 2022, published by the Ahringer lab. As these ARC-C datasets are already publicly available and thoroughly analyzed in the original publication, we felt that reproducing them in our manuscript was not necessary for supporting the limited contextual point we make. Our intent is simply to note that previous work reported contacts between the uHOT region and additional loci. To address the reviewer’s concern, we have revised the manuscript to make clear that we are referencing previously published ARC-C observations and that we do not present these interactions as new findings from our study.

      For example, in the Results and discussion section we wrote: “(…) Because, as previously shown, the upstream HOT region exhibits chromatin interactions with other genomic loci (1), its depletion might affect gene expression beyond dlg-1 alone. An intriguing hypothesis is that these phenotypes do not arise only from the reduction in dlg-1 mRNA and DLG-1 protein levels, but also from synergistic, partial loss-of-function phenotypes involving other genes (24). (…)”

      * The fluorescence quantification is difficult to interpret from the attached data file (Table S1). For the individual values, it is unclear how many independent experiments (different embryos) were conducted. The authors should clarify if every data value is from an independent embryo or if they used several values from the same embryo. If they did use several values from the same embryo, how did they do this? Did they take every cell? Or did they focus on specific cells? How did they ensure embryo staging?*

      We thank the reviewer for pointing this out. To clarify the quantification procedure, we have expanded the description in the Methods section (“Live imaging: microscopy, quantitation, and analysis”). The revised text now specifies that each data point represents the normalized fluorescence value obtained from three nuclei (or five junctions, depending on the construct), all taken from the same anatomical positions across embryos. Two independent biological replicates were performed for each experiment, with each embryo contributing a single averaged value.

      As noted in the figure legends, the specific nuclei used for quantification are indicated in each panel (with dashed outlines), and a reference nucleus marked with an asterisk allows unambiguous identification of the same positions across all conditions. We are happy to further refine this description if additional clarification is needed.

      * The authors also do not describe how they validated single-copy insertions (partial transgene deletions in integrants are not infrequent and they only appear to use a single insertion for each strain). This should be described and/or added as a caveat if no validation was performed.*

      The authors also do not describe any validation for the CRISPR alleles, either deletions or insertion of the synthetic intron into dlg-1. How were accurate gene edits verified?

      We thank the reviewer for highlighting the importance of validating the genetic constructs. We have now clarified this more explicitly in the revised Methods section and in Table S1. All single-copy transgene insertions and all CRISPR-generated alleles were verified by genotyping and Sanger sequencing to confirm correct integration and the absence of unintended rearrangements.

      I am not convinced the statistical analysis of the fluorescence data is correct. Unless the authors show that every datapoint in the fluorescence quantification is independent, then I would argue they vastly overestimate the statistical significance. Even small differences are shown to have "***" levels of significance, which does not appear empirically plausible.

      We thank the reviewer for highlighting this point. To ensure that each data point represents an independent measurement, we now calculate the mean fluorescence per embryo (from three nuclei or five junctions) and use these per-embryo means as biological replicates for statistical testing. Two independent experiments were performed for each condition. Statistical differences were evaluated using a one-tailed t-test on the per-embryo means, as indicated in the revised Methods section.

      After this adjustment, the differences remain statistically significant, although less extreme than in the initial analysis (now p * *
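      The per-embryo averaging described in this response can be illustrated with a minimal sketch. It assumes a hypothetical data layout (one list of normalized values per embryo ID) and uses Welch's t statistic as a stand-in, since the response only specifies a "one-tailed t-test"; all numbers are made up and none of this is taken from the manuscript.

```python
from math import sqrt
from statistics import mean, variance


def per_embryo_means(measurements):
    """Collapse repeated measurements to one mean value per embryo.

    `measurements` maps a hypothetical embryo ID to its normalized
    fluorescence values (e.g. three nuclei or five junctions).
    """
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Assumption: unequal variances (Welch); the source only says "t-test".
    """
    se_a = variance(a) / len(a)  # squared standard error of group a
    se_b = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Made-up values for illustration only.
control = {"emb1": [1.0, 1.1, 0.9], "emb2": [1.2, 1.0, 1.1], "emb3": [0.9, 1.0, 0.8]}
mutant = {"emb4": [0.6, 0.7, 0.5], "emb5": [0.8, 0.6, 0.7], "emb6": [0.5, 0.6, 0.7]}

t_stat, dof = welch_t(per_embryo_means(control), per_embryo_means(mutant))
print(f"embryos per group = {len(control)}, t = {t_stat:.2f}, df = {dof:.1f}")
```

      The point of the design is that the sample size entering the test is the number of embryos, not the number of nuclei or junctions measured, which is what keeps the significance from being inflated by non-independent values.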

      This study is so closely related to the Chen et al study, that I believe this study should be discussed in more detail to put the data into context.

      We thank the reviewer for this suggestion. While we refer to Chen et al., 2014 as a relevant prior study for context, we believe that our work addresses distinct questions and experimental approaches. Specifically, our study focuses on HOT region-based transcriptional regulation in the dlg-1 locus and its functional dissection in vivo, which is conceptually and methodologically different from the scope of Chen et al., 2014, where the authors tested the functionality of HOT region-containing promoters in the context of single-copy integrated transcriptional reporters. We hope this is clearer to the reader in the revised manuscript.

      * Add H2B to the mNG in Figure 1 in order to understand where the first intron was inserted.*

      We thank the reviewer for this suggestion. A schematic representation of the transgene is already provided above the corresponding images to indicate the location of the first intron.

      For additional clarity, we have now added the following sentence in the main text: “In the other, intron 1 was inserted in the FL transgene within the H2B coding sequence (at position 25 from the ATG), preserving the canonical splice junctions with AG at the end of the first exon and a G at the beginning of the second exon, so that it acted as a bona fide intron (FL-INT1) (Figure 1F).”

      This should help readers understand the placement of the intron without requiring modifications to the figure itself.
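      As an illustration of the splice-junction constraint quoted above (AG at the end of the first exon, G at the start of the second), the following sketch scans a coding sequence for positions that satisfy it. The function name and the toy sequence are hypothetical and not taken from the manuscript; how the authors counted "position 25 from the ATG" is not specified here, so the indexing is an assumption.

```python
def canonical_insertion_sites(cds):
    """Return 0-based indices i where an intron placed between
    cds[i-1] and cds[i] would be flanked by exon bases ...AG | G...
    """
    cds = cds.upper()
    return [
        i
        for i in range(2, len(cds))
        if cds[i - 2:i] == "AG" and cds[i] == "G"
    ]


# Made-up sequence for illustration; this is NOT the H2B coding sequence.
toy_cds = "ATGAGGCCAAAGGTT"
print(canonical_insertion_sites(toy_cds))
```

      Any index returned marks a candidate site where inserting an intron leaves the exon sequence unchanged while keeping the exon-side AG|G context.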

      Reviewer #2

      1) The authors suggest that the region upstream of the dlg-1 gene is a HOT region. Although they highlight that other broad studies pick up this region as a HOT region, it would be good that the authors dive into the HOT identity of the region and characterize it, as it is a major part of their study. In addition to multiple TFs binding to the site, there are different criteria by which a region would be considered a HOT region. E.g. is there increased signal on this region in the IgG ChIP-seq tracks? Is the area CpG dense?

      We thank the reviewer for this suggestion. In the manuscript and Figure S1, we show several features of HOT regions, including transcription factor binding and chromatin marks. To further characterize the dlg-1 uHOT region, we have added the following sentence to the text: “The conserved region is positioned approximately four Kb from the CDS of dlg-1 in a CpG-dense sequence (2), and is overlapping and bordered by chromatin marks typically found in enhancers (5,16).”

      This addition provides additional evidence supporting the identity of the region as a HOT region, complementing the features already presented.

      * 2) When describing the HOT region, they refer to Pol II binding as 'confirming its role as a promoter': non-promoter regions can also have Pol II binding, especially enhancers. Having binding of Pol II does not confirm its role as promoter. On the contrary, seeing the K27ac and K4me1 would point towards it being an enhancer.*

      The sentence has been revised to clarify the interpretation of Pol II binding: “This HOT site also contains RNA Pol II peaks during embryogenesis (Figure S1C), supporting its role as a promoter or enhancer (9).” This wording avoids overinterpreting Pol II binding alone, while acknowledging that the HOT region may have both promoter and enhancer characteristics.

      We would like to note that the relevant chromatin marks (H3K27ac and H3K4me1), which are indicative of enhancer activity, are described in the text: “(…) Specifically, it is enriched in acetylated lysine 27 (H3K27ac) and mono- and di-methylated lysine 4 of histone H3 (H3K4me1/2), and depleted from tri-methylated lysine 4 of histone H3 (H3K4me3) (Figure S1D) (5,16). (…)”

      These changes clarify that the HOT region may have enhancer characteristics and avoid overinterpreting the Pol II signal.

      * 3) In S1B, the authors show TF binding tracks. They also have a diagram of the region subsets (HOT1-4) that were later tested. What are their criteria for dividing the HOT region into those fragments? From looking at Fig S1, the 'proper' HOT region (i.e., where protein binding occurs) seems to be divided into two (one chunk as part of HOT3 and one chunk as part of HOT4). Can the authors comment on the effects of this division?*

      To clarify the criteria for dividing the HOT region into subregions, we have added the following sentence to the main text: “The subregions were chosen taking into account (i) enrichment of putative TF binding sites (uHOT1 for PHA-4, uHOT2 for YAP-1 and NHR-25, uHOT3 for ELT-3, and uHOT4 for PHA-4 and others (e.g., ELT-1 and ELT-3)), (ii) Pol II binding peaks, and (iii) histone modification peaks (Fig. S1C,D).”

      This description explains the rationale behind the division and clarifies why the HOT region was split into these four fragments for functional testing.

      * 4) For the reporter experiments, the first experiments carry the histone H2B sequence and the second set of experiments (where the HOT region is dissected) carry a minimal promoter Δpes-10 (MINp). The results could be affected by the addition of these sequences. Is there a reason for this difference? Can the authors please justify it?*

      The difference in reporter design reflects the distinct goals of the two sets of experiments. The H2B sequence, coupled to mNG, is used as a coding sequence throughout the first part of the study (reporter analysis). This is commonly done to (i) concentrate the fluorescence signal (mNG) in nuclei (H2B) and (ii) identify specific cells more accurately for quantitation (intensity and consistency). The Δpes-10 promoter is instead used to test whether specific sequences possess enhancer potential: on its own, this minimal promoter supports transcription only when transcription factors bind the tested sequence placed upstream of it.

      To clarify this distinction in the manuscript, we have added the following sentence: “(…) Each region was paired with the minimal promoter Δpes-10 (MINp) (Figure 1D) and generated four transcriptional reporters. Δpes-10 is commonly used to generate transcriptional reporter aimed at assessing candidate regulatory enhancer sequences (20). The minimal promoter drives expression only when transcription factors bind to the tested upstream sequence and test enhancer activity. (…)”

      5) Regarding the H2B sequence: ' 137: first intron [...] inserted in the FL transgene within the H2B sequence, acting as an actual intron (FL-INT1)': how was the location of the insertion chosen? Does it disrupt H2B? Can it be that the H2B sequence contributed to dampening down the expression of mNG and disrupting it makes it stronger? It would be important to run the first experiments with minimal promoters and not with the H2B sequence.

      The location of the intron insertion within the H2B coding sequence was chosen to preserve proper splicing and avoid disrupting the H2B protein. We added the following sentence to clarify this point: “(…) In the other, the intron was inserted in the FL transgene within the H2B coding sequence (at position 25 from the ATG), preserving the canonical splice junctions with AG at the end of the first exon and a G at the beginning of the second exon, so that it acted as a bona fide intron (FL-INT1) (Figure 1F). (…)”

      * 6) Have the authors explored the features of the sequences underlying the different HOT subregions? (e.g. running a motif enrichment analysis)? Is there anything special about HOT3 that could make it a functional region? It would be good to compare uHOT3 vs the others that do not drive the correct pattern. Since it's a HOT region, it may not have a special feature, but it is important to look into it.*

      We thank the reviewer for this suggestion. To clarify the rationale for dividing the HOT region into four subregions, we have added the following sentence to the main text: “(…) The subregions were chosen taking into account (i) enrichment of putative TF binding sites (uHOT1 for PHA-4, uHOT2 for YAP-1 and NHR-25, uHOT3 for ELT-3, and uHOT4 for PHA-4 and others (e.g., ELT-1 and ELT-3)), (ii) Pol II binding peaks, and (iii) histone modification peaks (Fig. S1C,D). (…)”

      While uHOT3 does not appear to possess unique sequence features beyond these general HOT-region characteristics, this approach allowed us to systematically test which fragments contribute to transcriptional activity and patterning.

      7) For comparisons, the authors run t-tests. Is the data parametric? Otherwise, it would be more suitable to use a non-parametric test.

      To ensure that each data point represents an independent biological replicate, we now calculate the mean fluorescence intensity per embryo and perform statistical tests on these per-embryo means. The data meet the assumptions of parametric tests, and we use a one-tailed t-test as indicated in the Methods.

      * 1) The authors work with C. elegans embryos at comma stage, according to the methods section. It would be good if the authors mentioned it in the main text so that the reader is informed.*

      Thanks for this suggestion. We added this sentence in the main text: “(…) Live imaging and quantitation analyses on embryos at the comma stage (used throughout the study for consistency purposes) showed (…)”.

      * 2) 'Notably, the upstream HOT region is located more than four kilo-bases (Kb) away the CDS, and the one in the first intron contains enhancer sites, too.': what do they mean by 'enhancer sites, too'? Is the region known as a functional enhancer? If so, could you please provide the reference?*

      Here is the clarification from the revised text: “(…) Notably, the upstream HOT region is located more than four kilo-bases (Kb) away the CDS, and the one in the first intron does not only contain two TSS but also three enhancer sites (8). (…)”

      * 3) 'We hypothesized the upstream HOT region is the main driver of dlg-1 transcriptional regulation.': this sentence needs more reasoning. What led to this hypothesis? Is it the fact of seeing multiple TFs binding there? The chromatin marks?*

      The reasoning behind the hypothesis is described in the preceding paragraph, and to make this connection clearer, we have revised the sentence to begin with: “Considering all of this information, we hypothesized the upstream HOT region is the main driver of dlg-1 transcriptional regulation. (…)”.

      This change explicitly links the hypothesis to the observed TF binding and chromatin marks described above.

      * 4) The labels of S1B are too wide, as if they have stretched the image. Could the authors please correct this?*

      Yes, we agree with Reviewer 2. We corrected this.

      * 5) This sentence does not flow with the rest of the text '84 - cohesins have been shown to organize the DNA in a way that active enhancers make contacts in the 3D space forming "fountains" detectable in Hi-C data (17,18).': is there a reason to explain this? I would remove it if not, as it can confuse the reader.*

      We thank the reviewer for this comment. We agree that the sentence could potentially interrupt the flow; however, it is important for introducing the concept of “fountains” in 3D genome organization, which is necessary to understand the subsequent statement: “(…) Although the presence of a fountain at this locus remains to be confirmed during embryogenesis, Accessible Region Conformation Capture (ARC-C), a method that maps chromatin contacts anchored at accessible regulatory elements, showed that the putative HOT region interacts with other DNA sequences, including the first intron of dlg-1 (1). (…)”.

      Therefore, we have retained this sentence to provide the necessary background for readers.

      * 6) The authors mentioned that 'ARC-C data showed the putative HOT region interacts with other DNA sequences, including the first intron of dlg': have the authors analysed the data from the previous paper? A figure with the relevant data could illustrate this interaction so that the reader knows which specific region has been shown to interact with which. This would also bring clarity as to why they chose intron1 for additional experiments.*

      We thank the reviewer for this suggestion. We have examined the relevant ARC-C data from the previous publication (Huang et al., 2022). However, as these results are already published, we do not feel it is necessary to reproduce them in our manuscript. The mention of these interactions is intended only to introduce the concept for discussion and to provide context for why intron 1 was considered in subsequent experiments.

      * 7) 'two deletion sequences spanning from the beginning (uHOT) or the end (Short) of the HOT region until the dlg-1 CDS': From the diagrams of the figure, I understand that uHOT has the distal region deleted, and the short HOT has the distal and the upstream regions deleted. Is this correct? Could you clarify this in the text? E.g. 'we designed two reporters - one containing the sequence starting at the HOT region and ending at the dlg-1 CDS, and the other without the HOT region, but rather starting downstream of it until the dlg-1 CDS'.*

      To clarify the design of the reporters, we have revised the text as follows: “(…) To test this idea, we generated three single-copy, integrated transcriptional reporters carrying a histone H2B sequence fused to an mNeon-Green (mNG) fluorescent protein sequence under the transcriptional control of the following dlg-1 upstream regions: (i) a full-length sequence (“FL” = Distal + uHOT + Proximal sequences), (ii) one spanning from the beginning of the HOT region to the dlg-1 CDS (“uHOT” = uHOT + Proximal sequences), and (iii) one starting at the end of the HOT region and ending at the dlg-1 CDS (“Short” = Proximal sequence) (Figure 1A-C). (…)”

      This description clarifies which parts of the upstream region are included in each reporter and matches the schematics in Figure 1.

      * 8) 'Specifically, it spanned from bp 5,475,070 to 5,475,709 on chromosome X and removed HOT2 and HOT2 sequences' - this is unclear to me. What sequences are removed? HOT2 and 3?*

      Thanks for spotting this typo. It has now been corrected.

      * 9) 'ARC-C' is not introduced. Please spell out what this is. Accessible Region Conformation Capture (ARC-C). It would be helpful to include a sentence of what it is, as it will not be known by many readers.*

      You are right; we changed the text to: “(…) Although the presence of a fountain at this locus remains to be confirmed during embryogenesis, Accessible Region Conformation Capture (ARC-C), a method that maps chromatin contacts anchored at accessible regulatory elements, showed that the putative HOT region interacts with other DNA sequences, including the first intron of dlg-1 (1). (…)”

      * 10) Fig 1 B, diagram on the right: the H2B sequence is missing. I see that is indicated in the legend as part of mNG but this can be misleading. Could the authors add it to the diagram for clarification?*

      Yes, you are right. We added this in the figure.

      Reviewer #3

      * The authors' claims are generally supported by the data, though the last sentence of the abstract was a bit overstated. They state that they "reveal the function of HOT regions in animals development...."; it would be more accurate to state that they linked the role of an upstream HOT region to dlg-1 regulation, and their findings hint that this element could have additional regulatory functions. The authors can either temper their conclusions or try RNA-seq experiments to find additional genes that are misregulated by the delta-uHOT deletion allele. [OPTIONAL]. Another [OPTIONAL] experiment that would strengthen the claims is to perform RNAi knockdown or DLG-1 protein depletion and link that to phenotype to show that the dlg-1 mRNA and DLG-1 protein changes seen in the uHOT mutant do not explain the lethality observed.*

      We thank the reviewer for this comment. We have studied HOT region function in the context of a model organism, C. elegans; therefore, we believe that describing our findings as revealing a function of HOT regions in animal development is accurate. The sentence is meant to note that these observations may provide broader insights into HOT region regulation. We changed the last sentence of the abstract to: “(…) Our findings reveal how HOT regions contribute to gene regulation during animal development and illustrate how regulatory potential identified in isolated contexts can be selectively deployed or buffered within the native genomic architecture. (…)”.

      We note that RNA-seq is beyond the scope of this study; our discussion of potential effects on other genes is intended only as a hypothesis for future work. RNAi of dlg-1 has been previously reported and is cited in the manuscript, providing context for the phenotypes observed and discussed.

      * When printed out I cannot read what the tracks are in Fig S1. Adding larger text to indicate what those tracks are is necessary.*

      Yes, you are right. We changed this in the figure.

      * Line 79. I would change the word "usually" to "frequently" in the discussion about regulatory element position. While promoters ranging from a few hundred to 2000 basepairs are frequently used, there are numerous examples where important enhancers can be further away.*

      Corrected.

      * Line 93-95. The description of the reporters was very confusing. When referring to the deletion sequences it sounds like that is what is missing rather than what is included. Rather, if I understand correctly the uHOT is the sequence from the start of the uHOT to the CDS and Short starts at the end of uHOT (omitting it). Adding the promoter fragments to the figure would improve clarity.*

      To clarify the design of the reporters, we have revised the text as follows: “(…) To test this idea, we generated three single-copy, integrated transcriptional reporters carrying a histone H2B sequence fused to an mNeon-Green (mNG) fluorescent protein sequence under the transcriptional control of the following dlg-1 upstream regions: (i) a full-length sequence (“FL” = Distal + uHOT + Proximal sequences), (ii) one spanning from the beginning of the HOT region to the dlg-1 CDS (“uHOT” = uHOT + Proximal sequences), and (iii) one starting at the end of the HOT region and ending at the dlg-1 CDS (“Short” = Proximal sequence) (Figure 1A-C). (…)”

      This description clarifies which parts of the upstream region are included in each reporter and matches the schematics in Figure 1.

      * Line 108. Re-work the phrase "increase majorly". Majorly increase would be better.*

      We thank the reviewer for this suggestion. The verb is used here as an infinitive (“to increase majorly”), and in standard English the infinitive is usually not split. Therefore, we have kept the phrasing as it currently appears in the manuscript.

      * Line 153-154. The deletion indicates that HOT2 and HOT2 were removed. Was one supposed to be HOT3?*

      Thanks for spotting this typo. It has now been corrected.

      * In the figure legends the number of animals scored and the number of biological repeats is missing.*

      Added.

      * Figure 1 title in the legend. Should read "main driver" not "man driver".*

      Thanks for spotting this typo. It has now been corrected.

      * The references need to be gone through carefully and cleaned up. There are numerous gene and species names that are not italicized. There are also extra elements added by the reference manager such as [Internet].*

      Thanks for pointing it out. We used Zotero and the formatting requested by the journal of our choice. We will discuss with their team how to resolve this issue.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      High occupancy target (HOT) regions are genomic sequences in C. elegans that are bound by large numbers of transcription factors and emerged from systematic ChIP-seq studies. Whether they play physiologically important roles in gene regulation is not clear. In this manuscript, Tocchini et al. examine the function of two HOT regions using a combination of promoter reporters, genome editing, and smFISH. One HOT region is upstream of the dlg-1 gene and the other is in the first intron of dlg-1.

      The claims about the impact of the upstream HOT region on dlg-1 expression are convincing. Omitting the sequence in a promoter reporter reduces expression, the element is sufficient to drive expression from a MINp::mNG reporter, and deletion of the element reduces dlg-1 expression and causes developmental defects. The claims about the intronic HOT region need to be tempered slightly. The element drives weak expression in a MINp::mNG reporter but the replacement of the dlg-1 first intron with a syntron had no effect on expression, limiting the claims that can be made about this regulatory element. The authors' claims are generally supported by the data, though the last sentence of the abstract was a bit overstated. They state that they "reveal the function of HOT regions in animals development...."; it would be more accurate to state that they linked the role of an upstream HOT region to dlg-1 regulation, and their findings hint that this element could have additional regulatory functions. The authors can either temper their conclusions or try RNA-seq experiments to find additional genes that are misregulated by the delta-uHOT deletion allele. [OPTIONAL]. Another [OPTIONAL] experiment that would strengthen the claims is to perform RNAi knockdown or DLG-1 protein depletion and link that to phenotype to show that the dlg-1 mRNA and DLG-1 protein changes seen in the uHOT mutant do not explain the lethality observed.

      There are elements of the manuscript that must be improved for clarity/accuracy.

      1. When printed out I cannot read what the tracks are in Fig S1. Adding larger text to indicate what those tracks are is necessary.
      2. Line 79. I would change the word "usually" to "frequently" in the discussion about regulatory element position. While promoters ranging from a few hundred to 2000 basepairs are frequently used, there are numerous examples where important enhancers can be further away.
      3. Line 93-95. The description of the reporters was very confusing. When referring to the deletion sequences it sounds like that is what is missing rather than what is included. Rather, if I understand correctly the uHOT is the sequence from the start of the uHOT to the CDS and Short starts at the end of uHOT (omitting it). Adding the promoter fragments to the figure would improve clarity.
      4. Line 108. Re-work the phrase "increase majorly". Majorly increase would be better.
      5. Line 153-154. The deletion indicates that HOT2 and HOT2 were removed. Was one supposed to be HOT3?
      6. In the figure legends the number of animals scored and the number of biological repeats is missing.
      7. Figure 1 title in the legend. Should read "main driver" not "man driver".
      8. The references need to be gone through carefully and cleaned up. There are numerous gene and species names that are not italicized. There are also extra elements added by the reference manager such as [Internet].

      Referee cross-commenting

      I agree with the comments from the previous reviewers. The suggested experiments are reasonable. Reviewer 1's point about the Chen et al 2014 Genome Res paper is really important. I put the revision as unknown as it depended on whether they did the optional experiments I suggested. If they revise their text, tempering claims, adjusting statistical analyses, then that could be 1-3 months. If they did the RNA-seq that I suggested, that would be a longer timeline.

      Significance

      The study is generally rigorously done. Strengths are that this work finds a function for a HOT region in gene regulation. Limitations are that the work is currently very thorough regulatory element bashing. They convincingly demonstrate the role of uHOT in regulating dlg-1 and suggest that the reduction of DLG-1 levels does not explain the phenotype. This finding is of interest to basic researchers in gene regulation. Without going into that discrepancy more, the significance is limited. Linking HOT regions to novel regulatory mechanisms controlling multiple genes would be broadly interesting to the gene regulation and developmental biology fields.

      I am a C. elegans molecular biologist with expertise in gene regulatory networks.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The authors investigate the functionality of a HOT region located upstream of the dlg-1 gene in Caenorhabditis elegans. This region is bound by multiple proteins and enriched for H3K27ac and H3K4me1, features characteristic of enhancers. Using reporter assays, they dissect the region and identify a sub-fragment, HOT3, as responsible for driving gene expression in epidermis, with a pattern similar to that of dlg-1 itself. Deletion of this region leads to downregulation of dlg-1 and lethality before or shortly after hatching, in contrast to complete dlg-1 knockouts, which die at mid-embryogenesis. They further examine the role of the gene's first intron, previously reported to physically interact with the HOT region. Incorporating intron 1 into the reporter construct slightly increases expression, suggesting an additive regulatory effect. However, replacing intron 1 with a synthetic sequence at the endogenous locus does not cause major changes. Overall, this study demonstrates that HOT regions can play a functional role in gene regulation, challenging the prevailing view that they are largely non-functional.

      Major comments:

      Overall, the paper does not explain the reasoning behind the choice of certain conditions, and it lacks discussion of the relevant topics highlighted below.

      1) The authors suggest that the region upstream of the dlg-1 gene is a HOT region. Although they highlight that other broad studies pick up this region as a HOT region, it would be good that the authors dive into the HOT identity of the region and characterize it, as it is a major part of their study. In addition to multiple TFs binding to the site, there are different criteria by which a region would be considered a HOT region. E.g. is there increased signal on this region in the IgG ChIP-seq tracks? Is the area CpG dense?

      2) When describing the HOT region, they refer to Pol II binding as 'confirming its role as a promoter': non-promoter regions can also have Pol II binding, especially enhancers. Having binding of Pol II does not confirm its role as promoter. On the contrary, seeing the K27ac and K4me1 would point towards it being an enhancer.

      3) In S1B, the authors show TF binding tracks. They also have a diagram of the region subsets (HOT1-4) that were later tested. What are their criteria for dividing the HOT region into those fragments? From looking at Fig S1, the 'proper' HOT region (i.e., where protein binding occurs) seems to be divided into two (one chunk as part of HOT3 and one chunk as part of HOT4). Can the authors comment on the effects of this division?

      4) For the reporter experiments, the first experiments carry the histone H2B sequence and the second set of experiments (where the HOT region is dissected) carry a minimal promoter Δpes-10 (MINp). The results could be affected by the addition of these sequences. Is there a reason for this difference? Can the authors please justify it?

      5) Regarding the H2B sequence: ' 137: first intron [...] inserted in the FL transgene within the H2B sequence, acting as an actual intron (FL-INT1)': how was the location of the insertion chosen? Does it disrupt H2B? Can it be that the H2B sequence contributed to dampening down the expression of mNG and disrupting it makes it stronger? It would be important to run the first experiments with minimal promoters and not with the H2B sequence.

      6) Have the authors explored the features of the sequences underlying the different HOT subregions? (e.g. running a motif enrichment analysis)? Is there anything special about HOT3 that could make it a functional region? It would be good to compare uHOT3 vs the others that do not drive the correct pattern. Since it's a HOT region, it may not have a special feature, but it is important to look into it.

      7) For comparisons, the authors run t-tests. Is the data parametric? Otherwise, it would be more suitable to use a non-parametric test.

      Minor comments:

      1) The authors work with C. elegans embryos at comma stage, according to the methods section. It would be good if the authors mentioned it in the main text so that the reader is informed.

      2) 'Notably, the upstream HOT region is located more than four kilo-bases (Kb) away the CDS, and the one in the first intron contains enhancer sites, too.': what do they mean by 'enhancer sites, too'? Is the region known as a functional enhancer? If so, could you please provide the reference?

      3) 'We hypothesized the upstream HOT region is the main driver of dlg-1 transcriptional regulation.': this sentence needs more reasoning. What led to this hypothesis? Is it the fact of seeing multiple TFs binding there? The chromatin marks?

      4) The labels of S1B are too wide, as if they have stretched the image. Could the authors please correct this?

      5) This sentence does not flow with the rest of the text '84 - cohesins have been shown to organize the DNA in a way that active enhancers make contacts in the 3D space forming "fountains" detectable in Hi-C data (17,18).': is there a reason to explain this? I would remove it if not, as it can confuse the reader.

      6) The authors mentioned that 'ARC-C data showed the putative HOT region interacts with other DNA sequences, including the first intron of dlg': have the authors analysed the data from the previous paper? A figure with the relevant data could illustrate this interaction so that the reader knows which specific region has been shown to interact with which. This would also bring clarity as to why they chose intron1 for additional experiments.

      7) 'two deletion sequences spanning from the beginning (uHOT) or the end (Short) of the HOT region until the dlg-1 CDS': From the diagrams of the figure, I understand that uHOT has the distal region deleted, and the short HOT has the distal and the upstream regions deleted. Is this correct? Could you clarify this in the text? E.g. 'we designed two reporters - one containing the sequence starting at the HOT region and ending at the dlg-1 CDS, and the other without the HOT region, but rather starting downstream of it until the dlg-1 CDS'.

      8) 'Specifically, it spanned from bp 5,475,070 to 5,475,709 on chromosome X and removed HOT2 and HOT2 sequences' - this is unclear to me. What sequences are removed? HOT2 and 3?

      9) 'ARC-C' is not introduced. Please spell it out: Accessible Region Conformation Capture (ARC-C). It would be helpful to include a sentence explaining what it is, as many readers will not know it.

      10) Fig 1 B, diagram on the right: the H2B sequence is missing. I see that it is indicated in the legend as part of mNG, but this can be misleading. Could the authors add it to the diagram for clarification?

      Significance

      HOT regions are thought to be artifacts from ChIP-seq experiments. This study provides evidence that at least some HOT regions can have a functional role in gene regulation, emphasizing that they should not be dismissed outright.

      The findings will be of interest to researchers investigating the biological nature of HOT regions, as well as to those who have encountered HOT regions in their own sequencing datasets. In addition, researchers studying the regulation of dlg-1 in C. elegans may find this work particularly relevant. I work on gene regulation during embryonic development and my technical expertise is omics and fluorescence microscopy. Since I do not work in C. elegans, I cannot evaluate whether the pattern/location of the signal is where the authors claim it to be, nor whether the marked cells are epidermal cells.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      In this manuscript, Tocchini et al. characterize two enhancer regions, one distal and one intronic, of the gene dlg-1 in C. elegans. The two enhancers are termed high-occupancy target (HOT) regions as defined by their binding of most transcription factors, as identified by the modENCODE project. The authors test transcriptional activity of the two HOT regions using single-copy transgene assays and assay their functional relevance by deleting the regions using CRISPR/Cas9 genome editing. The authors observe robust transcriptional activity and functional effects of the distal regulatory element and little evidence for enhancer activity from the intronic enhancer. From these assays, the authors conclude that the distal and intronic enhancers coordinate to fine-tune gene expression in a cell-type-specific manner.

      Major comments:

      • Are the key conclusions convincing?

      • The results fully support the authors' conclusions regarding the significant role of the upstream HOT region ("uHOT") with strong fluorescence activity and substantial phenotypic effects (i.e., the animals have very low brood sizes and rarely progress through hatching). These data are well presented and technically well done.

      • In my view, their conclusions regarding the intronic HOT region are speculative and unconvincing. See below for main criticisms.
      • Furthermore, their conclusions about interactions between the two tested regions are speculative and they show no strong evidence for this claim.
      • The authors claim that not all the phenotypic effects seen from deleting the uHOT region are specific to the dlg-1 gene. This is an interesting model, but the authors show essentially no data to support this or any explanation of what other gene might be regulated.
      • Finally, some of the hypotheses in the text could be more accurately framed by the authors. They claim HOT regions are often considered non-functional (lines 189-191). Also, they claim that correct expression levels and patterning are usually regulated by elements within a few hundred base pairs of the CDS (lines 78-80). These claims are not generally accepted in the field, despite a relatively compact genome. Notably, both claims were tested and disproven by Chen et al. (2014), Genome Research, where the authors specifically showed strong transcriptional activity from 10 out of 10 HOT regions located up to 4.7 kb upstream of their nearest gene. Chen et al. 2014 is cited by Tocchini et al., so it is surprising that the claims in this manuscript are inconsistent with it.

      The fluorescence expression from the intronic HOT region is not visible by eye and the quantification shows very little expression, suggestive of background fluorescence. Although the authors show statistical significance in Figure 1G, I would argue this is possibly based on inappropriate comparisons and/or a wrong choice of statistical test. The fluorescence levels should be compared to a non-transgenic animal and/or to a transgenic animal with the tested region shuffled but in an equivalent

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Yes, I would suggest the authors remove their claims about the intronic enhancer and the interaction between the two regions. I would also suggest softening the claims about the uHOT regulation of another putative gene.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Yes, the authors would need to demonstrate several things to support their current claims. The major experiments necessary are:

      1. Insert a single-copy transgene with a minimal promoter and the intronic sequence scrambled to generate a proper baseline control. It is very possible that the intronic sequence does drive some expression, but the current control is not appropriate for statistical comparison (e.g., only the transgene with intron 1 contains the minimal promoter from pes-10, which may have baseline transcriptional activity even without the intron placed in front of the transgene).
      2. It is not clear why the authors tested intron 1 within the H2B of the transgene only in the context of the full-length promoter, and not with just the minimal promoter in front of the transgene. The authors show a minor difference in expression levels between the full-length (FL) and full-length with intron 1 (FL-INT1) transgenes but report a large statistical difference. The authors use an inappropriate statistical test (t-test) for this experiment and treat many datapoints from the same embryo as independent, which is clearly not the case. Even minor differences in staging, transgene silencing in early development, or variability could bias their data collection.
      3. The authors claim, based on ARC-C data previously published by their lab (Huang et al. 2022) that the dlg-1 HOT region interacts with "other" genomic regions. This is potentially interesting but the evidence for this should be included in the manuscript itself, perhaps by re-analyzing data from the 2022 manuscript?

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      These experiments are neither costly (two transgenes inserted by single-copy transgenesis) nor particularly time-consuming. With cloning, injection, and microscopy, these experiments can be conducted in 6 weeks with relatively few "hands on" hours. The cost should be very reasonable (reagents surely less than €500).

      • Are the data and the methods presented in such a way that they can be reproduced?

      The data are not entirely clear and could benefit from additional details. This is a partial list but shows the general concern.

      The fluorescence quantification is difficult to interpret from the attached data file (Table S1). For the individual values, it is unclear how many independent experiments (different embryos) were conducted. The authors should clarify whether every data value is from an independent embryo or whether they used several values from the same embryo. If they did use several values from the same embryo, how did they do this? Did they take every cell? Or did they focus on specific cells? How did they ensure embryo staging?

      The authors also do not describe how they validated single-copy insertions (partial transgene deletions in integrants are not infrequent and they only appear to use a single insertion for each strain). This should be described and/or added as a caveat if no validation was performed.

      The authors also do not describe any validation for the CRISPR alleles, either deletions or insertion of the synthetic intron into dlg-1. How were accurate gene edits verified?

      • Are the experiments adequately replicated and statistical analysis adequate?

      I am not convinced the statistical analysis of the fluorescence data is correct. Unless the authors show that every datapoint in the fluorescence quantification is independent, then I would argue they vastly overestimate the statistical significance. Even small differences are shown to have "***" levels of significance, which does not appear empirically plausible.
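
      To illustrate the independence concern: if several values come from the same embryo, one common remedy is to collapse them to a single per-embryo mean before testing, so that the sample size reflects embryos and not individual measurements. The sketch below uses hypothetical embryo identifiers and intensities; the actual layout of Table S1 is not known here and is an assumption.

```python
from collections import defaultdict
from statistics import mean


def per_embryo_means(measurements):
    """Collapse (embryo_id, intensity) pairs to one mean value per embryo."""
    grouped = defaultdict(list)
    for embryo_id, intensity in measurements:
        grouped[embryo_id].append(intensity)
    return {embryo_id: mean(values) for embryo_id, values in grouped.items()}


# Hypothetical data: five measurements, but only two independent embryos.
fl_measurements = [
    ("embryo_1", 10.0),
    ("embryo_1", 12.0),
    ("embryo_1", 11.0),
    ("embryo_2", 14.0),
    ("embryo_2", 16.0),
]

embryo_means = per_embryo_means(fl_measurements)
n_measurements = len(fl_measurements)  # 5 values if each is treated as independent
n_embryos = len(embryo_means)          # 2 independent units after aggregation
```

      Any test would then be run on the per-embryo means. A mixed-effects model with embryo as a random effect is an alternative, but either way the effective n is the number of embryos.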

      Minor comments:

      • Specific experimental issues that are easily addressable.
      • Are prior studies referenced appropriately?

      This study is so closely related to the Chen et al. study that I believe the latter should be discussed in more detail to put the data into context.

      • Are the text and figures clear and accurate?

      Yes, the text and figures are clear.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Add H2B to the mNG in Figure 1 so that readers can see where the first intron was inserted.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
      • Place the work in the context of the existing literature (provide references, where appropriate).
      • State what audience might be interested in and influenced by the reported findings.
      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      This manuscript shows an incremental advance in our understanding of HOT regions in C. elegans. The authors replicate similar data presented previously (enhancer assays on HOT regions, PMID: 24653213). Importantly, the authors functionally validate their data with smFISH and CRISPR-mediated deletion of two enhancers (including the substitution of the intron for a synthetic intron), which is, to my knowledge, novel and advances the field. As such, the data presented validate and increase our confidence in prior results on HOT regions. Unfortunately, the more interesting conclusions about HOT region interactions and synergy to direct expression are less well supported. The work will likely be mainly of interest to C. elegans researchers working on transcriptional regulation. My own field of expertise is C. elegans gene regulation and my lab frequently uses transcriptional transgene assays to determine gene expression.