  1. Jan 2025
    1. eLife Assessment

      This study examined the important question of how neurons code temporal information across the hippocampus, dorsal striatum, and orbitofrontal cortex. Using a behavioral task in the rat that requires discrimination between short and long time intervals, the authors conclude that time intervals are represented in all three regions and that synchronized activity of time-coding cells across the brain regions is coordinated by theta rhythms. However, several weaknesses are noted, and in its current form, the study provides incomplete evidence for understanding how temporal information is processed and coordinated throughout these brain networks.

    2. Reviewer #1 (Public review):

      Summary:

      It is known that neuronal activity in several brain regions encodes interval time. However, how interval time is encoded across distributed brain regions remains unclear. By simultaneously recording neuronal activity from the hippocampal CA1, dorsal striatum, and orbitofrontal cortex during a temporal bisection task, the authors showed that elapsed time during the interval period is encoded similarly across these regions and that the neuronal activity of time cells across these regions tends to be synchronized within 100 ms. Using Bayesian decoding, they demonstrated that the interval time decoded from the firing activity of time cells in these regions correlated with the rats' decisions and that the times decoded from the neuronal activity of different brain regions were correlated. The sound experiments and analyses support most of the main conclusions of this paper.

      Strengths:

      They used a temporal bisection task in which the effects of time and distance can be dissociated. The test trials successfully revealed the relationship between the interval time estimated by Bayesian decoding and the animal's judgment of long versus short interval times. Simultaneous recording of neuronal activity from the hippocampal CA1, dorsal striatum, and orbitofrontal cortex, which is technically challenging, allowed comparison of interval time encoding across brain regions and the degree of synchrony between neurons from different brain regions.

      Weaknesses:

      Some analyses were not explained in detail, making it difficult to assess whether their results support the authors' conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors examined how neural activity related to temporal information is distributed and coordinated throughout the hippocampus, dorsal striatum, and orbitofrontal cortex. Rats were forced to run for fixed time intervals on a treadmill and make a decision based on whether the interval was long (10s) or short (5s). Under these conditions time cells were observed across all examined brain regions. The primary finding of the authors is that synchronized activity between time cells across brain regions is entrained into the theta cycle. This observation is used to support the central claim that the sharing of temporal information is mediated by the theta oscillation.

      Strengths:

      By simultaneously recording several brain regions in an interval discrimination task, the authors provide a valuable dataset for understanding how temporal information is processed and distributed throughout relevant networks.

      Weaknesses:

      Several methodological concerns should be addressed and a more focused analysis should be performed to strengthen the central claims of this work.

      Major Concerns

      (1) The restriction to only use time cells to understand temporal information processing. Other mechanisms of encoding time, like population clocks and ramping, have been characterized in the striatum and frontal cortex, and these dynamics might contain more temporal information than the subset of cells that meet the statistical criteria for being a time cell. Furthermore, time cells in the OFC, and DS in particular, appear to be heavily biased towards the beginning of treadmill running. This raises the question of whether temporal information can be encoded by neurons other than time cells in these two regions.

      (2) The results of the Bayesian decoding analysis should be expanded on. In particular, the performance of each decoder above the chance level is not quantified. Comparing the performance of decoders trained on all cells to the performance of decoders trained on time cells alone would partially address the question of whether or not time cells are the only cells that can encode temporal information in the DS and OFC.

      (3) The decoding results for the test trials appear different from the results in the authors' previous publication (Shimbo et al., 2021). There, differences in decoded time between the selected-long and selected-short trials emerged after 5s, the duration of the short trials. This was to be expected for two reasons. First, from the task design, it is unclear that the animal can distinguish trial types (long, short, or test) until after the first 5 seconds of treadmill running, making it logical for differences in decoded time to emerge only after this point. Second, time cell activity was identical in the first 5s of the long and short trials as shown in Figure 2A. Here, however, the differences in decoded time during the selected-long and selected-short test trials emerge within the first 2s of treadmill running. Could the authors explain this discrepancy?

      Furthermore, in Figure 6B, at 3 seconds of running time, the decoded time for selected-long and selected-short trials shows a difference of nearly 2 seconds, with no further increase as running time progresses. In contrast, at 2 seconds of running time, there is no significant difference in decoded time for DS and OFC, while CA1 shows a slight increase in the decoded time for selected-long trials. This pattern suggests a sudden jump in the encoded time for selected-long trials between 2 and 3 seconds. However, without explicitly showing the raw data, it is difficult to interpret this result and other results from the decoding analysis.
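To make the request in Major Concern (2) concrete, the following is a minimal sketch of how decoder performance could be compared against a shuffle-based chance level. It assumes a Poisson naive Bayes decoder with a flat prior, which is a common formulation but not necessarily the one used in the manuscript. The function names, the `tuning` layout, and the choice of mean absolute bin error as the metric are all illustrative assumptions.

```python
import math
import random


def decode_time_bin(counts, tuning, window_s, eps=1e-9):
    """Return the time bin with the highest Poisson log-likelihood (flat prior).

    counts[i]    : spike count of cell i in the decoding window
    tuning[i][t] : mean firing rate (Hz) of cell i in time bin t
    window_s     : length of the decoding window in seconds
    The count factorial is omitted because it does not depend on the time bin.
    """
    n_bins = len(tuning[0])
    best_bin, best_logp = 0, -math.inf
    for t in range(n_bins):
        logp = 0.0
        for n_i, rates in zip(counts, tuning):
            expected = (rates[t] + eps) * window_s  # eps avoids log(0)
            logp += n_i * math.log(expected) - expected
        if logp > best_logp:
            best_bin, best_logp = t, logp
    return best_bin


def mean_abs_error(decoded, true_bins):
    """Mean absolute difference (in bins) between decoded and true time."""
    return sum(abs(d - t) for d, t in zip(decoded, true_bins)) / len(true_bins)


def chance_errors(decoded, true_bins, n_shuffles=1000, seed=0):
    """Null distribution of the error, obtained by permuting the true labels."""
    rng = random.Random(seed)
    shuffled = list(true_bins)
    errors = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        errors.append(mean_abs_error(decoded, shuffled))
    return errors
```

Under these assumptions, the observed error of a decoder trained on time cells alone, and of one trained on all cells, could each be compared with the distribution returned by `chance_errors`.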

      Minor Concerns

      (1) It is not clear how the Bayes decoder was trained. Does the training data come entirely from the long trials?

      (2) For Figure 5D, even if only one of two neurons in a pair has its spike rate modulated by theta, wouldn't the expectation be that synchronous spike events between these two neurons would be modulated by theta as well? This analysis might benefit from shuffling methods to determine if the mean resultant length of synchronous spike events is larger than the chance level.

      (3) In Figure 5A, the authors suggest that 'the synchronization of time cells was modulated by theta oscillation.' However, it is unclear whether the population exhibits a preferred theta phase or the phase preference only occurs at the individual cell level. If there is no preference on the population level, how would the authors interpret this result?
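The shuffling control suggested in Minor Concern (2) could look roughly like the sketch below. It assumes each trial is stored as a list of (spike time, theta phase) pairs per neuron, defines a synchronous event as a spike of neuron A with a spike of neuron B within a fixed window, and builds the null distribution by permuting the trial order of neuron B. That permutation keeps each neuron's own theta modulation while breaking trial-specific synchrony. The data layout, the 100 ms default window, and the function names are assumptions made for illustration, not the authors' pipeline.

```python
import cmath
import random


def mean_resultant_length(phases):
    """Length of the mean unit vector of the phases (radians), between 0 and 1."""
    if not phases:
        return 0.0
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def synchronous_phases(trials_a, trials_b, window=0.1):
    """Theta phases of neuron-A spikes that have a neuron-B spike within `window` s.

    Each trial is a list of (spike_time_s, theta_phase_rad) tuples.
    """
    phases = []
    for spikes_a, spikes_b in zip(trials_a, trials_b):
        times_b = [t for t, _ in spikes_b]
        for t_a, phase_a in spikes_a:
            if any(abs(t_a - t_b) <= window for t_b in times_b):
                phases.append(phase_a)
    return phases


def shuffle_test(trials_a, trials_b, n_shuffles=1000, window=0.1, seed=0):
    """Return the observed MRL and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = mean_resultant_length(synchronous_phases(trials_a, trials_b, window))
    shuffled_b = list(trials_b)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled_b)  # break the trial pairing between the two neurons
        null = mean_resultant_length(synchronous_phases(trials_a, shuffled_b, window))
        if null >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (1 + exceed) / (1 + n_shuffles)
```

A small p-value here would indicate that synchronous events are more phase-locked than expected from the theta modulation of the individual neurons alone.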

    4. Reviewer #3 (Public review):

      Summary:

      This study examines neural activity recorded simultaneously in the hippocampus, dorsal striatum, and orbitofrontal cortex as rats performed an interval timing task. The analyses primarily focus on the activity of "time cells" which are neurons that fire at specific moments during the intervals. In this experiment, the intervals consist of periods when animals are running on a treadmill before selecting the arm associated with the interval duration. The results show that the theta oscillations induced by this running behavior were observed across the three regions and that this strong oscillation modulated the activity of neurons across regions. While these findings are correlative in nature, they provide an important characterization of activity patterns across regions during complex behavior. However, more research is needed to determine whether these activity patterns specifically contribute to temporal coding.

      Strengths:

      (1) Overall, the paper is very well written. Although I have specific concerns about the review of the relevant literature and the interpretation of the results (see below), I do want to commend the authors for their efforts toward presenting this complex work in an accessible manner.

      (2) The study is well designed and the quality of the electrophysiological data collected from multiple brain regions in such a challenging behavioral experiment is impressive. This work is a technical tour de force.

      (3) The analyses are very thorough, statistically rigorous, and clearly explained and visualized. The authors provide a thoughtful mixture of example data (at the level of individual cells or animals) and aggregated data (at the group or session level) to properly explain and quantify the activity patterns of interest.

    1. eLife Assessment

      This is an important study providing convincing evidence that increased blood pressure variability impairs myogenic tone and diminishes baroreceptor reflex. The study also provides evidence that blood pressure variability blunts functional hyperemia and contributes to cognitive decline. The authors use appropriate and validated methodology in line with the current state-of-the-art.

    2. Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using an intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow, and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well-designed, and the data generated are clear. I list below a number of suggestions to enhance this important work:

      (1) Figure 1B: It is surprising that the BP circadian rhythm is not distinguishable in either group. Figure 2, however, shows differences in circadian rhythm at different timepoints during infusion. Could the authors explain the lack of circadian effect in the 24-h traces?

      (2) While saline infusion does not result in elevation of BP when compared to Ang II, there is evident (and very large) BP variability in the saline group, at least 40 mmHg within 1 hour. This is a significant physiological effect to take into consideration, and therefore it warrants discussion.

      (3) The decrease in DBP in the BPV group is very interesting. It is known that chronic Ang II increases cardiac hypertrophy; are there any changes to heart morphology, mass, and/or function during BPV? Can the decrease in DBP in BPV be attributed to preload dysfunction? This observation should be discussed.

      (4) Examining the baroreceptor reflex during the early and late phases of BPV is quite compelling. Figures 3D and 3E clearly delineate the differences between the two phases. For clarity, I would recommend plotting the data as is shown in panels D and E, rather than showing the mathematical ratio. Alternatively, plotting the correlation of ∆HR to ∆SBP and analyzing the slopes might be more digestible to the reader. The impairment in baroreceptor reflex in the BPV group during high BP is clear. Is there any indication whether this response might be due to loss of sympathetic or gain of parasympathetic response based on the model used?

      (5) Figure 3B shows a drop in HR when the pump is ON irrespective of treatment (i.e., independent of BP changes). What is the underlying mechanism?

      (6) The correlation of ∆diameter vs MAP during low and high BP is compelling, and the shift in the cerebral autoregulation curve is also a good observation. I would strongly recommend that the authors include a schematic showing the working hypothesis that depicts the shift of the curve during BPV.

      (7) Functional hyperemia impairment in the BPV group is clear and well-described. Pairing this response with the kinetics of the recovery phase is an interesting observation. I suggest elaborating on why the BPV group exhibits lower responses and how this links to the rapid decline during recovery.

      (8) The experimental design for the cognitive/behavioral assessment is clear and it is a reasonable experiment based on previous results. However, the discussion associated with these results falls short. I recommend that the authors describe the rationale to assess recognition memory, short-term spatial memory, and mice activity, and explain why these outcomes are relevant in the BPV context. Are there other studies that support these findings? The authors discussed that no changes in alternation might be due to the age of the mice, which could already exhibit cognitive deficits. In this line of thought, what is the primary contributor to behavioral impairment? I think that this sentence weakens the conclusion on BPV impairing cognitive function and might even imply that age per se might be the factor that modulates the various physiological outcomes observed here. I recommend clarifying this section in the discussion.

      (9) Why were only male mice used?

      (10) In the results for Figure 3: "Ang II evoked significant increases in SBP in both control and BPV groups;...". Also, in the figure legend: "B. Five-minute average HR when the pump is OFF or ON (infusing Ang II) for control and BPV groups...." The authors should clarify this as the methods do not state a control group that receives Ang II.
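As an illustration of the slope analysis proposed in point (4), the sketch below fits an ordinary least-squares line of ∆HR against ∆SBP. The slope (bpm per mmHg) is then a single number that can be compared between groups or phases. The function name is hypothetical, and the numbers in the usage lines are made up solely to show the call. They are not data from the manuscript.

```python
def ols_slope(delta_sbp, delta_hr):
    """Least-squares slope and intercept of delta_hr regressed on delta_sbp."""
    n = len(delta_sbp)
    if n < 2 or n != len(delta_hr):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(delta_sbp) / n
    mean_y = sum(delta_hr) / n
    sxx = sum((x - mean_x) ** 2 for x in delta_sbp)
    if sxx == 0:
        raise ValueError("delta_sbp has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delta_sbp, delta_hr))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example values (mmHg and bpm), for illustration only.
example_sbp = [10.0, 20.0, 30.0, 40.0]
example_hr = [-20.0, -40.0, -60.0, -80.0]
slope, intercept = ols_slope(example_sbp, example_hr)
```

A shallower (less negative) slope in one group would correspond to a blunted bradycardic response per unit rise in pressure.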

    3. Reviewer #2 (Public review):

      Summary:

      Blood pressure variability has been identified as an important risk factor for dementia. However, there are no established animal models to study the molecular mechanisms of increased blood pressure variability. In this manuscript, the authors present a novel mouse model of elevated BPV produced by pulsatile infusions of high-dose angiotensin II (3.1 µg/hour) in middle-aged male mice. Using elegant methodology, including direct blood pressure measurement by telemetry, programmable infusion pumps, in vivo two-photon microscopy, and neurobehavioral tests, the authors show that this BPV model resulted in a blunted bradycardic response and cognitive deficits, enhanced myogenic response in parenchymal arterioles, and a loss of the pressure-evoked increase in functional hyperemia to whisker stimulation.

      Strengths:

      As the presentation of the first model of increased blood pressure variability, this manuscript establishes a method for assessing molecular mechanisms. The state-of-the-art methodology and robust data analysis provide convincing evidence that increased blood pressure variability impacts brain health.

      Weaknesses:

      One major drawback is that there is no comparison with another pressor agent (such as phenylephrine); therefore, it is not possible to conclude whether the observed effects are a result of increased blood pressure variability or caused by direct actions of Ang II. Ang II is known to have direct actions on cerebrovascular reactivity, neuronal function, and learning and memory. Given that Ang II is increased in only 15% of human hypertensive patients (and an even lower percentage of non-hypertensive individuals), the clinical relevance is diminished. Nonetheless, this is an important study establishing the first mouse model of increased BPV.

    1. eLife Assessment

      The authors show that: 1) following brief peripheral optogenetic stimulation of forepaw proprioceptors in mice, sensory-evoked responses in primary motor cortex (M1) are delayed relative to primary somatosensory cortex (S1); 2) the responses in both cortical areas follow a triphasic pattern of activation-suppression-activation; 3) directly activating cortical parvalbumin-positive (PV) inhibitory interneurons mimicked both the suppression and rebound components of the sensory-evoked response; and 4) partially suppressing activity in S1 reduces the sensory-evoked response in M1. The conclusions are convincing and build on prior work on cortical circuits related to the mouse forelimb from this group (Yamawaki et al., 2021, eLife, doi:10.7554/eLife.66836). More rigorously determining whether the peripheral stimulation approach used evokes movements would strengthen the conclusions. It is also possible that these effects would differ for peripheral mechanoreceptor stimulation. Overall, this in vivo work assessing sensory responses in forepaw-related cortical circuits represents a valuable comparison to previously published work.

    2. Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimulus applied to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds. They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to that evoked by "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces the M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The authors' findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Do the authors' data support these in vitro observations?

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback. I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement? A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

    3. Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

    4. Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes, which is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~ 15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

    1. eLife Assessment

      This important study examines heterochromatin domain dynamics using a model system that allows reversible transition from an embryonic stem cell to a 2-cell-like state. The authors present a solid resource to the research community that will further the understanding of changes in the chromatin-bound proteome during the 2C-to-ESC transition. However, conclusions related to the functional roles of the interaction between the SWI/SNF complex component SMARCAD1 and the DNA Topoisomerase II Binding protein (TOPBP1) remain incomplete.

    2. Reviewer #1 (Public review):

      In this study, the authors investigate the molecular mechanisms driving the establishment of constitutive heterochromatin during embryonic development. The experiments have been meticulously conducted and effectively address the proposed hypotheses.

      The methodology stands out for its robustness, utilizing:

      i) an efficient system for converting ESCs to 2C-like cells via Dux overexpression;

      ii) a global approach through IPOTD, which unveils the chromatome at distinct developmental stages; and

      iii) STORM technology, enabling high-resolution visualization of DNA decompaction.

      These tools collectively provide clear and comprehensive insights that support the study's conclusions.

      The work makes a significant contribution to the field, offering valuable insights into chromatin-bound proteins at critical stages of embryonic development. These findings may also inform our understanding of processes beyond heterochromatin maintenance.

      The revised manuscript shows improvement, particularly through enhanced discussion and the addition of new references addressing the cooperation of SMARCAD1 and TOPBP1. All my previous concerns have been thoroughly addressed by the authors. However, I believe that, as this reviewer suggested, the inclusion of a model that summarizes the main findings of the study and discusses the potential mechanisms involved, would enhance the clarity and understanding of the message the manuscript aims to convey.

    3. Reviewer #2 (Public review):

      As noted in the original review, the study by Sebastian-Perez addresses an important research question using a tractable model system to examine the earliest drivers of heterochromatin formation during embryogenesis. Moreover, the proteomic analyses provide a valuable resource to the research community to understand changes in the chromatin-bound proteome during the 2C-to-ESC transition. From there, they carry out more detailed analyses of TOPBP1, which shows substantive changes in chromatin association in 2C-like cells, and a potential interacting protein SMARCAD1, which shows only modest changes in chromatin association. While I appreciate that the authors have revised the manuscript to some extent to address the minor points raised, the major over-arching issue of how TOPBP1 and SMARCAD1 function in the 2C-like state is still a concern.

    4. Reviewer #3 (Public review):

      The manuscript entitled "SMARCAD1 and TOPBP1 contribute to heterochromatin maintenance at the transition from the 2C-like to the pluripotent state" by Sebastian-Perez et al. adopted the iPOTD method to compare the chromatin-bound proteome in ESCs and 2CLCs induced by Dux overexpression. The authors identified 397 chromatin-bound proteins enriched specifically in non-2CLCs, among which they further investigated TOPBP1 due to its potential role in chromocenter reorganization. SMARCAD1, a known interacting protein of TOPBP1, was also investigated in parallel. The authors report increased size and decreased number of H3K9me3-heterochromatin foci in Dux-induced 2CLCs. Remarkably, depletion of either TOPBP1 or SMARCAD1 resulted in similar phenotypes. However, the absence of these proteins did not affect the entry into or exit from the 2C-like state. The authors further showed that both TOPBP1 and SMARCAD1 are essential for early embryonic development.

      This manuscript provides valuable insights into the features of 2CLCs regarding H3K9me3-heterochromatin reorganization. However, the findings are largely descriptive. Future mechanistic studies are required to address questions such as: 1) how SMARCAD1 associates with H3K9me3 and contributes to heterochromatin maintenance, 2) how TOPBP1 regulates the expression of SMARCAD1 and facilitates its localization in heterochromatin foci, 3) whether the remodelling of chromocenters directly influences the transitions between ESCs and 2CLCs.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the present work the authors explore the molecular driving events involved in the establishment of constitutive heterochromatin during embryo development. The experiments have been carried out in a very accurate manner and clearly fulfill the proposed hypotheses.

      Regarding the methodology, the use of: i) an efficient system for conversion of ESCs to 2C-like cells by Dux overexpression; ii) a global approach through IPOTD that reveals the chromatome at each stage of development and iii) the STORM technology that allows visualization of DNA decompaction at high resolution, helps to provide clear and comprehensive answers to the questions raised.

      The contribution of the present work to the field is very important as it provides valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance.

      The study could be improved through a more mechanistic approach that focuses on how SMARCAD1 and TOPBP1 cooperate and how they functionally connect with H3K9me3, HP1b and heterochromatin regulation during embryonic development. For example, addressing why topoisomerase activity is required or whether it connects (or not) to SWI/SNF function and the latter to heterochromatin establishment, are questions that would help to understand more deeply how SMARCAD1 and TOPBP1 operate in embryonic development.

      We would like to thank the reviewer for the positive evaluation of our work and the methodology we employed. We greatly appreciate the reviewer’s recognition that our study can “provide valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance”. While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources.

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, the authors could improve the study by deciphering -to a certain extent- the possible mechanism by which SMARCAD1 and TOPBP1 are cooperating in their system to establish H3K9me3 and consequently heterochromatin; and whether it is different (or not) from that already reported in yeast (ref 27). In fact, is it only SMARCAD1 that participates in this process or the whole SWI/SNF complex? Could the lack of SMARCAD1 compromise the proper assembly of the SWI/SNF complex? In this regard, a model describing the main findings of the study and the discussion of the possible mechanisms involved -based on the current bibliography- would be appreciated. This, although speculative, would illustrate the range of possibilities that could be operating in the maintenance of heterochromatin during embryonic development. In conclusion, it would be great if the authors could link -mechanistically- the dots connecting SMARCAD1, TOPBP1, H3K9me3/HP1/heterochromatin.

      As suggested by the reviewer and to enrich the discussion, we have included some additional sentences and references in the revised discussion section.

      As a minor point, in Figure 3A, left panel, it appears that the protein precipitating with H3K9me3 reacts with TOPBP1, but its molecular weight does not exactly match the TOPBP1 band found in the input. The authors should clarify this point, and it is also recommended that IPs and inputs are run in the same gel. Please replace Figure 3A right panel.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of the figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C<sup>+</sup> cells.

      Reviewer #2 (Public Review):

      The manuscript by Sebastian-Perez et al. describes determinants of heterochromatin domain formation (chromocenters) at the 2-cell stage of mouse embryonic development. They implement an inducible system for transition from ESC to 2C-like cells (referred to as 2C<sup>+</sup>) together with proteomic approaches to identify temporal changes in associated proteins. The conversion of ESCs to 2C<sup>+</sup> is accompanied by dissolution of chromocenter domains marked by HP1b and H3K9me3, which reform upon transition back to the 2C-like state. The innovation in this study is the incorporation of proteomic analysis to identify chromatin-associated proteins, which revealed SMARCAD1 and TOPBP1 as key regulators of chromocenter formation.

      In the model system used, doxycycline induction of DUX leads to activation of EGFP reporter regulated by the MERVL-LTR in 2C<sup>+</sup> cells that can be sorted for further analysis. A doxycycline-inducible luciferase cell line is used as a control and does not activate the MERVL-LTR GFP reporter. The authors do see groups of proteins anticipated for each developmental stage that suggest the overall strategy is effective.

      The major strengths of the paper involve the proteomic screen and initial validation. From there, however, the focus on TOPBP1 and SMARCAD1 is not well justified. In addition, how data is presented in the results section does not follow a logical flow. Overall, my suggestion is that these structural issues need to be resolved before engaging in comprehensive review of the submission. This may be best achieved by separating the proteomic/morphological analyses from the characterization of TOPBP1 and SMARCAD1.

      We appreciate the reviewer’s positive evaluation of our inducible system to trigger the transition from ESCs to 2C-like cells, and the strength of the chromatin proteomics we conducted. In response to the reviewer’s suggestion, we have reorganized the order of the figures, particularly Figure 1 and Figure 2, and revised the text to improve readability and flow.

      Reviewer #2 (Recommendations For The Authors):

      There are some very interesting components to the study but, as noted, the narrative requires changes and the rationale for focusing on TOPBP1 and SMARCAD1 is not strong at present. Specific comments are noted below.

      (1) Inclusion of authentic 2C cells for comparative chromocenter analysis (or at least a more fulsome discussion of how the system has been benchmarked in previous studies).

      We have included more detail in the revised methods section, in the “Cell lines and culture conditions” paragraph. We have added: “The Dux overexpression system was benchmarked according to previously reported features. Dux overexpression resulted in the loss of DAPI-dense chromocenters and the loss of the pluripotency transcription factor OCT4 (fig. S1E) (6, 7), upregulation of specific genes of the 2-cell transcriptional program such as endogenous Dux, MERVL, and major satellites (MajSat) (fig. S1F) (6, 7, 11, 26, 58), and accumulation in the G2/M cell cycle phase (fig. S1G), with a reduced S phase consistent in several clonal lines (fig. S1H) (15).”

      (2) In Figure 1A, the text indicates a loss of chromocenters, but it may be better described as decompaction because the DAPI/H3K9me3 staining shows diffuse/expanded structures (this is in fact how it is described in relation to Figure 2).

      We have changed the text accordingly, now describing it as “decompaction”.

      (3) Table S1 has 6 separate tabs but these are not specified in the text. It would be useful to separate the 397 proteins unique to Luc and 2C- cells since they form much of the basis for the remaining analysis. This approach also assumes it is the absence of a protein in the 2C<sup>+</sup> that accounts for the lack of chromocenters (noting there are 510 proteins unique to the 2C<sup>+</sup> state that are not discussed).

      We have referenced the supplementary table as Table S1 in the text for simplicity. It includes: Table S1A - List of Protein Groups identified by mass spectrometry in -EdU, Luc, 2C- and 2C<sup>+</sup> cells; Table S1B - Input data for SAINT analysis; Table S1C - SAINT results of the comparison 2C- vs Luc and 2C<sup>+</sup> vs Luc; Table S1D - SAINT results of the comparison Luc vs 2C- and 2C<sup>+</sup> vs 2C-; Table S1E - SAINT results of the comparison Luc vs 2C<sup>+</sup> and 2C- vs 2C<sup>+</sup>; and Table S1F - Total number of PSM per protein in the different cells and conditions tested.

      (4) Since there is no change in H3K9me3 levels, loss of SUV420H2 from 2C<sup>+</sup> chromatin (figure 1G) coupled with potential changes in H4K20me3 could contribute to the morphological differences. SUV420H2 is known to regulate chromocenter clustering in a way that requires H4K20me3, but this is not addressed or cited (PUBMED: 23599346).

      As suggested by the reviewer, we have added additional sentences and references in the revised manuscript.

      (5) In Figure 1C, there does appear to be overlap between the 2C<sup>+</sup> and 2C- populations (while the Luc population is distinct) even though they are morphologically distinct when imaged in Figure 2A. The 2C- cells are thought to be an intermediate, low Dux expressing population.

      Chromatome profiling through genome capture provides a snapshot of the chromatin-bound proteome in the analyzed samples (shown in revised Fig. 2B). As indicated by the reviewer and previously reported in the literature, 2C- cells are an intermediate population before reaching 2C<sup>+</sup> cells. For this study, we have focused on H3K9me3 morphological changes. Even though 2C- and 2C<sup>+</sup> cells are distinct with respect to H3K9me3 morphology (shown in revised Fig. 1B), analysis of the chromatome data from hundreds of chromatin-bound proteins revealed some overlap between these two populations. However, replicates from the same population tend to cluster together, for example, 2C<sup>+</sup> rep1 and 2C<sup>+</sup> rep3, and 2C- rep1 and 2C- rep2. Collectively, these data suggest that a defined subset of coordinated changes in the chromatome likely triggers the transition from 2C- to 2C<sup>+</sup> cells. Further experimental investigation of the chromatome dataset during the 2C-like transition would be interesting; however, we believe it is beyond the scope of this study.

      (6) Data with SUV39H1 and 2 is difficult to accommodate; what about other H3K9 methyltransferases or proteins such as TRIM28 (KAP1) and SETDB1 (this comes up in the discussion but is not assessed in the results section).

      We agree that investigating the role of TRIM28 (KAP1) and SETDB1 in this experimental setting could be of interest; however, we believe that these experiments go beyond the scope of the present study.

      (7) Rationale for choosing TOPBP1 needs to be improved. How do TOPBP1 levels relate to TOPI/TOP2A/TOP2B levels across the 3 cell populations? By what criteria does topoisomerase inhibitor treatment increase 2C<sup>+</sup> like cells? Moreover, to what extent will inhibiting topoisomerases lead to global heterochromatin and cell cycle changes regardless of cell type?

      Following the reviewer’s suggestion, we have included some additional references throughout the text to strengthen our rationale for selecting TOPBP1, given its well-established critical role in DNA replication and repair. Additionally, we have revised the results and discussion sections to include new sentences that propose a potential mechanism by which topoisomerase inhibitors may indirectly recruit TOPBP1 to facilitate DNA repair, ultimately leading to an increase in 2C<sup>+</sup> cells.

      (8) Likewise, the decision to look at SMARCAD1 based solely on its interaction with TOPBP1 seems somewhat arbitrary and it did not seem to come up as of interest in the iPOTD analysis. Moreover, they were not able to validate the interaction with their own analyses.

      We have revised the text to clarify the connection further.

      (9) The flow of results is confusing. The first section concludes with a focus on TOPBP1 and SMARCAD1, then progresses to morphological characterization of heterochromatin regions in the next two sections before returning to TOPBP1 and SMARCAD1. It seems like it would make more sense to describe the model system and morphological characterization at the beginning of the results section and then transition to the proteomic analysis and characterization of TOPBP1 and SMARCAD1 (with the expectation that the rationale be improved).

      As suggested by the reviewer, we have reordered the figures, particularly Figure 1 and Figure 2, and rephrased the text to improve the overall reading flow.

      (10) There has been considerable work done on characterizing chromatin structure, epigenetic changes, and morphology during early embryonic development. It is therefore difficult to see how validating some of these changes in the inducible model adds much in the way of new knowledge. It may, but this is not articulated in the current text.

      As detailed before, we have rephrased the text to improve the overall reading flow, which we hope has improved the understanding of the impact of our results.

      (11) It is difficult to disentangle broader effects of both TOPBP1 and SMARCAD1 from those described here; they may induce phenotypes, but these may not be unique to this model system.

      We agree with the reviewer, but addressing this point would require additional experiments that go beyond the scope of the present study.

      (12) One of the issues with this assay is global chromatin recovery; it is not focused on heterochromatin compartments. The statement "We identified a total of 2396 proteins, suggesting an efficient pull-down of chromatin-associated factors (fig. S2D and Table S1)" does not demonstrate efficiency. Additional functional annotation would be required to establish this claim, including what fraction are known chromatin-associated proteins (with a focus on the heterochromatin compartment).

      We have changed the text accordingly. The revised statement reads: “We identified a total of 2396 proteins, suggesting an effective pull-down of putative chromatin-associated factors (fig. S2D and Table S1)”.

      Reviewer #3 (Public Review):

      The manuscript entitled "SMARCAD1 and TOPBP1 contribute to heterochromatin maintenance at the transition from the 2C-like to the pluripotent state" by Sebastian-Perez et al. adopted the iPOTD method to compare the chromatin-bound proteome in ESCs and 2C-like cells generated by Dux overexpression. The authors identified 397 chromatin-bound proteins enriched only in ESC and 2C- cells, among which they further investigated TOPBP1 due to its potential role in controlling chromocenter reorganization. SMARCAD1, a known interacting protein of TOPBP1, was also investigated in parallel. The authors observed increased size and decreased number of H3K9me3-heterochromatin foci in Dux-induced 2C<sup>+</sup> cells. Interestingly, depletion of TOPBP1 or SMARCAD1 also led to increased size and decreased number of H3K9me3 foci. However, depletion of these proteins did not affect entry into or exit from the 2C-like state. Nevertheless, the authors showed that both TOPBP1 and SMARCAD1 are required for early embryonic development.

      Although this manuscript provides new insights into the features of 2C-like cells regarding H3K9me3-heterochromatin reorganization, it remains largely descriptive at this stage. It does not provide new insights into the following important aspects: 1) how SMARCAD1 associates with H3K9me3 and contributes to heterochromatin maintenance, 2) how TOPBP1 regulates the expression of SMARCAD1 and facilitates its localization in heterochromatin foci, 3) whether the remodelling of chromocenters is causally related to the mutual transitions between ESCs and 2C-like cells. Furthermore, some results are over-interpreted. Additional experiments and analyses are needed to increase the strength of mechanistic insights and to support all claims in the manuscript.

      We would like to thank the reviewer for their positive and thorough evaluation of our manuscript. We have revised the text and hope that the overall flow is now clearer. Moreover, while we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Fig.2: the DNA decompaction of the chromatin fibers shown in 2C<sup>+</sup> cells may be more related to a relaxed 3D chromatin conformation (Zhu, NAR 2021; Olbrich, Nat Commun 2021) than chromatin accessibility. The authors should discuss this point.

      As suggested by the reviewer, we have included some additional sentences and references in the revised manuscript to address this concern.

      (2) Chemical inhibition of topoisomerases resulted in an increase in the percentage of 2C<sup>+</sup> cells. Does depletion of TOPBP1 also result in an increased percentage of 2C<sup>+</sup> cells? Please include this result in Fig. 3E. Additionally, it should be noted that DDR and p53 have been reported to activate Dux (Atashpaz, eLife 2020; Grow, Nat Genet 2021), and thus may contribute to the increased percentage of 2C<sup>+</sup> cells observed upon topoisomerase inhibition. This point should be discussed in the manuscript.

      To address this concern, we have included some additional sentences and references in the revised manuscript.

      (3) Fig 3A: the TOPBP1 band in the IP sample is questionable, and therefore the conclusion that TOPBP1 is associated with H3K9me3 is difficult to draw from Fig 3A. Additionally, the authors mentioned that association of TOPBP1 and SMARCAD1 is undetected in ESCs, likely due to the suboptimal efficiency of available antibodies. As these are key conclusions in this study, the authors are encouraged to try other commercially available TOPBP1 antibodies (e.g., Abcam #ab-105109, used by ElInati, PNAS 2017) or knock-in tags to perform the co-IP experiment.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C<sup>+</sup> cells.

      (4) Fig. 3C-D, Fig. S3D: the authors claimed reduction of both SMARCAD1 expression and its co-localization with H3K9me3 foci in 2C<sup>+</sup> cells, but did not perform mechanistic studies. It is important to know if TOPBP1 expression also decreases in 2C<sup>+</sup> cells. Additionally, it is unclear whether the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from a reduced expression level. In either case, please provide some mechanistic insights.

      While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      (5) Fig. 3K, Fig. S4D-E: does SMARCAD1 expression decrease upon TOPBP1 depletion? Statistical analysis of SMARCAD1 intensity in Fig. S4E is needed, and a Western blot analysis is strongly suggested. Additionally, it is unclear whether the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from a reduced expression level. In Fig. 3K, TOPBP1-depleted cells appear to show decreased size and increased number of H3K9me3 foci, which is inconsistent with Fig. S4B-C. The authors should clarify this discrepancy. Furthermore, statistics should be performed to determine whether Smarcad1/Topbp1 knockdown could further increase the size and decrease the number of H3K9me3 foci in 2C<sup>+</sup> cells. This would provide additional evidence for the involvement of these proteins in heterochromatin maintenance.

      We did not observe Smarcad1 downregulation after Topbp1 knockdown (shown in fig. S4A). In Figs. S4B and S4C, we observed that the number of H3K9me3 foci decreased, and their area became larger after knocking down either Smarcad1 or Topbp1, compared to scramble controls. These results align with the reviewer’s comment. Additionally, it should be noted that these findings were derived from the quantification of tens of cells and hundreds of foci, as indicated in the figure legend. This resulted in statistical significance after applying the test indicated in the figure legend.

      (6) We suggest moving Fig. 3J to Fig. 4. Additionally, performing immunostaining of SMARCAD1, TOPBP1, and H3K9me3 during pre-implantation development would provide valuable information on their protein-level dynamics, interactions, and functions in early embryos. This would further strengthen the conclusions drawn in the manuscript.

      We agree that performing these additional experiments would provide valuable information; however, this would require a substantial amount of experimental work that exceeds our current resources.

      (7) Fig. 4 and Fig. S5: the authors observed reduced H3K9me3 signal in the Smarcad1 MO embryos at the 8-cell stage, but claim that they failed to examine Topbp1 MO embryos at the 8-cell stage due to their developmental arrest at the 4-cell stage. However, based on Fig. 4A, not all Topbp1 MO embryos were arrested at the 4-cell stage, and it is still possible to examine the H3K9me3 signal in 8-cell Topbp1 MO embryos, which is critical for demonstrating its function in early embryos. Also, how should the increased HP1b signal in Topbp1 MO embryos be interpreted?

      For Topbp1 silencing, we observed an even more severe phenotype compared to Smarcad1 MO. All the Topbp1 MO-injected embryos (100 %) arrested at the 4-cell stage and did not develop further (shown in Fig. 4A and 4B). Therefore, the severity of the Topbp1 morpholino phenotype posed a technical challenge in evaluating the H3K9me3 signal in 8-cell Topbp1 MO embryos, as none of the injected embryos developed beyond the 4-cell stage.

      We believe the increased HP1b signal in Topbp1 MO embryos could indicate potential alterations in chromatin organization and heterochromatin stability. Specifically, we observed remodeling of heterochromatin in both 2-cell and 4-cell Topbp1 MO arrested embryos compared to controls, as evidenced by the spreading and increased HP1b signal (shown in fig. S5F-S5I). Further investigations could enhance our understanding of the underlying defects in Topbp1 knockdown embryos, extending beyond heterochromatin-related errors.

      Minor points:

      (1) Page 4, the third row from the bottom: please revise the sentence.

      We have reviewed the text and it now reads correctly in the revised manuscript.

      (2) Fig. 1C: The authors claimed "Luc replicates clustered separately from 2C<sup>+</sup> and 2C- conditions", however, Luc rep3 is apparently clustered with 2C conditions.

      (3) The GFP signal in Fig. S1E is confusing.

      (4) Please include ESC in Fig. 2D-E. Also label the colors in Fig. 2E.

      As indicated in the figure legend of the revised Fig. 1F: “Cells with a GFP intensity score > 0.2 are colored in green. Black dots indicate 2C- cells and green dots indicate 2C<sup>+</sup> cells.”

      (5) Fig. 2G: Transposition of the heatmap (show genes in rows) is suggested to improve readability.

      (6) Page 7, the third row from the bottom: incorrect citation of Fig. 1K.

      Thank you for spotting this incorrect citation. We have corrected it in the revised manuscript.

      (7) Page 8, row 15, Fig. S3D should be cited to support the decreased expression of SMARCAD1 in 2C<sup>+</sup> cells.

      We have cited the corresponding supplementary figure S3D in the mentioned sentence.

      (8) Fig. 2H: what is the difference between "2C-" and "ESC-like"?

      We use “2C-” to refer to cells that do not express the GFP reporter during the transition from ESCs to 2C<sup>+</sup> cells. We use “ESC-like” to refer to cells that do not express the GFP reporter during exit, that is, during the transition from sorted and purified 2C<sup>+</sup> cells to a GFP-negative state.

      (9) Fig. S4A-C: compared with shTopbp1#2, shTopbp1#1 appears to be slightly more effective in knockdown, but less dramatic changes in the size/number of H3K9me3 foci.

      (10) Fig. 4: please show the effectiveness of Topbp1 MO by Immunostaining of TOPBP1.

      (11) Fig. 4C: please label the developmental stage as in Fig. 4E and 4G.

      We have added an “8-cell” label in Figure 4C, as suggested by the reviewer.

    1. eLife Assessment

      This important study shows that Type 3 secretion translocons in Edwardsiella tarda and other bacteria activate the NAIP-NLRC4 inflammasome. The data from cellular and biochemical experiments showing that EseB is required for activation of the NLRC4 inflammasome are convincing. This paper is broadly relevant to those investigating host-pathogen interactions in diverse organisms.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Zhao and colleagues investigate inflammasome activation by E. tarda infections. They show that E. tarda induces the activation of the NLRC4 inflammasome as well as the non-canonical pathway in human THP1 macrophages. Further dissecting NLRC4 activation, they find that T3SS translocon components eseB, eseC and eseD are necessary for NLRC4 activation, and that delivery of purified eseB is sufficient to trigger NAIP-dependent NLRC4 activation. Sequence analysis reveals that eseB shares homology within the C-terminus with T3SS needle and rod proteins, leading the authors to test if this region is necessary for inflammasome activation. They show that the eseB CT is required and that it mediates interaction with NAIP. Finally, they show that homologs of eseB in other bacteria also share the same sequence and that they can activate NLRC4 in a HEK293T cell overexpression system.

      Strengths:

      This is a very nice study that convincingly shows that eseB and its homologs can be recognized by the human NAIP/NLRC4 inflammasome. The experiments are well designed, controlled and described, and the paper is convincing as a whole.

      Weaknesses:

      The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      Comments on revisions:

      The authors have addressed my concern with additional experiments, which strengthen the authors' conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Zhao et al. demonstrates the role of the Edwardsiella tarda type 3 secretion system translocon in activating human macrophage inflammation and pyroptosis. The authors show the requirement of both the bacterial translocon proteins and particular host inflammasome components for E. tarda-induced pyroptosis. In addition, the authors show that the C-terminal region of the translocon protein, EseB, is both necessary and sufficient to induce pyroptosis when present in the cytoplasm. The most terminal region of EseB was determined to be highly conserved among other T3SS-encoding pathogenic bacteria and a subset of these exhibited functionally similar effects on inflammasome activation. Overall, the data support the conclusions and interpretations and provide valuable insights into interactions between bacterial T3SS components and the host immune system, thereby expanding our understanding of E. tarda pathogenesis.

      Strengths:

      The authors use established and reliable molecular biology and bacterial genetics strategies to characterize the roles of the bacterial T3SS translocon and host inflammasome pathways in E. tarda-induced pyroptosis in human macrophages. These observations are naturally expanded upon by demonstrating the specific regions of EseB that are required for inflammasome activation and the conservation of this sequence and function among other pathogenic bacteria.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhao and colleagues investigate inflammasome activation by E. tarda infections. They show that E. tarda induces the activation of the NLRC4 inflammasome as well as the non-canonical pathway in human THP1 macrophages. Further dissecting NLRC4 activation, they find that T3SS translocon components eseB, eseC and eseD are necessary for NLRC4 activation and that delivery of purified eseB is sufficient to trigger NAIP-dependent NLRC4 activation. Sequence analysis reveals that eseB shares homology within the C-terminus with T3SS needle and rod proteins, leading the authors to test if this region is necessary for inflammasome activation. They show that the eseB CT is required and that it mediates interaction with NAIP. Finally, they show that homologs of eseB in other bacteria also share the same sequence and that they can activate NLRC4 in a HEK293T cell overexpression system.

      Strengths:

      This is a very nice study that convincingly shows that eseB and its homologs can be recognized by the human NAIP/NLRC4 inflammasome. The experiments are well designed, controlled and described, and the paper is convincing as a whole.

      Weaknesses:

      The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      (1) The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      According to the reviewer’s suggestion, we added the relevant discussion (lines 326-334) and carried out additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230 and 269-274.

      (2) The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      According to the reviewer’s suggestion, additional experiments were performed to examine the NLRC4-activating potentials of 14 translocator proteins that share low sequence identities with EseB. The relevant results and discussion are shown in Figure S8 and lines 289-301; 364-372, and 377-379.

      Reviewer #2 (Public Review):

      Summary:

      This work by Zhao et al. demonstrates the role of the Edwardsiella tarda type 3 secretion system translocon in activating human macrophage inflammation and pyroptosis. The authors show the requirement of both the bacterial translocon proteins and particular host inflammasome components for E. tarda-induced pyroptosis. In addition, the authors show that the C-terminal region of the translocon protein, EseB, is both necessary and sufficient to induce pyroptosis when present in the cytoplasm. The most terminal region of EseB was determined to be highly conserved among other T3SS-encoding pathogenic bacteria and a subset of these exhibited functionally similar effects on inflammasome activation. Overall, the data support the conclusions and interpretations and provide interesting insights into interactions between bacterial T3SS components and the host immune system.

      Strengths:

      The authors use established and reliable molecular biology and bacterial genetics strategies to characterize the roles of the bacterial T3SS translocon and host inflammasome pathways to E. tarda-induced pyroptosis in human macrophages. These observations are naturally expanded upon by demonstrating the specific regions of EseB that are required for inflammasome activation and the conservation of this sequence among other pathogenic bacteria.

      Weaknesses:

      The functional assessment of EseB homologues is limited to inflammasome activation at the protein level but does not include the effects on cell viability as shown for E. tarda EseB. Confirmation that EseB homologues have similar effects on cell death would strengthen this portion of the manuscript.

      According to the reviewer’s suggestion, the effects of representative EseB homologs on cell death were examined in the revised manuscript (Figure 5D, Figure S7, and line 289).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few suggestions on how to improve the study:

      Activation of caspase-4 requires entry into the host cytosol. Can this be observed with E. tarda, and is it T3SS dependent? The fact that deleting the translocon components abrogates all GSDMD activation (see Fig. 2D) suggests that Casp4 activation also requires an active T3SS. It would be useful for the reader to include some more information on the cellular biology of E. tarda.

      In our study, we found that E. tarda could enter THP-1 cells (Figure S1), and host cell entry was not affected by deletion of eseB-D (Δ_eseB-D_) in the T3SS system (Figure 2B, C). Additional experiments showed that Δ_eseB-D_ abolished the ability of E. tarda to activate Casp4 (Figure S2), implying that Casp4 activation required an active T3SS. Relevant changes in the revised manuscript: lines 223 and 224, 341-342.

      The data presented by the authors suggest that eseB is sensed by NLRC4 when overexpressed. They do not, however, prove that eseB is the main factor that drives NLRC4 activation during an infection, since deficiency in eseB also abrogated translocation of other potential activators of NLRC4, e.g. flagellin and T3SS needle and rod subunits. I would thus find it essential to properly test if E. tarda flagellin can activate NLRC4 by comparing a WT and a flagellin-deficient strain, and/or by transfecting or expressing E. tarda flagellin in these cells, as well as testing whether E. tarda rod and needle subunits act as NLRC4 activators. This is important as previous studies suggested that flagellin is the main activator of cytotoxicity during E. tarda infection.

      Previous studies have shown that flagellin is required for E. tarda-induced macrophage death in fish [1] but not in mice [2]. In the revised manuscript, we performed additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230, 269-274, and 326-334.

      References

      (1) Xie HX, Lu JF, Rolhion N, Holden DW, Nie P, Zhou Y, et al. Edwardsiella tarda-induced cytotoxicity depends on its type III secretion system and flagellin. Infect Immun. 2014;82(8):3436-45. doi: 10.1128/IAI.01065-13.

      (2) Chen H, Yang D, Han F, Tan J, Zhang L, Xiao J, et al. The bacterial T6SS effector EvpP prevents NLRP3 inflammasome activation by inhibiting the Ca<sup>2+</sup>-dependent MAPK-JNK pathway. Cell Host Microbe. 2017;21(1):47-58. doi: 10.1016/j.chom.2016.12.004.

      Figure 5/S4, please list the names of the eseB homologs. It is cumbersome to have to access GenBank with the accession number to be able to understand what proteins the authors define as homologs of eseB.

      The names were added to the revised Table S2, Figure 5 and Figure S6 (the original Figure S4).

      The authors mention that other translocon proteins, such as YopB/D and PopB/D, were suggested to cause inflammasome activation. How do these compare to eseB and its homologs? Do they share the CT motif?

      Additional experiments were performed to compare the inflammasome activation abilities of EseB and other translocator proteins including YopD and PopD. The relevant results and discussion are shown in Figure S8 and lines 289-301, 364-372, and 377-379.

      It would be nice to show that there are potentially two groups of translocon proteins, one group sharing homology to needle subunits within the CT region and another that is different. A quick look at the sequence of these proteins suggests that they are quite different and much larger than eseB.

      In our study, additional experiments with more translocator proteins indicated that the possession of EseB T6R-like terminal residues does not necessarily guarantee the protein to activate the NLRC4 inflammasome. Relevant results and discussion are shown in lines 289-301, 364-372, and 377-379.

    1. eLife Assessment

      This paper reports important findings on giant organelle complexes containing endosomes and lysosomes (termed endosomal-lysosomal organelles form assembly structures [ELYSAs]) present in mouse oocytes and 1- to 2-cell embryos. The data showing the localization and dynamics of ELYSAs during oocyte/embryo maturation are convincing. This work will be of interest to general cell biologists and developmental biologists.

    2. Reviewer #1 (Public review):

      Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then seem to migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. Since the ELYSA described in this paper and ELVA proposed by Elvan Böke appear to be the same structure, it would be helpful to the field if the two groups discuss unifying the nomenclature in the future.

      Comments on latest version:

      In this revised manuscript, the authors have provided additional data supporting their conclusions and also revised the text to more accurately reflect the experimental results.

    3. Reviewer #2 (Public review):

      Satouh et al report the presence of spherical structures composed of endosomes, lysosomes and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named as Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al, 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post fertilization. Specifically, the V1-subunit of V-ATPase targeting to the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

      Comments on revisions:

      This revised version of the manuscript has addressed most of my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. The data are mostly convincing, and only some minor modifications are needed for clarification and further explanation to fully understand the results.

      Reviewer #2 (Public Review):

      Satouh et al report the presence of spherical structures composed of endosomes, lysosomes, and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named as Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post-fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al, 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post-fertilization. Specifically, the V1-subunit of V-ATPase targeting the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

      Reviewer #3 (Public Review):

      Fertilization converts a cell defined as an egg to a cell defined as an embryo. An essential component of this switch in cell fate is the degradation (autophagy) of cellular elements that serve a function in the development of the egg but could impede the development of the embryo. Here, the authors have focused on the behavior during the egg-to-embryo transition of endosomes and lysosomes, which are cytoplasmic structures that mediate autophagy. By carefully mapping and tracking the intracellular location of well-established marker proteins, the authors show that in oocytes endosomes and lysosomes aggregate into giant structures that they term Endosomal LYSosomal organellar Assembl[ies] (ELYSA). Both the size distribution of the ELYSAs and their position within the cell change during oocyte meiotic maturation and after fertilization. Notably, during maturation, there is a net actin-dependent movement towards the periphery of the oocyte. By the late 2-cell stage, the ELYSAs are beginning to disintegrate. At this stage, the endo-lysosomes become acidified, likely reflecting the activation of their function to degrade cellular components.

      This is a carefully performed and quantified study. The fluorescent images obtained using well-known markers, using both antibodies and tagged proteins, support the interpretations, and the quantification method is sophisticated and clearly explained. Notably, this type of quantification of confocal z-stack images is rarely performed and so represents a real strength of the study. It provides sound support for the conclusions regarding changes in the size and position of the ELYSAs. Another strength is the use of multiple markers, including those that indicate the activity state of the endo-lysosomes. Altogether, the manuscript provides convincing evidence for the existence of ELYSAs and also for regulated changes in their location and properties during oocyte maturation and the first few embryonic cell cycles following fertilization.

      At present, precisely how the changes in the location and properties of the ELYSAs affect the function of the endo-lysosomal system is not known. While the authors' proposal that they are stored in an inactive state is plausible, it remains speculative. Nonetheless, this study lays the foundation for future work to address this question.

      Minor point: l. 299. If I am not mistaken, there is a typo. It should read that the inhibitors of actin polymerization prevent redistribution from the cytoplasm to the cortex during maturation.

      Minor point: A few statements in the Introduction would benefit from clarification. These are noted in the comments to the authors.

      We sincerely appreciate the editorial board of eLife and the reviewers for their helpful and constructive comments on our manuscript. We are pleased that the reviewers acknowledged that we identified and characterized this assembly structure independently. In the revised manuscript, we have carefully considered the reviewers’ comments and conducted additional analysis to address each of them.

      Regarding the typographical errors, we revised the description to fit with our findings and the reviewers’ comments. We also found that the primer sequence was correct, and we carefully checked the accuracy of the entire manuscript.

      We hope that the revised version will now be deemed suitable for publication in eLife.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Q. 1) The authors state in the Abstract that ELYSAs contain autophagosome-like membranes in the outer layer. However, this seems to be just speculation based on the LC3 staining results and is not directly shown. Are there autophagosome-like double membrane structures in ELYSAs?

      We appreciate this comment. We also agree with this concern; however, it was difficult to assert that they are autophagosomes based on the observation of the electron micrographs. For this reason, we rephrased it to be "Most ELYSAs are also positive for an autophagy regulator, LC3.” (line 33). In addition, we revised the notation to LC3-positive structures in the Results and Discussion sections (lines 165-169, 286).

      Q. 2) The data in Figure 2A, showing a decrease in the number of LAMP1 structures, seems to contradict the data in Figure 1B, showing an apparent increase in LAMP1 structures. Please explain this discrepancy. If the authors did not count structures just below the plasma membrane, please explain the rationale for this.

      We really appreciate the valuable comment. Regarding the number of LAMP1-positive structures, it is not suitable for comparison with Figure 1B, etc., as pointed out by the reviewer, since the distribution of the LAMP1 signal differs from plane to plane. To avoid any potential confusion, we added new images of the Z-projection of the immunostained images that can better reflect the number of positive structures in the whole oocyte/embryo in Figure 2.

      In addition, as the reviewer pointed out, there is a technical difficulty in measuring the LAMP1-positive signal on the plasma membrane or just below it. We explained how and why we had to delete plasma membrane signals in our response #21.

      Q. 3) The actin dependence is not observed in Figure 5C. What is the difference between Figure 5C and 5E? Please explain further.

      We apologize for the lack of clarity; Figures 5C and 5E show the average number of LAMP1-positive structures (5C) and the percentage of the sum of granule volumes in LAMP1 positive structure (5E), respectively, after classifying the LAMP1 positive granules by their diameters.

      We removed Figure 5E for the sake of conciseness since we already mentioned a similar fact in Figure 5C. To clarify the corresponding explanations, we moved figures that were not classified by diameter to Supplementary Figure 8 to improve readability. Moreover, we have rewritten the main text on lines 200–211.

      Q. 4) While the actin inhibitors reduce the number of peripheral LAMP1 structures (Figure 5F), they do not affect their number in the central region (Figure 5G). How can the authors conclude that actin inhibitors inhibit the migration of LAMP1 structures?

      We appreciate the comment. As pointed out, the number of large LAMP1-positive structures in the medial region did not change. Therefore, we have avoided the description that ELYSAs migrate from the middle region to the cell periphery and have unified the description of whether large structures in the periphery occur. Please refer to the subsection title (line 188), the following descriptions (lines 189–199), the related description in the Results (lines 200–211), and the title and the legend of Figure 5.

      Q. 5) The authors show that the V1A subunit associates with the surface of LAMP1 structures as punctate structures (Figure 6B). What are these V1A-positive structures? Is V1A recruited to some specific domains of ELYSAs, or are V1A-positive active lysosomes recruited to ELYSAs? Please provide an interpretation of these data. The phrase "The V1-subunit of V-ATPase is targeted to these structures" (line 262) is not appropriate because it is indistinguishable whether only the V1 subunits are recruited or active lysosomes containing the V1 subunit are recruited.

      Thank you for the valuable comment. Indeed, our analysis, including the analysis of Fig. 8 described on line 262, did not clarify whether free V1A-mCherry molecules accessed the ELYSA periphery or whether lysosomes with V1A-mCherry molecules newly merged into the ELYSA. Therefore, we added this interpretation to lines 232–234 of the Results and revised the Discussion as "The number of membrane structures positive for V1A-mCherry increase upon ELYSA disassembly, indicating further acidification of the endosomal/lysosomal compartment" (lines 292–294).

      Q. 6) Why did the authors use LysoSensor as a marker for ELYSA instead of LAMP1 in Figure 8 and 9? Some reasons should be given.

      There is a clear technical reason for this: when LAMP1-EGFP was expressed in a zygote, it largely migrated to the plasma membrane before and after the 2-cell stage, making it difficult to capture the change of ELYSAs. To circumvent this difficulty, we used LysoSensor to visualize ELYSAs instead of LAMP1-EGFP. This explanation was added to lines 258–260.

      Q. 7) In Figure 9A, it is not clear whether the activity of LysoSensor-positive structures is lower at this stage compared to other stages. It may be shown in Figure S7, but the data are not clearly visible. A direct comparison would be ideal.

      A new analysis similar to that shown in Fig. 9 for early 2-cells and 4-cells was performed and added to Figure S7. To support direct comparison, the ranges of axes were set to be similar.

      As a result, the quantified MagicRed signal on the isolated LysoSensor-positive punctate structure in MII oocyte was nearly the same as that in early 2-cells and 4-cells. In early 2-cells, LysoSensor gave a signal at the cellular boundary, where MagicRed staining was not observed, confirming that MagicRed activity is higher in the interior than in the cell periphery in post-fertilization embryos. We have included an additional description in the main text (lines 280–282).

      Q. 8) In the phrase "pregnant mare serum gonadotropin or an anti-inhibin antibody" (line 382), is "or" correct?

      When inducing superovulatory stimulation, an anti-inhibin antibody (distributed as CARD HyperOva) can be used as a substitute for PMSG (after additional stimulation with hCG), which results in the production of eggs of similar quality to those obtained with PMSG. This was used in most experiments. To amend the lack of clarity, a reference (Takeo and Nakagata, PLoS One, 2015) was added to the description of HyperOva (line 417).

      Q. 9) In almost all graphs, please indicate what the X-axis is indicating (not just "number") so that readers can understand what number is being represented without reading the legends.

      We revised the axis titles in all figures.

      Q. 10) Since grayscale images provide better contrast than color images, it is recommended that single-color images be shown in grayscale.

      We replaced all single-color images with grayscale images.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      Q. 11) Figure 1 and S1- Both Rab5 and Rab7 co-localize with LAMP1. However, there seems to be a lot of LAMP1-free Rab5 dots as compared to the Lamp1-free Rab7. As a result, LAMP1 and Rab7 are co-localized more frequently than LAMP1 and Rab5 (video1). Could it be that early endosomes (Rab5+) are yet to be incorporated into ELYSAs? If so, a brief discussion of this phenomenon would be nice.

      Thank you very much for the comment. We agree with the reviewer’s interpretation. In accordance with this suggestion, we clearly stated in the main text: “Although small punctate structures that are RAB5-positive but LAMP1-negative also spread over the cytosol, most giant structures were positive for RAB5 and LAMP1 (Video 1)” (lines 91–93). In the Discussion section, a brief statement was included: “Considering the large number of RAB5-positive and LAMP1-negative punctate structures in MII oocytes, these layers may also reflect the assembly mechanism of the ELYSA” (lines 318–320).

      Q. 12) Video 3 (and Figure 6) clearly shows the dynamics of LAMP1-labelled vesicles during maturation, which is impressive. In contrast to the live cell imaging after LAMP1 mRNA injection, Figure 1 used anti-LAMP1 Ab to detect endogenous levels of LAMP1. It appears that mRNA microinjection causes LAMP1 overexpression, causing more (but smaller) vesicles to form. It should be easy to quantify and compare the vesicles in Figures 1 and 6.

      We appreciate the comment. As mentioned, injections of EGFP-LAMP1 mRNA are useful for the visualization of LAMP1 dynamics during the maturation phase from GV to MII by live cell imaging, which is not feasible with immunostaining. However, the fluorescence emitted by EGFP-LAMP1 is only a few tenths of that of antibody staining, and because of the technical difficulty of microinjection into GV oocytes, the signal-to-noise ratio sufficient for imaging was merely one in ten oocytes. In addition, live cell imaging of oocytes in Figure 6 had to be carried out with very low excitation light exposure to reduce the toxicity. It was also performed with a low magnification lens and a longer step size in the z-axis. For these reasons, in examining the point raised, we performed an additional 3D object analysis, in the same way as in Figure 2, on the data of IVM oocytes injected with EGFP-LAMP1 mRNA using the same lens as in Figure 1 and with a longer exposure time than in live imaging. The results were compared with the MII data of Figures 1 and 2.

      As a result, as shown in the new Figure S8, more objects with a diameter of 0.2–0.4 µm were found than in the immunostaining data, which fits the reviewer’s point. In addition, the counts were lower for the 0.6–1.0 µm diameter, but there was no significant difference in the number of larger LAMP1 positive structures corresponding to the ELYSA size. We consider that this was appropriate for the original purpose of characterizing the ELYSA formation process. A description of these points has been added to lines 221–225.

      Q. 13) In Figure 4A and B- Seems like not all LAMP1-positive structures were LC3-positive. Is there any size or location within the oocyte that determines LC3 positivity?

      We appreciate the valuable comment. To answer this comment, we proceeded with a new 3D object-based co-localization analysis on LAMP1 and LC3, determined the number, volume, and distribution within the oocyte, and incorporated the results as Supplementary Figure 6. To examine the positivity, we further analyzed the percentage of double-positive structures among all the LAMP1-positive structures. The results showed that their average diameter significantly shifted from 2.36 µm (GV) to 3.78 µm (MII). Moreover, it was clearly indicated that LAMP1-positive structures smaller than 2 µm in diameter are rarely positive for LC3. In terms of location, measuring the distance of the double-positive structures from the oocyte center (the cellular geometric center) indicated that they tend to be observed at the periphery of both stages of oocytes (more than 80% in > 30 µm in the MII oocyte). Of note, no clear tendency of double positivity was observed. A description of these points has been added to lines 174–186.

      Q. 14) In discussion, line 256- Small ELYSAs are formed in GV oocytes. Since you haven't checked the smaller-sized, growing oocytes, I suggest rephrasing this sentence as 'are present' rather than 'are formed'.

      We agree with the reviewer’s suggestion and changed it to "present" (line 287).

      Q. 15) Line 188- ELISA should instead be ELYSA

      Thank you for pointing this out. We have found a few more typographical errors, and all of them have been corrected (lines 213 and 321).

      Reviewer #3 (Recommendations For The Authors):

      Q. 16) Line 42: What do you mean by 'zygotic gene expression following the degradation of the cellular components of each maternal and paternal gamete'? ZGA requires this degradation? Please provide supporting references from the literature.

      We apologize for the confusing wording. We meant to say that both ZGA and degradation of parental components are required. To avoid misunderstanding, we have revised the text to “zygotic gene expression as well as the degradation of the cellular components of each maternal and paternal gamete” and inserted a new reference (line 44).

      Q. 17) 50: MII means metaphase II, not meiosis II.

      We corrected the clerical mistake (line 50).

      Q. 18) 51: Define LC3.

      We added the definition of LC3 (line 51-52).

      Q. 19) 60: 'lysosomal activity in oocytes is upregulated by sperm-derived factors as the oocytes grow and mature'. As written, the sentence implies that oocytes grow and mature after fertilization. This may be true for maturation, but I would be surprised to learn that there is growth of the oocyte after fertilization.

      We appreciate this valuable comment.

      C. elegans lives mainly as a hermaphrodite, which contains a pair of U-shaped gonad arms including the ovary, spermatheca, and uterus in the body. Oocytes grow in the ovary, mature upon receiving major sperm proteins secreted from sperm, and are ovulated into the spermatheca for fertilization. In 2017, Kenyon’s group reported that major sperm proteins act as sperm-secreted hormones to upregulate lysosomal activity in oocytes during oocyte growth and maturation. To avoid misunderstanding, we have revised our manuscript to 'lysosomal activity in oocytes is upregulated by major sperm proteins secreted from sperms as the oocytes grow and mature' (lines 61-66).

      Q. 20) 94 and Figure 1B: While it is clear that many LAMP1 foci at the late 2-cell stage do not also contain RAB5, it seems that the majority of RAB5 foci also stain for LAMP1. This may be a minor point in the context of the paper but could be clarified.

      We could not easily agree with the suggestion because of the possibility that the images might give different impressions on each plane. Therefore, as a way to verify this point, we attempted to quantify the co-localization by reconstructing the 3D puncta information based on the two types of antibody staining data. Unfortunately, as shown in Fig. 1AB, Rab5 had a high cytoplasmic background, and although we were able to extract peaks, we could not reliably recalibrate the three-dimensional punctate structure (please refer to the new Supplementary Fig. 6). Therefore, co-localization on each other's punctate structure (LAMP1/RAB5 vs. RAB5/LAMP1) could not be verified. The validation using specific planes also showed large differences between planes, with overlapping punctate structures counted separately in adjacent planes, making reliable quantification difficult. This is an issue that will be addressed in the future.

      On the other hand, the newly added Z-projection figure (Fig. 1AB) shows that RAB5-positive and LAMP1-negative punctate structures tend to accumulate along the LAMP1-positive punctate structures larger than 1 µm at the late 2-cell stage in all observed embryos; we added this statement on lines 99–101.

      Q. 21) 100-102 and Figure 2A: Does the decrease in the total number of LAMP1 foci refer just to cytoplasmic or also to membrane foci? If the former, what was the reason for not including the membrane in the analysis?

      We appreciate the critical question. The LAMP1 signal on the plasma membrane interfered with the measurement of the signals just below the plasma membrane. The biological cause of this increased signal on the plasma membrane, as shown in Fig. 2E, seemed to be the migration of the LAMP1 signals post-fertilization, which was also reported in a previous paper by Zaffagnini et al. (2024), published in Cell.

      Oocytes are giant cells, and confocal imaging has a technical limitation in obtaining the same fluorescence intensity along the z-axis. However, 3D-object analysis requires thresholding based on absolute values. As a result, the presence of the plasma membrane signal caused punctate structures located close to the membrane to be captured and recognized as a single, very large LAMP1-positive structure, resulting in the loss of the punctate structures that should be measured.

      To avoid this issue, we have used several programs to correct the fluorescence difference along the z-axis; nonetheless, these attempts were unsuccessful. Therefore, as described in the Materials and Methods section, we applied only background subtraction at each z-position and then manually removed the plasma membrane signal (which was thin and continuous at the edges). Furthermore, when the plasma membrane and punctate structure signals overlapped, we paid attention not to remove the signals but to separate them. Thus, we believe that the decrease in the number and volume of LAMP1-positive structures after fertilization is still a phenomenon associated with the shift of LAMP1 to the plasma membrane.

      Q. 22) Figure 2B, F, G: As the x-axis does not represent a continuous variable, adjacent data points should not be connected by a line. The histogram representations in A, C, and E are much easier to understand. I suggest presenting all data in this format.

      We changed the line graphs to bar graphs. In addition, to make the significance among populations clearer, significant differences are now marked with alphabetical indicators.

      Q. 23) Figure 2B, C: It seems that the values for the different stages are expressed relative to the value at MII. Why not use the GV value as the baseline? This would follow the developmental trajectory of the oocyte/embryo more directly and would not (I believe) change the conclusions.

      We appreciate the comment. We intended to show that ELYSA develops most in the MII phase and decreases after fertilization. Following the reviewer’s suggestion, we now express GV-to-MII changes relative to GV and post-fertilization changes relative to the MII phase (Fig. 2C, D).

    1. eLife Assessment

      This study convincingly shows that aquaporin-mediated cell migration plays a key role in blood vessel formation during zebrafish development. In particular, the paper implicates hydrostatic pressure and water flow as mechanisms controlling endothelial cell migration during angiogenic sprouting. This fundamental study is highly novel and significantly advances our understanding of cell migration during morphogenesis. As such, this work will be of great interest to developmental and cell biologists working on organogenesis, angiogenesis, and cell migration.

    2. Reviewer #1 (Public review):

      Summary:

      The paper details a study of endothelial cell vessel formation during zebrafish development. The results focus on the role of aquaporins, which mediate the flow of water across the cell membrane, leading to cell movement. The authors show that actin and water flow together drive endothelial cell migration and vessel formation. If either of these two elements is perturbed, defects in vessels are observed. Overall, the paper significantly improves our understanding of cell migration during morphogenesis in organisms.

      Strengths:

      The data are extensive and are of high quality. There is a good amount of quantification with convincing statistical significance. The overall conclusion is justified given the evidence.

      Weaknesses:

      There are two weaknesses, which if addressed, would improve the paper.

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this has been shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2 and H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      After revision, this concern has been addressed.

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention that water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in the literature, and it could be made clearer somewhere in the introduction.

      After revision, this concern has been addressed.

    3. Reviewer #3 (Public review):

      Summary:

      Kondrychyn and colleagues describe the contribution of two Aquaporins, Aqp1a.1 and Aqp8a.1, towards angiogenic sprouting in the zebrafish embryo. By whole-mount in situ hybridization, RNAscope and scRNA-seq, they show that both genes are expressed in endothelial cells in partly overlapping spatiotemporal patterns. Pharmacological inhibition experiments indicate a requirement for VEGFR2 signaling (but not Notch) in transcriptional activation.

      To assess the role of both genes during vascular development the authors generate genetic mutations. While homozygous single mutants appear normal, aqp1a.1;aqp8a.1 double mutants exhibit defects in EC sprouting and ISV formation.

      At the cellular level, the aquaporin mutants display a reduction of filopodia in number and length. Furthermore, a reduction in cell volume is observed indicating a defect in water uptake.

      The authors conclude that polarized water uptake mediated by aquaporins is required for the initiation of endothelial sprouting and (tip) cell migration during ISV formation. They further propose that water influx increases hydrostatic pressure within the cells, which may facilitate actin polymerization and formation of membrane protrusions.

      In the revised version of the manuscript the authors have added data which show that inhibition of swell-induced chloride channels mimics aqp mutant phenotypes, giving credence to the model that water influx via aquaporins is driven by an osmotic gradient.

      Strengths:

      The authors provide a detailed analysis of Aqp1a.1 and Aqp8a.1 during blood vessel formation in vivo, using zebrafish intersomitic vessels as a model. State-of-the-art imaging demonstrates an essential role of aquaporins in different aspects of endothelial cell activation and migration during angiogenesis.

      Weaknesses:

      With respect to the connection between Aqp1/8 and actin polymerization/filopodia formation, the evidence appears preliminary and the authors' interpretation is guided by evidence from other experimental systems.

      After revision, the authors have addressed all other concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper details a study of endothelial cell vessel formation during zebrafish development. The results focus on the role of aquaporins, which mediate the flow of water across the cell membrane, leading to cell movement. The authors show that actin and water flow together drive endothelial cell migration and vessel formation. If either of these two elements is perturbed, defects in vessels are observed. Overall, the paper significantly improves our understanding of cell migration during morphogenesis in organisms.

      Strengths:

      The data are extensive and are of high quality. There is a good amount of quantification with convincing statistical significance. The overall conclusion is justified given the evidence.

      Weaknesses:

      There are two weaknesses, which if addressed, would improve the paper.

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this has been shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2 and H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention that water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in the literature, and it could be made clearer somewhere in the introduction.

      Reviewer #1 (Recommendations For The Authors):

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this has been shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2 and H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      We thank Reviewer #1 for this very important comment and the suggestion to examine the function of ion channels in establishing an osmotic gradient to drive directional flow. We have taken on board the reviewer’s suggestion and examined the expression of NHE1 and SWELL1 in endothelial cells using published scRNAseq of 24 hpf ECs (Gurung et al, 2022, Sci. Rep.). We found that slc9a1a, slc9a6a, slc9a7, slc9a8, lrrc8aa and lrrc8ab are expressed in different endothelial subtypes. To examine the function of NHE1 and SWELL1 in endothelial cell migration, we used the pharmacological compounds 5-(N-ethyl-N-isopropyl)amiloride (EIPA) and DCPIB, respectively. While we were unable to observe an ISV phenotype after EIPA treatment at 5, 10 and 50 µM, we were able to observe impaired ISV formation after DCPIB treatment that was very similar to that observed in Aquaporin mutants. We were very encouraged by these results and proceeded to perform more detailed experiments, whose results have yielded a new figure (Figure 6) and are described and discussed in lines 266 to 289 and 396 to 407, respectively, in the revised manuscript.

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention that water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in the literature, and it could be made clearer somewhere in the introduction.

      Thank you for pointing out the deficiency in explaining how osmotic gradients drive water flow to build up hydrostatic pressure. We have clarified this in lines 50, 53 - 54 and 385.

      The two recommendations listed above would improve the paper. They are however not mandatory. The paper would be acceptable with some clarifying rewrites. I am not an expert on zebrafish genetics, so it might be difficult to perturb ion channels in this model organism. Have the authors tried to perturb ion channels in these cells?

      We hope that our attempts at addressing Reviewer #1’s comments are satisfactory and sufficient to clarify the concerns outlined.

      Reviewer #2 (Public Review):

      Summary:

      Directional migration is an integral aspect of sprouting angiogenesis and requires a cell to change its shape and sense a chemotactic or growth factor stimulus. Kondrychyn I. et al. provide data that indicate a requirement for zebrafish aquaporins 1 and 8, in cellular water inflow and sprouting angiogenesis. Zebrafish mutants lacking aqp1a.1 and aqp8a.1 have significantly lower tip cell volume and migration velocity, which delays vascular development. Inhibition of actin formation and filopodia dynamics further aggravates this phenotype. The link between water inflow, hydrostatic pressure, and actin dynamics driving endothelial cell sprouting and migration during angiogenesis is highly novel.

      Strengths:

      The zebrafish genetics, microscopy imaging, and measurements performed are of very high quality. The study data and interpretations are very well-presented in this manuscript.

      Weaknesses:

      Some of the mechanobiology findings and interpretations could be strengthened by more advanced measurements and experimental manipulations. Also, a better comparison and integration of the authors' findings, with other previously published findings in mice and zebrafish would strengthen the paper.

      We thank Reviewer #2 for the critique that the paper can be strengthened by more advanced measurements and experimental manipulations. One of the technical challenges that we face is how to visualize and measure water flow directly in the zebrafish. We have therefore taken indirect approaches to assess water abundance in endothelial cells in vivo. One approach was to measure the diffusion of GEM nanoparticles in tip cell cytoplasm in wildtype and Aquaporin mutants, but results were inconclusive. The second was to measure the volume of tip cells, which should reflect water in/outflow. As the second approach produced clear and robust differences between wildtype ECs, ECs lacking Aqp1a.1 and Aqp8a.1 and ECs overexpressing Aqp1a.1 (revised Fig. 5), we decided to present these data in this manuscript.

      We have also taken Reviewer #2’s advice to better incorporate previously published data in our discussion (see below and lines 374 to 383 of the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments that the authors may address to further improve their manuscript analysis, quality, and impact.

      Major comments:

      (1) Citation and discussion of published literature

      The authors have failed to cite and discuss recently published results on the role of aqp1a.1 and aqp8a.1 in ISV formation and caliber in zebrafish (Chen C et al. Cardiovascular Research 2024). That study showed a similar impairment of ISV formation when aqp1a.1 is absent but demonstrated a stronger phenotype on ISV morphology in the absence of aqp8a.1 than the current manuscript by Kondrychyn I et al. Furthermore, Chen C et al show an overall decrease in ISV diameter in single aquaporin mutants, suggesting that the cell volume of all ECs in an ISV is affected equally. Given this published data, are ISV diameters affected in single and double mutants in the current study by Kondrychyn I et al? An overall effect on ISVs would suggest that aquaporin-mediated cell volume changes are not an inherent feature of endothelial tip cells. The authors need to analyse/compare and discuss all differences and similarities of their findings to what has been published recently.

      We apologise for having failed to cite and discuss the recently published paper by Chen et al. This has been corrected, and the paper is now discussed in lines 374 to 383.

      In the paper by Chen et al, the authors describe a role of Aqp1a.1 and Aqp8a.1 in regulating ISV diameter (ISV diameter was analysed at 48 hpf) but they did not examine the earlier stages of sprouting angiogenesis between 20 and 30 hpf, which is the focus of our study. We therefore cannot directly compare the ISV phenotypes with theirs. Nevertheless, we recognise that there are differences in ISV phenotypes from 2 dpf. For example, they did not observe incompletely formed or missing ISVs at 2 and 3 dpf, which we clearly observe in our study. This could be explained by differences in the mutations generated. In Chen et al., the sgRNA used targeted the end of exon 2, which resulted in the generation of a 169 amino acid truncated aqp1a.1 protein. However, in our approach, our sgRNA targeted exon 1 of the gene, which resulted in a truncated aqp1a.1 protein that is 76 amino acids long. As for the aqp8a.1 zebrafish mutant that we generated, our sgRNA targeted exon 1 of the gene, which resulted in a truncated protein that is 73 amino acids long. In Chen et al., the authors did not generate an aqp8a.1 mutant but instead used a crispant approach, which leads to genetic mosaicism and high experimental variability.

      Following the reviewer’s suggestion, we have now measured the diameters of arterial ISVs (aISVs) and venous ISVs (vISVs) in aqp1a.1<sup>-/-</sup>, aqp8a.1<sup>-/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. In our lab, we always make a distinction between aISVs and vISVs as their diameters are significantly different from each other. The results are in Fig S11A. While we corroborate a decrease in diameter in both aISVs and vISVs in single aqp1a.1<sup>-/-</sup> and double aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish, we observed a slight increase in diameter in both aISVs and vISVs in aqp8a.1<sup>-/-</sup> zebrafish at 2 dpf. We also measured the diameter of aISVs and vISVs in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish at 2 dpf (Fig S11B) and, unlike in Chen et al., we could not detect a difference in the diameter between control and aqp1a.1- or aqp8a.1-overexpressing endothelial cells.

      We would also like to point out that, because ISVs are incompletely formed or are missing in aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish (Fig. 3G – L), blood flow is most likely altered in the zebrafish trunk of these mutants, and this can have a secondary effect on blood vessel calibre or diameter. In fact, we often observed wider ISVs adjacent to unperfused ISVs (Fig. 3J) as more blood flow enters the lumenized ISV. Therefore, to determine the cell autonomous function of Aquaporin in mediating cell volume changes in vessel diameter regulation, one would need to perform cell transplantation experiments where we would measure the volume of single aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> endothelial cells in wildtype embryos with normal blood flow. As this is beyond the scope of the present study, we have not done this experiment during the revision process.

      (2) Expression of aqp1a.1 and aqp8a.1

      The quantification shown in Figure 1G shows a relative abundance of expression between tip and stalk cells. However, it seems aqp8a.1 is almost never detected in most tip cells. The authors could show, in addition, the % of tip and stalk cells with detectable expression of the 2 aquaporins. It seems aqp8a.1 is only weakly expressed, or not expressed, in the initial stages. Of course, the protein may have different dynamics from the RNA.

      We would like to clarify that aqp8a.1 mRNA is not detected in tip cells of newly formed ISVs at 20 hpf. At 22 hpf, it is expressed in both tip cells (22 out of 23 tip cells analysed) and stalk cells of ISVs. This is clarified in lines 107 - 109. We also include below a graph showing that although aqp8a.1 mRNA is expressed in tip cells, its expression is higher in stalk cells.

      Author response image 1.

      Could the authors show endogenously expressed or tagged protein by antibody staining? The analysis of the Tg(fli1ep:aqp8a.1-mEmerald)rk31 zebrafish line is a good complement, but unfortunately, it does not reveal the localization of the endogenously expressed protein. Do the authors have any data supporting that the endogenously expressed aqp8a.1 protein is present in sprouting tip cells?

      We tested several antibodies against AQP1 (Alpha Diagnostic International, AQP11-A; ThermoFisher Scientific, MA1-20214; Alomone Labs, AQP-001) and AQP8 (Sigma Aldrich, SAB 1403559; Alpha Diagnostic International, AQP81-A; Alomone Labs, AQP-008) but unfortunately none worked. As such, we do not have data demonstrating endogenous expression and localisation of Aqp1a.1 and Aqp8a.1 proteins in endothelial cells.

      Could the authors perform F0 CRISPR/Cas9 mediated knockin of a small tag (i.e. HA epitope) in zebrafish and read the endogenous protein localization with anti-HA Ab?

      CRISPR/Cas9 mediated in-frame knock-in of a tag into a genomic locus is a technical challenge that our lab has not established. We therefore cannot do this experiment within the revision period.

      Given the double mutant phenotypic data shown, is aqp8a.1 expression upregulated and perhaps more important in aqp1a.1 mutants?

      In our analysis of aqp1a.1 homozygous zebrafish, there is a slight downregulation in aqp8a.1 expression (Fig. S5C). Because the loss of Aqp1a.1 leads to a stronger impairment in ISV formation than the loss of Aqp8a.1 (see Fig. S6F, G, I and J), we believe that Aqp1a.1 has a stronger function than Aqp8a.1 in EC migration during sprouting angiogenesis.

      Regarding the regulation of expression by the Vegfr inhibitor Ki8751, does this inhibitor affect Vegfr/ERK signalling in zebrafish and the sprouting of ISVs significantly?

      Ki8751 has been demonstrated to inhibit ERK signalling in tip cells in the zebrafish by Costa et al., 2016 in Nature Cell Biology. In our experiments, treatment with 5 µM Ki8751 for 6 hours from 20 hpf also inhibited sprouting of ISVs.

      The data presented suggest that tip cells overexpressing aqp1a.1-mEmerald (Figure 2C) need more than 6 times longer to migrate the same distance as tip cells expressing aqp8a.1-mEmerald (Figure 2D). How does this compare with cells expressing only Emerald? A similar time difference can be seen in Movie S1 and Movie S2. Is it just a coincidence? Could aqp8a.1, when expressed at levels similar to aqp1a.1, be more functional and induce faster cell migration? These experiments were interpreted only for the localization of the proteins, but not for the potential role of the overexpressed proteins on function. Chen C et al. Cardiovascular Research 2024 also has some Aqp overexpression data.

      The still images prepared for Fig. 2C and D were selected to illustrate the localization of Aqp1a.1-mEmerald and Aqp8a.1-mEmerald at the leading edge of migrating tip cells. We did not notice that the tip cell overexpressing Aqp1a.1-mEmerald (Figure 2C) needed more than 6 times longer to migrate the same distance as the tip cell expressing Aqp8a.1-mEmerald (Figure 2D), which the reviewer astutely detected. To ascertain whether there is a difference in migration speed between Aqp1a.1-mEmerald and Aqp8a.1-mEmerald overexpressing endothelial cells, we measured tip cell migration velocity of three ISVs from Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish during the period of ISV formation (24 to 29 hpf) using the Manual Tracking plugin in Fiji. As shown in the graph, there is no significant difference in the migration speed of ECs overexpressing Aqp1a.1-mEmerald and Aqp8a.1-mEmerald, suggesting that Aqp8a.1-overexpressing cells migrate at a similar rate as Aqp1a.1-overexpressing cells. As we have not generated a Tg(fli1ep:mEmerald) zebrafish line, we are unable to determine whether endothelial cells migrate faster in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish compared to endothelial cells expressing only mEmerald. As for the observation that tip cells overexpressing Aqp1a.1-mEmerald (Figure 2C) need more than 6 times longer to migrate the same distance as tip cells expressing Aqp8a.1-mEmerald, we can only surmise that it is coincidental that the images selected “showed” faster migration of one ISV from Tg(fli1ep:aqp8a.1-mEmerald) zebrafish. We do not know whether Aqp1a.1 and Aqp8a.1 are overexpressed to the same levels in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish.

      We would also like to point out that when we analysed the lengths of ISVs at 28 hpf in aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> zebrafish, ISVs were shorter in aqp1a.1<sup>-/-</sup> zebrafish compared to aqp8a.1<sup>-/-</sup> zebrafish (Fig. S6 F to J). These results indicate that the loss of Aqp1a.1 function causes slower migration than the loss of Aqp8a.1 function, and suggest that Aqp1a.1 induces faster endothelial cell migration than Aqp8a.1.

      Author response image 2.

      The data on Aqp expression after treatment with the Notch inhibitor DBZ seem unnecessary and are at the moment not properly discussed. They also run against what is established in the field. aqp8a.1 levels seem to increase only 24h after DBZ, not at 6h, and still the authors conclude that Notch activation inhibits aqp8a.1 expression (Line 138-139). In the field, Notch is considered to be more active in stalk cells, where aqp8a.1 expression seems higher (not lower). Maybe the analysis of tip vs stalk cell markers in the scRNAseq data, and their correlation with Hes1/Hey1/Hey2 and aqp1 vs aqp8 mRNA levels, will be clearer than just showing qRT-PCR data after DBZ.

      As our scRNAseq data did not include ECs from earlier stages of development when ISVs are forming, we analysed scRNAseq data of 24 hpf endothelial cells published by Gurung et al, 2022 in Scientific Reports during the revision of this manuscript. However, we were unable to detect separate clusters of tip and stalk cells. As such, we are unable to correlate hes1/hey1/hey2 expression (which would be higher in stalk cells) with that of aqp1a.1/aqp8a.1. Also, we have decided to remove the DBZ-treatment results from our manuscript as we agree with the two reviewers that they are unnecessary.

      The paper would also benefit from more analysis and interpretation of available scRNAseq data in development/injury/disease/angiogenesis models (zebrafish, mice or humans) for the aquaporin genes characterized here, which could raise broader interest at the start of the paper.

      We thank the reviewer for suggesting examining aquaporin genes in other angiogenesis/disease/regeneration models to expand the scope of aquaporin function. We will do this in future studies.

      (3) Role of aqp1a.1 and aqp8a.1 on cytoplasmic volume changes and related phenotypes

      In Figure 5 the authors show that Aqp1/Aqp8 mutant endothelial tip cells have a lower cytoplasmic volume than tip cells from wildtype fish. If aquaporin-mediated water inflow occurs locally at the leading edge of endothelial tip cells (Figure 2, line 314-318), why doesn't cytoplasmic volume expand specifically only at that location (as shown in immune cells by Boer et al. 2023)? Can the observed reduction in cytoplasmic volume simply be a side-effect of impaired filopodia formation (Figure 4F-I)?

      We believe that water influx not only expands filopodia but also the leading front of tip cells (see bracket region in Fig. 4D), where Aqp1a.1-mEmerald/Aqp8a.1-mEmerald accumulate (Fig. 2), to generate an elongated protrusion and forward expansion of the tip cell. The decrease in cytoplasmic volume observed in the aqp1a.1;aqp8a.1 double mutant zebrafish is a result of decreased formation of these elongated protrusions at the leading front of migrating tip cells as shown in Fig. 4E (compare to Fig. 4D), not from just a decrease in filopodia number. In fact, in the method used to quantify cell volume, mEmerald/EGFP localization is limited to the cytoplasm and does not label filopodia well (compare mEmerald/EGFP in green with membrane tagged-mCherry in Fig. 5A - C). The volume measured therefore reflects cytoplasmic volume of the tip cell, not filopodia volume.

      Do the authors have data on cytoplasmic volume changes of endothelial tip cells in latrunculin B treated fish? The images in Figures 6 A,B suggest that there is a difference in cell volume upon lat b treatment only.

      No, unfortunately we have not performed single cell labelling and measurement of tip cells in Latrunculin B-treated embryos. We can speculate that, as there is a decrease in actin-driven membrane protrusions in this experiment, one would also expect a decrease in cell volume as the reviewer has observed.

      (4) Combined loss of aquaporins and actin-based force generation.

      Lines 331-332: "we show that hydrostatic pressure is the driving force for EC migration in the absence of actin-based force generation". It would be better to leave this more open and stick to the data. The authors show that aquaporin-mediated water inflow partially compensates for the loss of actin-based force generation in cell migration, not that it is the key driving/rescuing force in the absence of actin-based force.

      We have changed it to “we show that hydrostatic pressure can generate force for EC migration in the absence of actin-based force generation” in line 348.

      (5) Aquaporins and their role in EC proliferation

      In the study by Phng LK et al. 2013, the authors have shown that proliferation is not affected when actin polymerization or filopodia formation is inhibited. However, in the current manuscript by Kondrychyn I. et al. this has not been analysed carefully. In Movie S4 the authors indicate by arrows tip cells that fail to invade the zebrafish trunk, demonstrating a severe defect of sprouting initiation in these mutants. Yet, when only looking at ISVs that reach the dorsal side in Movie S4, it appears that they are comprised of fewer EC nuclei/ISV than the ISVs in Movie S3. At the beginning of DLAV formation, most ISVs in control Movie S3 consist of 3-4 EC nuclei, while in double mutants Movie S4 it appears to be only 2-3 EC nuclei. At the end of the Movie S4, one ISV on the left side even appears to consist of only a single EC when touching the dorsal roof. The authors provide convincing data on how the absence of aquaporin channels affects sprouting initiation and migration speed, resulting in severe delay in ISV formation. However, the authors should also analyse EC proliferation, as it may also be affected in these mutants, and may also contribute to the observed phenotype. We know that effects on cell migration may indirectly change the number of cells and proliferation at the ISVs, but this has not been carefully analysed in this paper.

      We thank the reviewer for highlighting the lack of information on EC number and division in the aquaporin mutants. We have now quantified EC number in ISVs that are fully formed (i.e. connecting the DA or PCV to the DLAV) at 2 and 3 dpf and the results are displayed in Figure S10A and B. At 2 dpf, there is a slight but significant reduction in EC number in both aISVs and vISVs in aqp1a.1<sup>-/-</sup> zebrafish and an even greater reduction in the double aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. No significant change in EC number was observed in aqp8a.1<sup>-/-</sup> zebrafish. EC number was also significantly decreased at 3 dpf for aqp1a.1<sup>-/-</sup>, aqp8a.1<sup>-/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. The decrease in EC number per ISV may therefore contribute to the observed phenotype.

      We have also quantified the number of cell divisions during sprouting angiogenesis (from 21 to 30 hpf) to assess whether the lack of Aquaporin function affects EC proliferation. This analysis shows that there is no significant difference in the number of mitotic events between aqp1a.1<sup>+/-</sup>; aqp8a.1<sup>+/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish (Figure S10 C), suggesting that the reduction in EC number is not caused by a decrease in EC proliferation.

      These new data are reported on lines 198 to 205 of the manuscript.

      Minor comments:

      - Figure 3K data seems not to be necessary and even partially misleading after seeing Figure 3E. Fig. 3E represents the true strength of the phenotype in the different mutants.

      Figure 3K has been removed from Figure 3.

      - Typo Figure 3L (VII should be VI).

      Thank you for spotting this typo. VII has been changed to VI.

      - Line 242: The word "required" is too strong because there is vessel formation without Aqps in endothelial cells.

      This has been changed to “ …Aqp1a.1 and Aqp8a.1 regulate sprouting angiogenesis…” (lines 238 - 239).

      - From Figure S2, the doublets cluster should be removed.

      We have performed a new analysis of 24 hpf, 34 hpf and 3 dpf endothelial cell scRNAseq data (the previous analysis did not include 24 hpf endothelial cells). The doublets cluster is not included in the UMAP analysis.

      - Better indicate the fluorescence markers/alleles/transgenes used for imaging in Figures 6A-D.

      The transgenic lines used for this experiment are now indicated in the figure (this figure is now Figure 7).

      Reviewer #3 (Public Review):

      Summary:

      Kondrychyn and colleagues describe the contribution of two Aquaporins, Aqp1a.1 and Aqp8a.1, towards angiogenic sprouting in the zebrafish embryo. By whole-mount in situ hybridization, RNAscope, and scRNA-seq, they show that both genes are expressed in endothelial cells in partly overlapping spatiotemporal patterns. Pharmacological inhibition experiments indicate a requirement for VEGFR2 signaling (but not Notch) in transcriptional activation.

      To assess the role of both genes during vascular development the authors generate genetic mutations. While homozygous single mutants appear normal, aqp1a.1;aqp8a.1 double mutants exhibit defects in EC sprouting and ISV formation.

      At the cellular level, the aquaporin mutants display a reduction of filopodia in number and length. Furthermore, a reduction in cell volume is observed indicating a defect in water uptake.

      The authors conclude that polarized water uptake mediated by aquaporins is required for the initiation of endothelial sprouting and (tip) cell migration during ISV formation. They further propose that water influx increases hydrostatic pressure within the cells, which may facilitate actin polymerization and the formation of membrane protrusions.

      Strengths:

      The authors provide a detailed analysis of Aqp1a.1 and Aqp8a.1 during blood vessel formation in vivo, using zebrafish intersomitic vessels as a model. State-of-the-art imaging demonstrates an essential role of aquaporins in different aspects of endothelial cell activation and migration during angiogenesis.

      Weaknesses:

      With respect to the connection between Aqp1/8 and actin polymerization/filopodia formation, the evidence appears preliminary and the authors' interpretation is guided by evidence from other experimental systems.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1 H, J:

      The differential response of aqp1/-8 to ki8751 vs DBZ after 6h treatment is quite obvious. Why do the authors show the effect after 24h? The effect is more likely than not indirect.

      We agree with the reviewer and we have now removed 24 hour Ki8751 treatment and all DBZ treatments from Figure 1.

      Figure 2:

      According to the authors' model anterior localization of Aqp1 protein is critical. The authors perform transient injections to mosaically express Aqp fusion proteins using an endothelial (fli1) promoter. For the interpretation, it would be helpful to also show the mCherry-CAAX channel in separate panels. From the images, it is not possible to discern how many cells we are looking at. In particular the movie in panel D may show two cells at the tip of the sprout. A marker labelling cell-cell junctions would help. Furthermore, the authors are using a strong exogenous promoter, thus potentially overexpressing the fusion protein, which may lead to mislocalization. For Aqp1a.1 an antibody has been published to work in zebrafish (e.g. Kwong et al., Plos1, 2013).

      We would like to clarify that we generated transgenic lines - Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) - to visualize the localization of Aqp1a.1 and Aqp8a.1 in endothelial cells, and the images displayed in Fig. 2 are from the transgenic lines (not transient, mosaic expression).

      To aid visualization and interpretation, we have now added an mCherry-CAAX-only channel to accompany the Aqp1a.1/Aqp8a.1-mEmerald channel in Fig. 2A and B. To discern how many cells there are in the ISVs at this stage, we have crossed Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish to TgKI(tjp1a-tdTomato)<sup>pd1224</sup> (Levic et al., 2021) to visualize ZO1 at cell-cell junctions. However, because tjp1a-tdTomato is expressed in all cell types, including the skin that lies just above the ISV, and the signal in ECs in ISVs is very weak at 22 to 25 hpf, it was very difficult to obtain good-quality images that properly delineate cell boundaries to determine the number of cells in the ISVs at this early stage. Instead, we have annotated endothelial cell boundaries based on more intense mCherry-CAAX fluorescence at cell-cell borders, and from the mosaic expression of mCherry-CAAX that is intrinsic to the Tg(kdrl:ras-mCherry)<sup>s916</sup> zebrafish line.

      In Fig. 2D, there are two endothelial cells in the ISV during the period shown, but only one cell occupies the tip cell position, i.e. there is one tip cell in this ISV. Unlike the mouse retina, where it has been demonstrated that two endothelial cells can occupy the tip cell position side-by-side (Pelton et al., 2014), this is usually not observed in zebrafish ISVs. This is demonstrated in Movie S3, where it is clear that one nucleus (belonging to the tip cell) occupies the tip of the growing ISV. The accumulation of intracellular membranes is often observed in tip cells and may serve as a reservoir of membranes for the generation of membrane protrusions at the leading edge of tip cells.

      We agree that by generating transgenic Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish, Aqp1a.1 and Aqp8a.1 are overexpressed, which may affect their localization. The eel anti-Aqp1a.1 antibody used in (Kwong et al., 2013) was a gift from Dr. Gordon Cramb, Univ. of St Andrews, Scotland and it was first published in 2001. This antibody is not available commercially. Instead, we have tried several other antibodies against AQP1 (Alpha Diagnostic International, AQP11-A; ThermoFisher Scientific, MA120214; Alomone Labs, AQP-001) and AQP8 (Sigma Aldrich, SAB 1403559; Alpha Diagnostic International, AQP81-A; Alomone Labs, AQP-008) but unfortunately none worked. As such, we cannot compare localization of Aqp1a.1-mEmerald and Aqp8a.1-mEmerald with the endogenous proteins.

      Figure 3:

      E: the quantification is difficult to read. Wouldn't it be better to set the y-axis in % of the DV axis? (see also Figure S6).

      We would like to show the absolute length of the ISVs, and to illustrate that the ISV length decreases from anterior to posterior of the zebrafish trunk. We have increased the size of Fig. 3E to enable easier reading of the bars.

      K: This quantification appears arbitrary.

      We have removed this panel from Figure 3.

      G-J: The magenta channel is difficult to see. Is the lifeact-mCherry mosaic? In panel J there appears to be a nucleus between the sprout and the DLAV. It would be helpful to crop the contralateral side of the image.

      No, the Tg(fli1:Lifeact-mCherry) line is not mosaic. The “missing” vessels are not due to mosaicism of the transgene but to truncated ISVs, which are a phenotype of loss of Aquaporin function. We have changed the magenta channel to grey and hope that by doing so, the reviewer will be able to see the shape of the blood vessels more clearly. We would like to leave the contralateral side in the images, as it shows that the defective vessel is only on one side of the body. Furthermore, when we tried to remove it (by reducing the number of Z-stacks), the neighbouring ISV looked incomplete because the embryos were not mounted flat. To clarify what the nucleus between the sprout and the DLAV is, we have indicated that it is that of the contralateral ISV.

      L: I do not quite understand the significance of the different classes of phenotypes. Do the authors propose different morphogenetic events or contexts of how these differences come about?

      Here, we report the different types of ISV phenotypes that we observe in 3 dpf aqp1a.1<sup>-/-</sup>; aqp8a.1<sup>-/-</sup> zebrafish (Fig. 3 and Fig. S7). As demonstrated in Fig. 4, most of the phenotypes can be explained by the delayed emergence of tip cells from the dorsal aorta and slower tip cell migration. However, in some instances, we also observed retraction of tip cells (Movie S4) and failure of tip cells to emerge from the dorsal aorta or endothelial cell death (see attached figure on page 14), which can give rise to the Class II phenotype. In the dominant Class I phenotype (in contrast to Class II), secondary sprouting from the posterior cardinal vein is unaffected, and the secondary sprout migrates dorsally past the level of the horizontal myoseptum but cannot complete the formation of the vISV (it stops beneath the spinal cord). The Class III phenotype appears to result from a failure of the secondary sprout to fuse with the regressed primary ISV. In the Class IV phenotype, the ventral EC does not maintain a connection to the dorsal aorta. We did not examine in detail how Class III and IV phenotypes arise in the current study.

      Author response image 3.

      Figure 4:

      This figure nicely demonstrates the defects in cell behavior in aqp mutants.

      In panel F it would be helpful to show the single channels as well as the merge.

      We have now added single channels for PLCd1PH and Lifeact signal in panels F and G.

      In Figure 1 the authors argue that the reduction of Aqp1/8 by VEGFR2 inhibition may account for part of that phenotype. In turn, the aqp phenotype seems to resemble incomplete VEGFR2 inhibition. The authors should check whether expression of Aqp1-Emerald can partially rescue ki8751 inhibition.

      To address the reviewer’s comment, we have treated Tg(fli1ep:Aqp1-Emerald) embryos with ki8751 from 20 hpf for 6 hours but we were unable to observe a rescue in sprouting. It could be because VEGFR2 inhibition also affects other downstream signalling pathways that also control cell migration as well as proliferation.

      Based on previous studies (Loitto et al.; Papadopoulos et al.) the authors propose that also in ISVs aquaporin-mediated water influx may promote actin polymerization and thereby filopodia formation. However, while the effect on filopodia number and length is well demonstrated, the underlying cause is less clear. For example, filopodia formation could be affected by reduced cell polarization. This can be tested by using a transgenic Golgi marker (Kwon et al., 2016).

      We have examined tip cell polarity of wildtype, aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> embryos at 24-26 hpf by analysing Golgi position relative to the nucleus. We were unable to analyze polarity in aqp1a.1<sup>rk28/rk28</sup>; aqp8a.1<sup>rk29/rk29</sup> embryos as they exist in an mCherry-containing transgenic zebrafish line (the Golgi marker is also tagged to mCherry). The results show that tip cell polarity is similar, if not more polarised, in aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> embryos when compared to wildtype embryos (Fig. S10D). This new data is discussed in lines 234 to 237.

      Figure 5:

      Panel D should be part of Figure 4.

      Panel 5D is now in panel J of Figure 4 and described in lines 231 and 235.

    1. eLife Assessment

      This important study presents compelling observational data supporting a role for transcription and polysome accumulation in the separation of newly replicated bacterial chromosomes. The study is generally thorough and rigorous in nature, although there are several instances where revisions would help clarify for the reader that the evidence is primarily circumstantial in nature and that a direct causal relationship between polysome accumulation and nucleoid segregation has yet to be tested. With regard to the latter, the model's predictions could possibly be tested by examining the impact of translation inhibitors on nucleoid organisation. The authors could also compare the radial dimensions of the nucleoid with cell width to confirm that the nucleoid is radially confined across all conditions, a critical assumption of the model.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      These above points are observational, I have no comments on these observations leading to their hypothesis.

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      (3) What is not clearly stated and is needed in this paper is to explain how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes this to arise (what causes the "force"), as the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear if (a) these repulsions alone are making the force, or (b) it is the accumulation of new polysomes in the center, adding more "repulsive" material, that causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus L-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and the spacing of the nucleoids.

      Wu et al., Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2 somes, etc). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

    3. Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

    4. Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite: separated nucleoids condense into a single lobe post-treatment (Bakshi et al., 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019) conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      These above points are observational, I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have depleted through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against this cell elongation/transertion model.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially invoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle, when nucleoid splitting happens in fast-growing cells, and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we will clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate, indicating that it cannot be the main driver.

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in Figure 1H and Figure 1 – figure supplement 3 of the original manuscript but were not highlighted in this context. We will revise the text to clarify this point.

      (4) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (5) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. These nucleoid dynamics cannot be explained by cell growth or protein secretion (transertion).

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out, it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether).  Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the introduction and Results section, but we agree that this was not well explained. We will add experimental data and revise the text to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects membrane integrity or membrane potential stops gene expression, which requires ATP. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most chromosomal gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we will clarify this point and add model simulations to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We will perform the requested experiment.

      (3) What is not clearly stated, and is needed in this paper, is an explanation of how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes it to arise (what causes the "force"), given that the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear whether (a) these repulsions alone are making the force, or (b) the accumulation of new polysomes in the center, by adding more "repulsive" material, creates the force that causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. PNAS 2021 PMID: 34675077 and Xiang et al. Cell 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term χ_np in the free energy, which we refer to as a repulsion between DNA and polysomes, though as explained above it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.

      This same density-dependent free energy that causes phase separation can also give rise to forces, just in the way that a higher pressure on one side of a wall can give rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.

      This will be clarified in the revised text, with the support of additional simulation results.
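
      As a reading aid for the two ingredients named in this response (a χ_np term that drives demixing, and an osmotic pressure difference that produces a net force), the relations below give a minimal textbook-style sketch. It is not the authors' reaction-diffusion model. It assumes a toy binary mixture of equal-sized particles with no explicit solvent, and the symbols φ (local DNA-segment fraction), A (nucleoid cross-section), and Π_mid, Π_pole (osmotic pressures between the nucleoids and near the poles) are introduced here purely for illustration.

```latex
% Toy binary-mixture sketch (assumptions: equal-sized particles, no solvent term).
% \phi is the local DNA-segment fraction; 1-\phi is the polysome fraction.
% Requires amsmath for align and \text.
\begin{align}
  % Mixing entropy (first two terms) plus an effective DNA-polysome repulsion.
  \frac{f(\phi)}{k_B T} &= \phi \ln \phi + (1-\phi)\ln(1-\phi)
                           + \chi_{np}\,\phi(1-\phi), \\
  % Curvature of the free energy; a negative value means the mixture is unstable.
  \frac{1}{k_B T}\,\frac{\partial^2 f}{\partial \phi^2}
                        &= \frac{1}{\phi(1-\phi)} - 2\chi_{np}, \\
  % Spinodal condition: demixing sets in once the repulsion exceeds this bound.
  \chi_{np} &> \frac{1}{2\,\phi(1-\phi)} \;\ge\; 2, \\
  % Wall analogy from the text: a pressure imbalance gives a net poleward force.
  F &\approx A\,\bigl(\Pi_{\text{mid}} - \Pi_{\text{pole}}\bigr).
\end{align}
```

      In this toy picture, the mixture demixes only when χ_np exceeds the bound in the third line, and the force in the last line vanishes once the osmotic pressures on the two sides of a nucleoid equalize, which matches the statement that migration continues until the osmotic pressure equilibrates.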

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      Phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though as explained above it arises from entropic effects.

      In the revised manuscript, we will illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell.

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that this is correct; the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression (Fig. 6). The model is able to recapitulate the two main phenotypes observed in experiments. These new simulation results will be added to the revised manuscript.

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – Figure Supplement 1E). This will be clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We will add markers to indicate the start of cell constriction. We will also indicate that cell birth and division correspond to the first and last images/timepoint in Fig. 1B and C, respectively.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments with Bacillus L-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and spacing of the nucleoids.

      Wu et al., Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point. We will add this reference. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes

      We agree that the effects could go both ways at this early stage of the story. We will revise the text accordingly.  

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2-somes, etc.). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. This will be done in the revised manuscript.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340).  The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875).

      Furthermore, inhibition of translation initiation with kasugamycin treatment, which decreases the pool of polysomes, results in a homogenization of ribosomes and expansion of the nucleoid (see Author response image 1). This further supports the rifampicin experiments. Given that the attribution of ribosome heterogeneity to polysomes is well accepted in the field, we would prefer to not include these kasugamycin data in the revised manuscript because long-term exposure to this drug leads to nucleoid re-compaction (PMID: 25250841 and PMID: 34186018). This secondary effect may possibly be due to a dysregulated increase in synthesis of naked rRNAs (PMID: 14460744, PMID: 2114400, and PMID: 2448483) or ribosome aggregation, which we are currently investigating.

      Author response image 1.

      Effects of kasugamycin treatment on the intracellular distribution of ribosomes and nucleoids. Representative single cell (CJW7323) growing in M9gluCAAT. Kasugamycin (3 mg/mL) was added at time = 0 min. Shown is the early response (0-30 min) to the drug, characterized by the homogenization of the ribosomal RplA-GFP fluorescence and the expansion of the HupA-mCherry-labeled nucleoids. For each segmented cell, the RplA-GFP and HupA-mCherry signals were normalized by the average fluorescence.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there likely are entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we will revise our text to clarify our terms and proposed mechanism.

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We will add schematics and a didactic simulation. We will also try to streamline the text.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern. 

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite: separated nucleoids condense into a single lobe post-treatment (Bakshi et al 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50 μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol (as illustrated in Author response image 2).

      In Bakshi et al, 2014, nucleoid fusion was shown for a single cell in which the sister nucleoids were relatively close to each other at the time of chloramphenicol treatment. Population statistics were provided for the relative length and width of the nucleoids, but not for the fusion events. So, it is unclear whether the illustrated fusion was universal or not. Also, we note that Bakshi et al (2014) used a chloramphenicol concentration of 300 μg/mL, which is 20-fold higher than the minimal inhibitory concentration for growth, compared to 70 μg/mL in our experiments.

      Author response image 2.

      Effects of chloramphenicol treatment on the intracellular distribution of ribosomes and nucleoids in non-dividing cells. Exponentially growing cells (M9glyCAAT at 30°C) were pre-treated with cephalexin for one hour before being spotted on a 1% agarose pad for time-lapse imaging. The agarose pad contained M9glyCAAT, cephalexin, and chloramphenicol. (A) Phase contrast, RplA-GFP fluorescence, and HupA-mCherry fluorescence images of a representative single cell. Three timepoints are shown: the first image after spotting on the agarose pad (at 0 min), and after 30 minutes and one hour of chloramphenicol treatment. (B) One-dimensional profiles of the ribosomal (RplA-GFP) and nucleoid (HupA-mCherry) fluorescence from the cell shown in panel A. These intensity profiles correspond to the average fluorescence along the medial axis of the cell, considering a 6-pixel region (0.4 μm) centered on the central line of the cell. The fluorescence intensity is plotted along the relative cell length, scaled from 0 to 100% between the two poles, illustrating the relative nucleoid length (L<sub>DNA</sub>/L<sub>cell</sub>) that was plotted by Bakshi et al in 2014 (PMID: 25250841).
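
      The profile computation described for panel B can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: it assumes the cell has already been segmented and straightened so that rows run across the cell width and columns run from pole to pole. The function name `medial_axis_profile` and its parameters are hypothetical, and the optional normalization by the mean is borrowed from the legend of Author response image 1 rather than stated for this figure.

```python
import numpy as np


def medial_axis_profile(straightened, band_px=6, normalize=True):
    """Average fluorescence along the long axis of a straightened cell.

    Assumptions (not taken from the source): `straightened` is a 2D array
    whose rows span the cell width and whose columns span pole to pole.
    """
    img = np.asarray(straightened, dtype=float)
    if img.ndim != 2:
        raise ValueError("expected a 2D image")
    n_rows, n_cols = img.shape
    if not 1 <= band_px <= n_rows:
        raise ValueError("band_px must be between 1 and the image height")

    # Select a band of `band_px` rows centered on the central line of the cell.
    start = (n_rows - band_px) // 2
    band = img[start:start + band_px, :]

    # Average across the band to obtain one value per position along the cell.
    profile = band.mean(axis=0)

    # Optionally express the signal relative to its mean along the cell.
    if normalize:
        profile = profile / profile.mean()

    # Relative cell length, scaled from 0 to 100% between the two poles.
    position = np.linspace(0.0, 100.0, n_cols)
    return position, profile
```

      Calling the function once on the RplA-GFP image and once on the HupA-mCherry image of the same straightened cell would give two profiles on a shared 0 to 100% axis, which is the form in which the two signals are compared in panel B.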

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019) conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion likely contributes to the nucleoid being peripherally localized in A22 cells. We will revise the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism based on phase separation and out-of-equilibrium dynamics described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We will revise the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally invoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We will revise the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We will discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we will add a discussion about membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      As mentioned above, we will discuss membrane attachment in the revised manuscript.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1; see above.)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could potentially invoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart from each other. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we will provide additional data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate.

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 3) but were not highlighted in this context. We will revise the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression. In the revised manuscript, we will add simulation results showing that these nucleoid dynamics are predicted by our model.

      In light of these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We will revise this section to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we will revise the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we consider an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D_P = 0.018, 0.023, or 0.028 μm²/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We will present these new simulation results in the revised manuscript.

    1. eLife Assessment

      This study provides valuable scRNA-seq and scATAC-seq data for testicular tissues from patients with spermatogenesis disorders. By examining the transcriptomic and epigenetic changes in Sertoli cells, the authors uncovered key regulatory mechanisms underlying male infertility and identified potential therapeutic targets. While some of the cellular profiling results are convincing, the analyses for differential profiling of NOA cases and epigenomics data remain incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Wang and colleagues generate single-cell transcriptome and chromatin accessibility data from testicular tissues of two OA and three NOA cases. The authors analyze this dataset to identify novel cellular populations, marker genes, and inter-population interactions that may contribute to proper spermatogenesis. Then they propose a role of specific Sertoli cell subtypes and their interactions via Notch signaling in germ cell development. However, I remain skeptical of their central argument (also highlighted in the title) that stage-specific interactions between Sertoli and germ cells are a key component in NOA development, as my initial concerns regarding potential data misrepresentation, lack of statistical testing, and the rationale behind some of the analyses have not been sufficiently addressed.

      (1) As noted in my previous comments, the analysis of Sertoli cell subtypes is potentially misleading and lacks proper statistical support. The authors claim a significant loss of Sertoli subpopulations in NOA cases, and provide the absolute number of cells in Figure 6B. However, this observation could easily be driven by the total number of cells captured during the experiment and the anatomical location of the specimens. There is no statistical basis to make the claim that this loss is significant. Furthermore, the same analysis should be performed on scATAC-seq cells and presented alongside.

      (2) As pointed out in my initial concerns, some parts of the analyses require additional explanation to clarify their logical flow. For example, the logic of using between-sample correlations to assess colocalization of Sertoli and germ cells is lost on me. How can this be used to infer the important role of specific Sertoli cell populations in spermatogenesis, other than the fact that some of the genes are more co-expressed in the sub-populations? And how is this related to the claim that these cell populations are actually co-localized in the tissue? The authors then dedicate nearly a page describing the pathways enriched in Sertoli and germ cells, but the relevance is unclear, and the argument that these subtypes are functionally related is not convincing enough.

      (3) The statement regarding Notch signaling as a critical component in Sertoli and germ cell interaction is not supported by actual evidence. The inference based on CellphoneDB and an epigenome snapshot that shows not much difference are insufficient to justify this claim.

      (4) The manuscript is overly wordy and descriptive, making it difficult to read and understand the points. The main text needs to be more concise and on point, with unnecessary details removed to sharpen the key points. Non-essential results (e.g. Figure S10 and S11) unrelated to the main argument should be removed.
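
      The concern raised in point (1), that absolute cell counts may simply reflect how many cells were captured per sample, can be illustrated with a minimal sketch. Converting subtype counts into within-sample fractions removes the dependence on capture depth. The sample names, counts, and the function name `subtype_fractions` are hypothetical placeholders introduced for illustration, not values or code from the manuscript.

```python
# Hypothetical per-sample Sertoli subtype counts (placeholders, not study data).
counts = {
    "OA_example": {"SC1": 120, "SC3": 60, "SC8": 20},
    "NOA_example": {"SC1": 30, "SC3": 3, "SC8": 7},
}


def subtype_fractions(sample_counts):
    """Return each subtype's share of the sample's total captured cells."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no cells")
    return {subtype: n / total for subtype, n in sample_counts.items()}


# Fractions are comparable across samples even when capture depth differs.
fractions = {sample: subtype_fractions(c) for sample, c in counts.items()}
for sample, frac in fractions.items():
    print(sample, {k: round(v, 3) for k, v in frac.items()})
```

      Fractions alone do not establish significance. A formal comparison would still need replication at the level of patients, and with two OA and three NOA donors any such test would have little power.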

    3. Reviewer #2 (Public review):

      Summary:

      Shimin Wang et al. investigated the role of Sertoli cells in mediating spermatogenesis disorders in non-obstructive azoospermia (NOA) through stage-specific communications. The authors utilized scRNA-seq and scATAC-seq to analyze the molecular and epigenetic profiles of germ cells and Sertoli cells at different stages of spermatogenesis.

      Strengths:

      By understanding the gene expression patterns and chromatin accessibility changes in Sertoli cells, the authors sought to uncover key regulatory mechanisms underlying male infertility and identify potential targets for therapeutic interventions. They emphasized that the absence of the SC3 subtype would be a major factor contributing to NOA.

      Comments on revisions:

      The authors have addressed my concerns. I have no further comments.

    4. Reviewer #3 (Public review):

      Summary:

      This study profiled the single-cell transcriptome of human spermatogenesis and provided many potential molecular markers for developing testicular puncture specific marker kits for NOA patients.

      Strengths:

      Perform single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) on testicular tissues from two OA patients and three NOA patients.

      Weaknesses:

      Most results are analytical and lack specific experiments to support these analytical results and hypotheses.

      Comments on revisions:

      In the revised version of the manuscript, the authors made some effort to revise their manuscript according to reviewers' comments and addressed the problems that I had raised before.

      I have no other serious criticisms regarding the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript is dedicated heavily to cell type mapping and identification of sub-type markers in the human testis but does not present enough results from cross-investigation between NOA cases versus control. Their findings are mostly based on transcriptome and the authors do not make enough use of the scATAC-seq data in their analyses as they put forward in the title. Overall, the authors should do more to include the differential profile of NOA cases at the molecular level - specific gene expression, chromatin accessibility, TF binding, pathway, and signaling that are perturbed in NOA patients that may be associated with azoospermia.

      Strengths:

      (1) The establishment of single-cell data (both RNA and ATAC) from the human testicular tissues is noteworthy.

      (2) The manuscript includes extensive mapping of sub-cell populations with some claimed as novel, and reports marker gene expression.

      (3) The authors present inter-cellular cross-talks in human testicular tissues that may be important in adequate sperm cell differentiation.

      Weaknesses:

      (1) A low sample size (2 OA and 3 NOA cases). There are no control samples from healthy individuals.

      Thank you for your comments. We recognize that the small sample size in this study somewhat limits its generalizability. However, in transcriptomic research, limited sample sizes are a common issue due to the complexities involved in acquiring samples, particularly in studies about the reproductive system. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used obstructive azoospermia as a control group in which spermatogenesis and development are normal.

      (2) Their argument about interactions between germ and Sertoli cells is not based on statistical testing.

      Thank you for your comments. Due to limited funding, we have not yet fully and deeply conducted validation experiments, but we plan to carry out related experiments in the later stage. We hope that the publication of this study will help to obtain more financial support to further investigate the interactions between germ cells and Sertoli cells.

      (3) Rationale/logic of the study. This study, in its present form, seems to be more about the role of sub-Sertoli population interactions in sperm cell development and does not provide enough insights about NOA.

      Thank you for your comments. In Figure 6, we conducted an in-depth analysis and comparison of the differences between the Sertoli cell subtypes and the germ cell subtypes involved in spermatogenesis in the OA and NOA groups. The results revealed that in the NOA group, especially in the NOA3 group, which has a lower sperm count compared to NOA2 and NOA1, there is a significant loss of Sertoli cell subtypes including SC3, SC4, SC5, SC6, and SC8. The NOA1 group, with a sperm count close to that of the OA group, also had a Sertoli cell profile similar to the OA group. The NOA2 group, with a sperm count between that of NOA1 and NOA3, also exhibited an intermediate profile of Sertoli cell subtypes. Therefore, we suggest that change in Sertoli cell subtypes is a key factor affecting sperm count, rather than just the total number of Sertoli cells. We believe that through these analyses, we can provide in-depth insights into NOA, and we hope that the publication of this study will help obtain more funding support to further validate and expand on these findings.

      (4) The authors do not make full use of the scATAC-seq data.

      Thank you for your comments. We have added an analysis of the scATAC-seq data, which is shown in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Shimin Wang et al. investigated the role of Sertoli cells in mediating spermatogenesis disorders in non-obstructive azoospermia (NOA) through stage-specific communications. The authors utilized scRNA-seq and scATAC-seq to analyze the molecular and epigenetic profiles of germ cells and Sertoli cells at different stages of spermatogenesis.

      Strengths:

      By understanding the gene expression patterns and chromatin accessibility changes in Sertoli cells, the authors sought to uncover key regulatory mechanisms underlying male infertility and identify potential targets for therapeutic interventions. They emphasized that the absence of the SC3 subtype would be a major factor contributing to NOA.

      Weaknesses:

      Although the authors used cutting-edge techniques to support their arguments, it is difficult to find conceptual and scientific advances compared to Zeng S et al.'s paper (Zeng S, Chen L, Liu X, Tang H, Wu H, and Liu C (2023) Single-cell multi-omics analysis reveals dysfunctional Wnt signaling of spermatogonia in non-obstructive azoospermia. Front. Endocrinol. 14:1138386.). Overall, the authors need to improve their manuscript to demonstrate the novelty of their findings in a more logical way.

      Thank you for your detailed review of our work. We greatly appreciate your feedback and have made revisions to our manuscript accordingly.

      Regarding the novelty of our research, we believe our study offers conceptual and scientific advances in several ways:

      We have systematically revealed the stage-specific roles of Sertoli cell subtypes in different stages of spermatogenesis, particularly emphasizing the crucial role of the SC3 subtype in non-obstructive azoospermia (NOA). Additionally, we identified that other Sertoli cell subtypes (SC1, SC2, SC3...SC8, etc.) also collaborate in a stage-specific manner with different subpopulations of spermatogenic cells (SSC0, SSC1/SSC2/Diffed, Pa...SPT3). These findings provide new insights into the understanding of spermatogenesis disorders.

      Compared to the study by Zeng S et al., our research not only focuses on the functional alterations in Sertoli cells but also comprehensively analyzes the interaction patterns between Sertoli cells and spermatogenic cells using scRNA-seq and scATAC-seq technologies. We uncovered several novel regulatory networks that could serve as potential targets for the diagnosis and treatment of NOA.

      We sincerely appreciate your constructive comments and will continue to explore this area further, aiming to make a more significant contribution to the understanding of NOA mechanisms.

      Reviewer #3 (Public Review):

      Summary:

      This study profiled the single-cell transcriptome of human spermatogenesis and provided many potential molecular markers for developing testicular puncture-specific marker kits for NOA patients.

      Strengths:

      Perform single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) on testicular tissues from two OA patients and three NOA patients.

      Weaknesses:

      Most results are analytical and lack specific experiments to support these analytical results and hypotheses.

      Thank you for your thorough review of our work. We highly value your feedback and have made revisions to our manuscript accordingly. Indeed, we have conducted immunofluorescence (IF) experiments to validate the data obtained from single-cell sequencing and have expanded the sample size to enhance the reliability of our results. To better present these validation experiments, we have reorganized and renamed the sample information, making it easier for you to understand which samples were used in the specific experiments. Following the publication of this paper, we plan to secure additional funding to deepen our research, particularly in the area of experimental validation. We sincerely appreciate your support and insightful suggestions, which have greatly helped guide our future research directions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should include results from cross-investigation comparing NOA/OA patients versus controls.

      Thank you for your comments. In this study, OA was the control group. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used OA as a control group in which spermatogenesis and development are normal.

      (2) In Table S1, the authors should also include the metric for scATAC-seq, and do more to show the findings the authors obtained in RNA is replicated with chromatin accessibility.

      Thank you for your comments. We have added Table S2, which includes the metric for scATAC-seq.

      (3) A single sample from each OA and NOA group may not be enough to confirm colocalization. The authors should include results from all available samples and use quantitative measures.

      Thank you for your comments. I apologize that the sample size in this study was less than three and we could not conduct quantitative analysis. We will increase the sample size and conduct corresponding experiments in subsequent research.

      (4) The Methods section does not include enough description to follow how the analyses were carried out, and is missing information on some of the key procedures such as velocity and cell cycle analyses.

      Thank you for your comments. The methods for the velocity and cell cycle analyses were added in the revised manuscript. The description is as follows:

      “Velocity analysis

      RNA velocity analysis was conducted using scVelo's (version 0.2.1) generalized dynamical model. The spliced and unspliced mRNA was quantified with velocyto (version 0.17.17).”

      “Cell cycle analysis

      To quantify the cell cycle phases for individual cells, we employed the CellCycleScoring function from the Seurat package. This function computes cell cycle scores using established marker genes for cell cycle phases as described in a previous study by Nestorowa et al. (2016). Cells showing a strong expression of G2/M-phase or S-phase markers were designated as G2/M-phase or S-phase cells, respectively. Cells that did not exhibit significant expression of markers from either category were classified as G1-phase cells.”
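
      The assignment rule described in the quoted methods can be sketched in plain Python. This illustrates the stated rule only and is not Seurat's implementation. The function name `assign_phase`, the zero threshold standing in for "significant" expression, and the example scores are all assumptions.

```python
def assign_phase(s_score, g2m_score, threshold=0.0):
    """Assign a cell cycle phase from S and G2/M marker scores.

    A cell whose scores are both at or below `threshold` is called G1.
    Otherwise it takes the phase with the higher score (ties go to G2M,
    an arbitrary choice made for this sketch).
    """
    if s_score <= threshold and g2m_score <= threshold:
        return "G1"
    return "S" if s_score > g2m_score else "G2M"


# Hypothetical (S score, G2/M score) pairs for three cells.
example_scores = [(0.8, 0.1), (-0.2, 0.5), (-0.3, -0.1)]
phases = [assign_phase(s, g2m) for s, g2m in example_scores]
print(phases)
```

      In this sketch a cell is labelled G1 by exclusion, so the G1 call depends entirely on where the threshold is placed.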

      (5) For the purpose of transparency, the authors should upload codes used for analyses so that each figure can be reproduced. All raw and processed data should be made publicly available.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.

      Reviewer #2 (Recommendations For The Authors):

      The detailed points the authors need to improve are attached below.

      The results presented in the study have several weaknesses:

      In Figure 1A, the HE staining results of all patients who underwent single-cell analysis should be provided.

      Thank you very much for your valuable suggestions. In Figure 1, we present the HE staining results paired with the single-cell data, covering all patients involved in the single-cell analysis.

      - Saying "identification of novel potential molecular markers for distinct cell types" seems unsupported by the data.

      Thank you for your comments. I'm sorry for the inaccuracy of my description. We have revised this sentence. The description is as follows: These findings indicate that the scRNA-seq data from this study can serve for cellular classification.

      - The methods suggest an integrated analysis of scRNA-seq and scATAC-seq, but from the figures, it seems like separate analyses were performed. It's necessary to have data showing the integrated analysis.

      Thank you for your comments. We have added an integrated analysis of scRNA-seq and scATAC-seq. The results are shown in Figure S2.

      Figure 2 does not seem to well cover the diversity of germ cell subtypes. The main content appears to be about the differentiation process, and it seems more focused on SSCs (stem cell types), but the intended message is not clearly conveyed.

      Thank you for your comments. Figure S1 revealed the diversity of germ cell subtypes. The second part of the results described the integrated findings from Figures 2 and S1.

      - In Figure 2B, pseudotime could be shown, and I wonder if the pseudotime in this analysis shows a similar pattern as in Figure 2D.

      Thank you for your comments. Figure 2B shows the pseudotime analysis of the 12 germ cell subpopulations. Figure 2D shows the RNA velocity of the 12 germ cell subpopulations. Both methods are used for cell trajectory analysis. The pseudotime in Figure 2B showed a similar pattern as in Figure 2D.

      - While staining occurs within one tissue, saying they are co-expressed seems inaccurate as the staining locations are clearly distinct. For example, the staining patterns of A2M and DDX4 (a classical marker) are quite different, so it's hard to claim A2M as a new potential marker just because it's expressed. Also, TSSK6 was separately described as having a similar expression pattern to DDX4, but from the IF results, it doesn't seem similar.

      Thank you for your comments. We have revised the Figure.

      - It was described that A2M (expressed in SSC0-1), and ASB9 (expressed in SSC2) have open promoter sites in SSC0, SSC2, and Diffing_SPG, but it doesn't seem like they are only open in the promoters of those cell types. For example, there doesn't seem to be a peak in Diffing for either gene. The promoter region of the tracks is not very clear, so overall figure modification seems necessary.

      Thank you for your comments. We have revised the Figure.

      - The ATAC signal scale for each genomic region should be included, and clear markings for the TSS location and direction of the genes are needed.

      Thank you for your comments. We have revised the figure, and it is shown in the revised manuscript.

      Figure 3A mostly shows the SSC2 in the G2/M phase, so it seems questionable to call SSC0/1 quiescent. Also, I wonder if the expression of EOMES and GFRA1 is well distinguished in the SSC subtypes as expected.

      Thank you for your comments. We will validate in subsequent experiments whether the expression of EOMES and GFRA1 is clearly distinguished in the SSC subtypes.

      - In Figure 3C, it would be good to have labels indicating what the x and y axes represent. The figure seems complex, and the description does not seem to fully support it.

      Thank you for your comments. We have added labels indicating what the x and y axes represent in Figure 3C. The x and y axes represent spliced and unspliced mRNA ratios, respectively.

      - While TFs are the central focus, it's disappointing that scATAC-seq was not used.

      Thank you for your comments. TF analysis using scATAC-seq will be carried out in the future.

      Figure 4: It would be good to have a more detailed discussion of the differences between subtypes, such as through GO analysis. The track images need modification like marking the peaks of interest and focusing more on the promoter region, similar to the previous figures.

      Thank you for your comments. GO analysis results were put in Figure S5. The description is as follows:

      As shown in Figure S5, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation.

      In Figure 5, it would be good to have criteria for the novel Sertoli cell subtype presented. CCDC62 is presented as a representative marker for the SC8 cluster, but from Figure 4C, it seems to be quite expressed in the SC3 cluster as well. Therefore, in Figure 5E's protein-level check, it's unclear if this truly represents a novel SC8 subtype.

      Thank you for your comments. CCDC62 expression was higher in the SC8 cluster than in SC3. Since some molecular markers were not commercially available, CCDC62 was selected as the SC8 marker for immunofluorescence verification. Immunofluorescence results showed that CCDC62 is a novel SC8 marker.

      - It might have been more meaningful to use SOX9 as a control and show that markers in the same subtype are expressed in the same location.

      Thank you for your comments. To determine PRAP1, BST2, and CCDC62 as new markers for the SC subtype, we co-stained them with SOX9 (a well-known SC marker).

      - Figures 4 and 5 could potentially be combined into one figure.

      Thank you for your comments. Since combining Figures 4 and 5 into a single image would cause the image to be unclear, two images are used to show it.

      In Figure 6, it would be good to support the results with more NOA patient data.

      Thank you for your comments. Patient clinical and laboratory characteristics have been presented in Table 1.

      - Rather than claiming the importance of SC3 based on 3 single-cell patient data, it would be better to validate using public data with SC3 signature genes (e.g., showing the correlation between germ cell and SC3 ratios).

      Thank you for your comments. I'm sorry I didn't find public data with SC3 signature genes. In the future, we will verify the importance of SC3 through in vivo and in vitro experiments.

      - 462: It seems to be referring to Figure 6G, not 6D.

      Thank you for your comments. We have revised it. The description is as follows: As shown in Figure 6G, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72).

      In Figure 7, the spermatogenesis process is basically well-known, so it would be better to emphasize what novel content is being conveyed here. Additionally, emphasizing the importance of SC3 in the overall process based on GO results leaves room for a better approach.

      Thank you for your valuable suggestions. Regarding Figure 7, we recognize that the spermatogenesis process is well-known, and we will focus on highlighting the novel content, particularly the role and significance of the SC3 subtype in spermatogenesis disorders. As for the importance of SC3 in the overall process based on GO results, we have validated this in Figure 8 through co-staining experiments between Sertoli cells and spermatogenic cells in OA and NOA groups. The results demonstrate a significant correlation between the number of SC3-positive cells and SPT3 spermatogenic cells, particularly in the NOA5-P8 group, where both SC3 and SPT3 cell counts are notably lower than in the NOA4-P7 group. This further supports the critical role of SC3 in the spermatogenesis process. Your suggestions have prompted us to refine our data presentation and more clearly emphasize the novel aspects of our research. We will continue to strive to ensure that every part of our research contributes meaningfully to the academic community. Thank you again for your guidance.

      In Figure 8, only the contents of the IF-stained proteins are listed, which seems slightly insufficient to constitute a subsection on its own. It might have been better to conclude by emphasizing some subtypes.

      Thank you for your comments. We have combined this part of the results with other results into one section. The description is as follows:

      “Co-localization of subpopulations of Sertoli cells and germ cells

      To determine the interaction between Sertoli cells and spermatogenesis, we applied CellPhoneDB to infer cellular interactions according to a ligand-receptor signalling database. As shown in Figure 6G, compared with other cell types, germ cells mainly interacted with Sertoli cells. We further performed Spearman correlation analysis to determine the relationship between Sertoli cells and germ cells. As shown in Figure 6H, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72). Interestingly, SC3 was significantly positively correlated with all sperm subpopulations (R > 0.5), suggesting an important role for SC3 in spermatogenesis and that SC3 is involved in the entire process of spermatogenesis. Subsequently, to understand whether the functions of germ cells and Sertoli cells correspond to each other, GO term enrichment analysis of germ cells and Sertoli cells was carried out (Figure S3, S4). We found that the functions could be divided into 8 categories, namely, material energy metabolism, cell cycle activity, the final stage of sperm cell formation, chemical reaction, signal communication, cell adhesion and migration, stem cells and sex differentiation activity, and stress reaction. These different events were labeled with different colors in order to quickly capture the important events occurring in the cells at each stage.
      As shown in Figure S3, we discovered that SSC0/1/2 was involved in SRP-dependent cotranslational protein targeting to membrane, and cytoplasmic translation; Diffing SPG was involved in cell division and cell cycle; Diffed SPG was involved in cell cycle and RNA splicing; Pre-Leptotene was involved in cell cycle and meiotic cell cycle; Leptotene_Zygotene was involved in cell cycle and meiotic cell cycle; Pachytene was involved in cilium assembly and spermatogenesis; Diplotene was involved in spermatogenesis and cilium assembly; SPT1 was involved in cilium assembly and flagellated sperm motility; SPT2 was involved in spermatid development and flagellated sperm motility; SPT3 was involved in spermatid development and spermatogenesis. As shown in Figure S4, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation. The above analysis indicated that the functions of 8 Sertoli cell subtypes and 12 germ cell subtypes were closely related.

      To further verify that Sertoli cell subtypes have "stage specificity" for each stage of sperm development, we first performed HE staining using testicular tissues from OA3-P6, NOA4-P7 and NOA5-P8 samples. The results showed that the OA3-P6 group showed some sperm, with reduced spermatogenesis, thickened basement membranes, and a high number of Sertoli cells without spermatogenic cells. The NOA4-P7 group had no sperm initially, but a few malformed sperm were observed after sampling, leading to the removal of affected seminiferous tubules. The NOA5-P8 group showed no sperm in situ (Figure 7A). Immunofluorescence staining in Figure 7B was performed using these tissues for validation. ASB9 (SSC2) was primarily expressed in a wreath-like pattern around the basement membrane of testicular tissue, particularly in the OA group, while ASB9 was barely detectable in the NOA group. SOX2 (SC2) was scattered around SSC2 (ASB9), with nuclear staining, while TF (SC1) expression was not prominent. In NOA patients, SPATS1 (SC3) expression was significantly reduced. C9orf57 (Pa) showed nuclear expression in testicular tissues, primarily extending along the basement membrane toward the spermatogenic center, and was positioned closer to the center than DDX4, suggesting its involvement in germ cell development or differentiation. BEND4, identified as a marker of SC5, showed a developmental trajectory from the basement membrane toward the spermatogenic center. ST3GAL4 was expressed in the nucleus, forming a circular pattern around the basement membrane, similar to A2M (SSC1), though A2M was more concentrated around the outer edge of the basement membrane, creating a more distinct wreath-like arrangement. In cases of impaired spermatogenesis, this arrangement becomes disorganized and loses its original structure. SMCP (SC6) was concentrated in the midpiece region of the bright blue sperm cell tail.
In the OA group, SSC1 (A2M) was sparsely arranged in a rosette pattern around the basement membrane, but in the NOA group, it appeared more scattered. SSC2 (ASB9) expression was not prominent. BST2 (SC7) was a transmembrane protein primarily localized on the cell membrane. In the OA group, A2M (SSC1) was distinctly arranged in a wreath-like pattern around the basement membrane, with expression levels significantly higher than ASB9 (SSC2). TSSK6 (SPT3) was primarily expressed in OA3-P6, while CCDC62 (SC8) was more abundantly expressed in NOA4-P7, with ASB9 (SSC2) showing minimal expression. Taken together, germ cells of a particular stage tended to co-localize with Sertoli cells of the corresponding stages. Germ cells and Sertoli cells at each differentiation stage were functionally heterogeneous and stage-specific (Figure 8). This suggests that each stage of sperm development requires the assistance of Sertoli cells to complete the corresponding stage of sperm development.”

      Reviewer #3 (Recommendations For The Authors):

      The authors revealed 11 germ cell subtypes and 8 Sertoli cell subtypes through single-cell analysis of two OA patients and three NOA patients, and found that the Sertoli cell SC3 subtype (marked by SPATS1) plays an important role in spermatogenesis. The study also suggests that Notch1/2/3 signaling and integrins are involved in germ cell-Sertoli cell interactions. This is an interesting and useful article that at least gives us a comprehensive understanding of human spermatogenesis. It provides a powerful tool for further research on NOA. However, there are still some issues and questions that need to be addressed.

      (1) How was the testicular tissue collected? Please explain in detail, including which part of the testicular tissue was extracted. A schematic diagram would help.

      Thank you for your comments. The process is as follows: Testicular tissues were obtained from two OA patients (OA1-P1 and OA2-P2) and three NOA patients (NOA1-P3, NOA2-P4, NOA3-P5) using micro-dissection of testicular sperm extraction separately.

      (2) Were the tissues of these patients extracted simultaneously or separately? Were they separated into single cells and stored, with single-cell analysis then performed simultaneously? Please be specific.

      Thank you for your comments. The testicular tissues of these patients were extracted separately, then separated into single cells, and single cell analysis was performed simultaneously.

      (3) When performing single-cell analysis, were cells from the two OA patients analyzed individually or combined? The same question applies to the cells of the three NOA patients.

      Thank you for your comments. Cells from two OA patients and three NOA patients were analyzed individually.

      (4) Can you specifically point out the histological differences between OA and NOA in Figure 1A? This makes it easier for readers to understand the structure change between OA and NOA. Please also label representative supporting cells.

      Thank you for your comments. We have revised the description and it was shown in the revised manuscript.

      (5) The authors state that "We speculate that this lack of differentiation may be due to the intense morphological changes occurring in the sperm cells during this period, resulting in relatively minor differences in gene expression." Please provide some verification of this hypothesis, for example by using immunofluorescence staining to observe morphological changes in sperm cells.

      Thank you for your comments. Due to limited funds, we will verify this hypothesis in future studies.

      (6) The authors demonstrate that " As shown in Figure 5E, we discovered that PRAP1, BST2, and CCDC62 were co-expressed with SOX9 in testes tissues." The staining in Figure 5D is unclear, and it is difficult to explain that SOX9 is co-expressed with PRAP1 BST2 CCDC62 based on the current staining results. The staining patterns of SOX9 (green) and SOX9 (red) are also different. (SOX9 (red) appears as dots, while the background for SOX9 (green) is too dark to tell whether its staining is also in the form of dots.) In summary, increasing the clarity of the staining makes it more convincing. Alternatively, use high magnification to display these results.

      Thank you for your comments. We have re-stained and updated this part of the immunofluorescence staining results. Please refer to the files named Figure 1, Figure 2, Figure 5, and Figure 8.

      (7) In Figure 8, the author emphasized the co-localization of Sertoli cells and Germ cells at corresponding stages and did a lot of staining, but it was difficult to distinguish the specific locations of co-localization, which was similar to Figure 5E. If possible, please mark specific colocalizations with arrows or use high magnification to display these results, in order to facilitate readers to better understand.

      Thank you for your comments. We have re-stained and updated this part of the data. Please refer to the immunofluorescence staining data in the updated Figure 8.

      (8) The authors emphasize that macrophages may play an important role in spermatogenesis. Therefore, adding relevant macrophage staining to observe the differences in macrophage expression between NOA and OA should better support this idea.

      Thank you for your comments. Macrophage-related experiments will be further explored in the future.

      (9) Notch1/2/3 signaling and integrin were discovered to be involved in germ cell-Sertoli cell interaction. However there are currently no concrete experiments to support this hypothesis. At least simple verification experiments are needed.

      Thank you for your comments. Due to limited funding, studies will be carried out in the future.

      (10) Data availability statements should not be limited to the corresponding author, especially for big data analysis. This is crucial to the credibility of this data (Have the scRNA-seq and scATAC-seq in this study been deposited in GEO or other databases, and when will they be released to the public?) The data for such big data analysis needs to be saved in GEO or other databases in advance so that more research can use it.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. “ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.”

    1. eLife Assessment

      This manuscript describes a fundamental investigation of the functioning of Cas9, and in particular of how the variant xCas9 expands DNA targeting ability through an increased-flexibility mechanism. The authors provide compelling evidence to support their mechanistic models and the relevance of flexibility and entropy in recognition. This work can be of interest to a broad community of structural biophysicists, computational biologists, chemists, and biochemists.

    2. Joint Public Review:

      Summary:

      Hossain and coworkers investigate the mechanisms of recognition of xCas9, a variant of Cas9 with expanded targeting capability for DNA. They do so by using molecular simulations and combining different flavors of simulation techniques, ranging from long classical MD simulations, to enhanced sampling, to free energy calculations of affinity differences. Through this, the authors are able to develop a consistent model of expanded recognition based on the enhanced flexibility of the protein receptor.

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. Plus, the authors further investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide a free-energy profiling that helps support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion on entropic binding to other cases that have been presented in the literature recently (such as the identification of small molecules for Abeta peptides, or the identification of "fuzzy" mechanisms of binding to protein HMGB1). The point on flexibility helping adaptability and expansion of functional properties is important, and should probably be given more evidence and more direct links with a wider picture.

      Comments on revisions:

      We have read the revised version and the response letter and find that this manuscript is ready. There is no need for further additions/revisions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. Plus, the authors further investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide a free-energy profiling that helps support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      We thank the reviewer for their positive evaluation of our computational strategy. To further substantiate our findings, we have incorporated additional molecular dynamics and Free Energy Perturbation (FEP) calculations for the system bound to GAT. These results corroborate our previous observations obtained with AAG, reinforcing our conclusions.

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion on entropic binding to other cases that have been presented in the literature recently (such as the identification of small molecules for Abeta peptides, or the identification of "fuzzy" mechanisms of binding to protein HMGB1). The point on flexibility helping adaptability and expansion of functional properties is important, and should probably be given more evidence and more direct links with a wider picture.

      We have expanded our discussion on the role of entropy in favoring TGG binding to xCas9. To this end, we performed entropy calculations using the Quasi-Harmonic approximation (details provided in the Materials and Methods section). This analysis reveals that R1335 in xCas9 experiences an entropy increase compared to SpCas9, enhancing its adaptability and interaction with the DNA. This analysis and its explanation are detailed on pages 8-9.

      Additionally, we have enriched the Discussion section by clarifying how DNA binding is entropically favored in xCas9, thereby facilitating the recognition of alternative PAM sequences. A refined explanation is also included in the Conclusions section, where we contextualize xCas9 within a broader evolutionary framework of protein-DNA recognition. This highlights how structural flexibility can enable sequence diversity while maintaining high specificity.

      Recommendations for the authors:

      Overall, this is a very interesting and elegant manuscript with compelling results that shed light on the atomistic determinants of genetic-editing technologies.

      Since the paper proposes new findings that may be helpful for experimentalists, it would be interesting if the authors point out (in their discussion/conclusions) specific amino acids to mutate/target for future tests by the experimental community. This should just appear as an open hypothesis/proposal for new experiments.

      In the Conclusions, we have incorporated a discussion on how modifications in the PAM-binding cleft can enhance the recognition of alternative PAM sequences. As an illustrative example, we reference the recently developed SpRY Cas9 variant, which is capable of recognizing a broader range of PAMs. This variant includes mutations within the PAM-binding cleft that likely increase the flexibility of the interacting residues, as suggested by recent cryo-EM structures (Hibshman et al. Nat. Commun. 2024). The importance of fine-tuning the flexibility of the PAM-interacting cleft for engineering strategies has also been highlighted in the abstract.

      Overall, in light of the reviewer’s comments and in consideration of our findings, we revised the manuscript title to: “Flexibility in PAM Recognition Expands DNA Targeting in xCas9.” This new title better highlights the key findings from our research and contextualizes them within the broader goal of expanding DNA targeting capabilities, a critical priority for developing enhanced CRISPR-Cas systems.

    1. eLife Assessment

      This study provides important computational insights into the dynamics of PROTAC-induced degradation complexes, offering a convincing demonstration that differences in degradation efficacy can be linked to linker properties. The analyses address reproducibility considerations comprehensively, reinforcing the study's conclusions. Overall, these findings are significant for advancing cancer treatments and will be of broad interest to both biochemists and biophysicists.

    2. Reviewer #1 (Public review):

      This study by Wu et al. provides valuable computational insights into PROTAC-related protein complexes, focusing on linker roles, protein-protein interaction stability, and lysine residue accessibility. The findings are significant for PROTAC development in cancer treatment, particularly breast and prostate cancers.

      Strengths:

      (1) Comprehensive computational analysis of PROTAC-related protein complexes.

      (2) Focus on critical aspects: linker role, protein-protein interaction stability, and lysine accessibility.

      Weaknesses:

      (1) Limited examination of lysine accessibility despite its stated importance.

      (2) Use of RMSD as the primary metric for conformational assessment, which may overlook important local structural changes.

      The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects.

      Comments on revisions:

      The authors have addressed the questions raised substantially.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript reports the computational study of the dynamics of PROTAC-induced degradation complexes. The research investigates how different linkers within PROTACs affect the formation and stability of ternary complexes between the target protein BRD4BD1 and Cereblon E3 ligase, and the degradation machinery. Using computational modeling, docking, and molecular dynamics simulations, the study demonstrates that although all PROTACs form ternary complexes, the linkers significantly influence the dynamics and efficacy of protein degradation. The findings highlight that the flexibility and positioning of Lys residues are crucial for successful ubiquitination. The results also discussed the correlated motions between the PROTAC linker and the complex.

      Strengths:

      The field of PROTAC discovery and design, characterized by its limited research, distinguishes itself from traditional binary ligand-protein interactions by forming a ternary complex involving two proteins. The current understanding of how the structure of PROTAC influences its degradation efficacy remains insufficient. This study investigated the atomic-level dynamics of the degradation complex, offering potentially valuable insights for future research into PROTAC degradability.

      Comments on revisions:

      All my questions have been addressed.

    4. Reviewer #3 (Public review):

      The authors offer an interesting computational study on the dynamics of PROTAC-driven protein degradation. They employed a combination of protein-protein docking, structural alignment, atomistic MD simulations, and post-analysis to model a series of CRBN-dBET-BRD4 ternary complexes, as well as the entire degradation machinery complex. These degraders, with different linker properties, were all capable of forming stable ternary complexes but had been shown experimentally to exhibit different degradation capabilities. While in the initial models of the degradation machinery complex, no surface Lys residue(s) of BRD4 were exposed sufficiently for the crucial ubiquitination step, MD simulations illustrated protein functional dynamics of the entire complex and local side-chain arrangements to bring Lys residue(s) to the catalytic pocket of E2/Ub for reactions. Using these simulations, the authors were able to present a hypothesis as to how linker property affects degradation potency. They were able to roughly correlate the distance of Lys residues to the catalytic pocket of E2/Ub with observed DC50/5h values. This is an interesting and timely study that presents interesting tools that could be used to guide future PROTAC design or optimization.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study by Wu et al. provides valuable computational insights into PROTAC-related protein complexes, focusing on linker roles, protein-protein interaction stability, and lysine residue accessibility. The findings are significant for PROTAC development in cancer treatment, particularly breast and prostate cancers.

      The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects.

      Strengths:

      (1) Comprehensive computational analysis of PROTAC-related protein complexes.

      (2) Focus on critical aspects: linker role, protein-protein interaction stability, and lysine accessibility.

      Weaknesses:

      (1) Limited examination of lysine accessibility despite its stated importance.

      (2) Use of RMSD as the primary metric for conformational assessment, which may overlook important local structural changes.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. Expand the analysis of lysine accessibility, potentially correlating it with other structural features such as linker length.

      We thank the reviewer for the suggestions! We performed time-dependent correlation analysis to correlate the dihedral angles of the PROTACs with the Lys-Gly distance (Figures 6 and S17). We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
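
      The time-dependent correlation described above can be illustrated with a lagged Pearson correlation between two per-frame time series. This is a minimal sketch on synthetic data, not the authors' analysis: the function name `lagged_correlation`, the 25-frame delay, and the noise level are all assumptions made for illustration, and a real dihedral series would first need to be made non-periodic (for example by using its sine and cosine).

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag frames."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]  # earlier signal (e.g. a linker rotation coordinate)
        b = y[lag:]            # later signal (e.g. a Lys-Gly distance)
        corr[lag] = np.corrcoef(a, b)[0, 1]
    return corr


# Synthetic stand-ins (assumption): y follows x with a 25-frame delay plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 25) + 0.1 * rng.standard_normal(2000)

corr = lagged_correlation(x, y, max_lag=100)
best_lag = int(np.argmax(np.abs(corr)))
print(f"strongest correlation at lag {best_lag} frames")
```

      A peak at a nonzero lag suggests that changes in the first series precede changes in the second, which is the kind of evidence a statement such as "rotation translates motion over time" relies on. It does not by itself establish causation.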

      (2) The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects. Consider changing "protein functional dynamics" to "protein dynamics" to more accurately reflect the scope of the study.

      Thanks to the reviewer for suggesting more accurate terminology! We agree with the reviewer that keeping “protein functional dynamics” in the title would require us to focus on how the overall protein dynamics link to function. The function is directly related to PROTAC-induced structural dynamics, as is commonly seen in protein structure-function relationships, but it is not our main focus. Therefore, we changed the title, replacing “functional” with “structural”.

      (3) Incorporate more local and specific characterization methods in addition to RMSD for a more comprehensive conformational assessment.

      We thank the reviewer for the suggestion. We performed time-dependent correlation analysis to understand how the rotation of PROTACs can translate to the Lys-Gly interaction. In addition, we performed dihedral entropy analysis for each dihedral angle in the linker of the PROTACs to better examine the flexibility of each PROTAC.

      We included a detailed explanation on page 18: “Our dihedral entropies analysis showed that dBET57 has ~0.3 kcal/mol lower entropies than the other three linkers, suggesting dBET57 is less flexible than other PROTACs (Figure S18).”
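
      One common way to obtain a per-dihedral entropy is to histogram the sampled angle and apply the Gibbs/Shannon formula to the bin populations. The sketch below assumes that approach; the response does not state the authors' exact procedure, and the bin count, temperature, and synthetic angle distributions are illustrative choices.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dihedral_entropy_term(angles_deg, n_bins=36, temperature_k=298.15):
    """Return -T*S (kcal/mol) from a histogram of one dihedral angle in degrees."""
    # Wrap angles into [-180, 180) so the histogram covers one full period.
    wrapped = (np.asarray(angles_deg, dtype=float) + 180.0) % 360.0 - 180.0
    counts, _ = np.histogram(wrapped, bins=n_bins, range=(-180.0, 180.0))
    p = counts[counts > 0] / counts.sum()
    entropy = -R_KCAL * np.sum(p * np.log(p))  # discrete entropy of bin populations
    return -temperature_k * entropy


# Synthetic stand-ins (assumption): a rigid dihedral near 60 degrees and a freely rotating one.
rng = np.random.default_rng(1)
rigid = rng.normal(60.0, 5.0, size=5000)
flexible = rng.uniform(-180.0, 180.0, size=5000)

minus_ts_rigid = dihedral_entropy_term(rigid)
minus_ts_flexible = dihedral_entropy_term(flexible)
print(f"-TS rigid: {minus_ts_rigid:.2f} kcal/mol, flexible: {minus_ts_flexible:.2f} kcal/mol")
```

      Because this is a discrete entropy, its absolute value depends on the bin width. Only differences computed with identical binning, such as a comparison between linkers, are meaningful.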

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports the computational study of the dynamics of PROTAC-induced degradation complexes. The research investigates how different linkers within PROTACs affect the formation and stability of ternary complexes between the target protein BRD4BD1 and Cereblon E3 ligase, and the degradation machinery. Using computational modeling, docking, and molecular dynamics simulations, the study demonstrates that although all PROTACs form ternary complexes, the linkers significantly influence the dynamics and efficacy of protein degradation. The findings highlight that the flexibility and positioning of Lys residues are crucial for successful ubiquitination. The results also discussed the correlated motions between the PROTAC linker and the complex.

      Strengths:

      The field of PROTAC discovery and design, characterized by its limited research, distinguishes itself from traditional binary ligand-protein interactions by forming a ternary complex involving two proteins. The current understanding of how the structure of PROTAC influences its degradation efficacy remains insufficient. This study investigated the atomic-level dynamics of the degradation complex, offering potentially valuable insights for future research into PROTAC degradability.

      Reviewer #2 (Recommendations for the authors):

      (1) Regarding the modeling of the ternary complex, the BRD4 structure (3MXF) is from humans, whereas the CRBN structure in 4CI3 is derived from Gallus gallus. Is there a specific reason for not using structures from the same species, especially considering that human CRBN structures are available in the Protein Data Bank (e.g., 8OIZ, 4TZ4)?

      We appreciate the reviewer’s insightful comment regarding our choice of BRD4 and CRBN crystal structures from two different species. Our initial selection of 4CI3 for the CRBN structure was based on its high resolution and its publication in Nature. Furthermore, the Gallus gallus CRBN structure shares a high degree of sequence and structural similarity with Homo sapiens CRBN, especially in the ligand-binding region. At the time of our study, we were aware of 4TZ4 as a Homo sapiens CRBN structure; however, we did not use it because no publication or detailed experimental description was associated with it. Additionally, PDB 8OIZ was not yet publicly available at the time.

      (2) Based on the crystal structure (PDB ID: 6BNB) discussed in Reference 6, the ternary complex of dBET57 exhibits a conformation distinct from other PROTACs, with CRBN adopting an "open" conformation. Using the same CRBN structure for dBET57 as for other PROTACs might result in inaccurate docking outcomes.

      Thank you for the reviewer’s comment! As noted by the authors in Reference 6, the observed open conformation of CRBN in the dBET57 ternary complex may result from the high salt crystallization conditions, which could drive structural rearrangement, and crystal contacts that may induce this conformation. The authors also mentioned that this open conformation could, in part, reflect CRBN’s intrinsic plasticity. However, they acknowledged that further studies are needed to determine whether this conformational flexibility is a characteristic feature of CRBN that enables it to accommodate a variety of substrates. Despite these observations, we believe that the compatibility of the observed BRD4<sup>BD1</sup> binding conformation with both open and closed CRBN states suggests that these conformational changes are all possible. Therefore, we believe using the same initial CRBN structure for dBET57 as for other PROTACs can still reasonably reveal the dynamic nature of the ternary complex and would not significantly affect the accuracy of our docking outcomes either.

      (3) Figure 2 displays only a single frame from the simulations, which might not provide a comprehensive representation. Could a contact frequency heatmap of PROTAC with the proteins be included to offer a more detailed view?

      We thank the reviewer for the suggestion! We performed a contact map analysis to observe the average distance between the PROTACs and BRD4<sup>BD1</sup> over 400 ns of MD simulation (new Figure S4 added).

      We included a detailed explanation on pages 8 and 9: “The residues contact map throughout the 400ns MD simulation also showed different pattern of protein-protein interactions, indicating that the linkers were able to adopt different conformations (Figure S4).”
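
      The contact-frequency view requested by the reviewer differs from an average-distance map in that it reports the fraction of frames in which a pair is within a cutoff. The sketch below shows that calculation on toy coordinates. The function name `contact_frequency`, the 4.5 Å cutoff, and the array layout are assumptions for illustration, not the authors' protocol.

```python
import numpy as np


def contact_frequency(coords_a, coords_b, cutoff=4.5):
    """Fraction of frames in which each (a, b) pair lies within `cutoff`.

    coords_a has shape (n_frames, n_a, 3) and coords_b has shape (n_frames, n_b, 3),
    in the same length unit as `cutoff`.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    diff = a[:, :, None, :] - b[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_frames, n_a, n_b)
    return (dist < cutoff).mean(axis=0)


# Toy trajectory (assumption): 4 frames, one ligand site at the origin, two protein sites.
ligand = np.zeros((4, 1, 3))
protein = np.zeros((4, 2, 3))
protein[:, 0, 0] = 3.0                   # site 0 stays at 3 A in every frame
protein[:, 1, 0] = [10.0, 10.0, 10.0, 4.0]  # site 1 approaches only in the last frame

freq = contact_frequency(ligand, protein)
print(freq)
```

      In this toy case the first pair is in contact in every frame and the second in one frame out of four, a difference that a single snapshot could miss.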

      (4) The conclusions in Figure 3 and S11 are based on a single 400 ns trajectory. The reproducibility of these results is therefore uncertain.

      We thank the reviewer for the suggestion! We added one more MD simulation with a different random seed for each PROTAC to ensure the reproducibility of the results. The result is shown in Figure S21, and the details for each MD run are updated in Table 1.

      (5) Figure 4 indicates significant differences between the first and last 100 ns of the simulations. Does this suggest that the simulations have not converged? If so, how can the statistical analysis presented in this paper be considered reliable?

      We thank the reviewer for the question. The simulation was initiated with a 10-15 Å gap between BRD4 and Ub to monitor the movement of the degradation machinery and the Lys-Gly interaction. The significant changes in the pseudo dihedral angles in Figure 4 show that the large-scale movement of the degradation complex can initiate Lys-Gly binding. They do not reflect unstable sampling, because the system remains very stable once BRD4 comes close to Ub.

      (6) In Figure 5, the dihedral angle of dBET57_#9MD1 is marked on a peptide bond. Shouldn't this angle have a high energy barrier for rotation?

      We thank the reviewer for catching the error! Indeed, the dihedral angles were mistakenly marked on the peptide bond. We reworked the figure and double-checked our dihedral correlation analysis. The updated selection of correlated dihedral angles and the correlation coefficients are shown in Figure 5.

      (7) Given that crystal structures for dBET 70, 23, and 57 are available, why is there a need to model the complex using protein-protein docking?

      We thank the reviewer for the feedback. Only dBET23 has a crystal structure of the complete ternary complex, containing the PROTAC and both proteins; complete ternary complex structures are not available for dBET1, dBET57, and dBET70. Although dBET70 has a crystal structure, the conformation of its PROTAC is not resolved, so we decided to still perform protein-protein docking for dBET70.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTAC and both proteins, while the experimentally determined ternary complexes of dBET1, dBET57 and dBET70 are not available.”

      (8) On page 9, it is mentioned that "only one of the 12 PDB files had CRBN bound to DDB1 (PDB ID 4TZ4)." However, there are numerous structures of the DDB1-CRBN complex available, including those used for docking like 4CI3, as well as 4CI1, 4CI2, 8OIZ, etc.

      We thank the reviewer for the comment! We acknowledge the existence of several DDB1-CRBN complex crystal structures, such as PDB IDs 4CI1, 4CI2, 4CI3, and the more recent 8OIZ. For our study, we chose to use 4TZ4 to maintain consistency in complex construction and to align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653), which successfully utilized the same structure for a similar construct. At the time our study was conducted, the 8OIZ structure had not yet been released. We appreciate your suggestion and will consider incorporating alternative structures in future studies to further investigate our findings.

      (9) Table 2 is first referenced on page 8, while Table 1 is mentioned first on page 10. The numbering of these tables should be reversed to reflect their order of appearance in the text.

      We thank the reviewer for catching the error! We switched the order of Table 1 and Table 2.

      Reviewer #3 (Public review):

      The authors offer an interesting computational study on the dynamics of PROTAC-driven protein degradation. They employed a combination of protein-protein docking, structural alignment, atomistic MD simulations, and post-analysis to model a series of CRBN-dBET-BRD4 ternary complexes, as well as the entire degradation machinery complex. These degraders, with different linker properties, were all capable of forming stable ternary complexes but had been shown experimentally to exhibit different degradation capabilities. While in the initial models of the degradation machinery complex, no surface Lys residue(s) of BRD4 were exposed sufficiently for the crucial ubiquitination step, MD simulations illustrated protein functional dynamics of the entire complex and local side-chain arrangements to bring Lys residue(s) to the catalytic pocket of E2/Ub for reactions. Using these simulations, the authors were able to present a hypothesis as to how linker property affects degradation potency. They were able to roughly correlate the distance of Lys residues to the catalytic pocket of E2/Ub with observed DC50/5h values. This is an interesting and timely study that presents interesting tools that could be used to guide future PROTAC design or optimization.

      Reviewer #3 (Recommendations for the authors):

      (1) My most important comment refers to the MM/PBSA analysis, the results of which are shown in Figure S9: binding affinities of -40 to -50 kcal/mol are unrealistic. This would correspond to a dissociation constant of 10^-37 M. This analysis needs to be removed or corrected.

      We thank the reviewer for the comment! MM/PBSA analysis indeed cannot give realistic binding free energies. It does not include the configurational entropy loss, which should be a large positive value. In addition, while the implicit PBSA solvent model computes solvation free energy, the absolute values may not be very accurate. However, because this is a commonly used energy calculation, and some readers may like to see quantitative values to ensure that the systems have stable intermolecular attractions, we kept the analysis in the SI. We edited the figure legend, moved the figure (now Figure S10) to SI page 19, and added a sentence to clearly state that the calculations did not include the configurational entropy loss: “Note that the energy calculations focus on non-bonded intermolecular interactions and solvation free energy calculations using MM/PBSA, where the configuration entropy loss during protein binding was not explicitly included.”
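
      The reviewer's estimate can be reproduced with the standard relation ΔG = RT ln(Kd). The sketch below is only an illustration and assumes a temperature of 298.15 K and a 1 M standard state; the function name and the example value of -12 kcal/mol are chosen for illustration and do not come from the manuscript. Under these assumptions, -50 kcal/mol corresponds to a Kd of roughly 10^-37 M, whereas a value near -12 kcal/mol would fall in the nanomolar range.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log10_kd(delta_g_kcal, temperature_k=298.15):
    """Return log10(Kd / 1 M) for a binding free energy given in kcal/mol.

    Uses delta_G = R * T * ln(Kd), with Kd relative to a 1 M standard state.
    The default temperature is an assumption, not a value from the manuscript.
    """
    return delta_g_kcal / (R_KCAL * temperature_k) / math.log(10)


if __name__ == "__main__":
    # -50 and -40 are the MM/PBSA values questioned by the reviewer;
    # -12 is a hypothetical value shown only for comparison.
    for dg in (-50.0, -40.0, -12.0):
        print(f"dG = {dg:6.1f} kcal/mol -> log10(Kd/M) = {log10_kd(dg):.1f}")
```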

      (2) I think that the analysis of what in the different dBETx makes them cause different degradation potency is underdeveloped. The dihedral angle analysis (Figure 4B) did not explain the observed behavior in my opinion. Please add additional, clearer analysis as to what structural differences in the dBETx make them sample very different conformations.

      We thank the reviewer for the suggestions! Based on the suggestion, we further performed dihedral entropy analysis for each dihedral angle in the linker part of the PROTAC to examine the flexibility of each PROTAC. Because each PROTAC has a different linker, we now clearly label them in a new Figure S18 on SI page 27. Low dihedral entropies indicate a more rigid linker, which makes it harder for a PROTAC to rearrange and to facilitate the protein structural dynamics necessary for ubiquitination.

      We added a detailed explanation on page 18: “Our dihedral entropy analysis showed that dBET57 has ~0.3 kcal/mol lower configuration entropies than the other dBETs with three different linkers, suggesting that dBET57 is less flexible than the other PROTACs (Figure S18).”
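
      As an illustration of what a dihedral entropy of this kind measures, the sketch below estimates a per-dihedral entropy from a histogram of sampled angles using S = -R Σ p ln p. This is an assumption about the general approach, not the authors' actual protocol: the bin width, the temperature, and the function names are hypothetical, and the discrete form is only meaningful for comparing dihedrals binned in the same way.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dihedral_entropy(angles_deg, bin_width=10.0):
    """Histogram-based entropy, in kcal/(mol*K), of one dihedral angle.

    angles_deg: sampled dihedral values in degrees (any periodic image).
    bin_width:  histogram bin width in degrees (hypothetical default).
    """
    n_bins = int(round(360.0 / bin_width))
    counts = [0] * n_bins
    for angle in angles_deg:
        wrapped = (angle + 180.0) % 360.0  # map onto [0, 360)
        index = min(int(wrapped // bin_width), n_bins - 1)
        counts[index] += 1
    total = len(angles_deg)
    entropy = 0.0
    for count in counts:
        if count:
            p = count / total
            entropy -= p * math.log(p)
    return R_KCAL * entropy


def minus_t_s(angles_deg, temperature_k=298.15, bin_width=10.0):
    """Entropic free-energy term -T*S in kcal/mol (more negative = more flexible)."""
    return -temperature_k * dihedral_entropy(angles_deg, bin_width)


if __name__ == "__main__":
    rigid = [60.0] * 360                                   # locked in one rotamer
    flexible = [-175.0 + 10.0 * i for i in range(36)] * 10  # samples every bin
    print("rigid    -TS:", minus_t_s(rigid))
    print("flexible -TS:", minus_t_s(flexible))
```

      A linker whose dihedrals stay in a single rotamer gives an entropy near zero, while one that samples all bins evenly approaches the maximum of R ln(number of bins).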

      (3) "The movement of the degradation machinery correlated with rotations of specific dihedrals of the linker region in dBETs (Figure 5).": this is not sufficiently clear from the figure. Definitely not in a quantitative way.

      We thank the reviewer for the suggestions! To further understand the correlation between the PROTAC dihedral angles and the movement of the degradation machinery, we performed a time-dependent correlation analysis relating the dihedral angles of the PROTACs to the Lys-Gly distance (Figures 6 and S17).

      We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
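
      One common way to quantify such a relationship is a time-lagged Pearson correlation between two trajectory time series. The sketch below is a generic illustration under that assumption and is not necessarily the analysis used in the manuscript; the function names and the synthetic data are hypothetical. It also assumes the dihedral series has been unwrapped or transformed (for example to its sine or cosine) so that periodic jumps at ±180° do not distort the correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag = 0 .. max_lag (in frames)."""
    return {lag: pearson(x[: len(x) - lag], y[lag:]) for lag in range(max_lag + 1)}


if __name__ == "__main__":
    # Synthetic stand-ins: y follows x with a delay of three frames.
    dihedral = [math.sin(0.3 * t) for t in range(200)]
    distance = [0.0] * 3 + dihedral[:-3]
    correlations = lagged_correlation(dihedral, distance, max_lag=10)
    best_lag = max(correlations, key=correlations.get)
    print("lag with highest correlation:", best_lag)
```

      A peak at a positive lag indicates that changes in the first series tend to precede changes in the second, which is the kind of ordering the quoted passage describes.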

      (4) Cartoons are needed at multiple stages throughout the paper to enhance the clarity of what the modeled complexes looked like (e.g. which subunits they contained).

      We thank the reviewers for the suggestions. We added and remade several Figures with cartoons to better represent the stages. We also used higher resolution and included clearer labels for each protein system.

      (5) The difference between CRL4A E3 ligase and CRBN E3 ligase is not clear to the non-expert reader.

      We thank the reviewer for the comment! The terms "CRL4A E3 ligase" and "CRBN E3 ligase" refer to different levels of description of the protein complexes. To clarify them, we added a couple of sentences to the Figure 1 legend so that non-expert readers can tell them apart.

      As illustrated in Figure 1,

      • CRL4A E3 ligase refers to the full E3 ligase complex, which includes all protein components such as CRBN, DDB1, CUL4A, and RBX1.

      • CRBN E3 ligase, on the other hand, is a more colloquial term typically used to describe just the CRBN protein, often in isolation from the full CRL4A complex.

      (6) Figure 1, legend: unclear why it's E3 in A and E2 in B.

      We thank the reviewer for the question! E3 ligase in Figure 1A refers to CRBN E3 ligase, which researchers often simply call CRBN. We have added a sentence to specify that CRBN E3 ligase is also termed CRBN for simplicity. In Figure 1B, the meaning of E2 was unclear. Its full name is E2 ubiquitin-conjugating enzyme; because this name is rather long, researchers also call it E2 enzyme. We have corrected the legend and now use "E2 enzyme" to make it clearer.

      (7) "Although the protein-protein binding affinities were similar, other degraders such as dBET1 and dBET57 had a DC50/5h of about 500 nM". It's unclear what experimental data supports the assertion that the protein-protein binding affinities are similar.

      We thank the reviewer for the question. Indeed, the statement was unclear.

      We corrected the sentence on page 6: “Although utilizing the exact same warheads, other degraders such as dBET1 and dBET57 had a DC<sub>50/5h</sub> of about 500 nM.”

      (8) Was the construction of the degradation machinery complex guided by experimental data (maybe cryo-EM or tomography)? If not, what is the accuracy of the starting complex for MD? This may impact the reliability of the obtained results.

      Thank you for your insightful comments! Yes, the construction of the degradation machinery complex was guided by available high-resolution crystal structures, which were selected to maintain consistency and align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653).

      We acknowledge that static crystal structures represent only a single snapshot of the system and may not capture the full conformational flexibility of the complex. To address this limitation, we performed MD simulations using multiple starting structures. This approach allowed us to explore a broader conformational landscape and reduced the dependence on any single starting configuration, thereby enhancing the reliability of the results.

      We hope this clarifies the robustness of our methodology and the steps taken to ensure accuracy in our simulations.

      (9) "With quantitative data, we revealed the mechanism underlying dBETx-induced degradation machinery": I think this may be too strong of an assertion. The authors may have developed a mechanistic hypothesis that can be tested experimentally in the future.

      We thank the reviewer for the suggestion. This is indeed a strong assertion and needed to be modified. We edited the sentence on page 7: “With quantitative data, we revealed the importance of the structural dynamics of dBETx-induced motions, which arrange positions of surface lysine residues of BRD4<sup>BD1</sup> and the entire degradation machinery.”

      (10) Figure S2: are the RMSDs calculated over all residues? Or just the BRD4 residues? Given that the structures are aligned with respect to CRBN, the reported RMSD numbers might be artificially low since there are many more CRBN residues than there are BRD4 residues. Also, why weren't the crystal structures used for dBET 23 and 70 for the modeling? Wouldn't you want to use the most accurate possible structures? Simulations were run for 23. Why not for 70?

      We thank the reviewer for the suggestion. We added a sentence to more clearly explain the RMSD calculations in Figure S2: “The structural superposition is performed based on the backbone of CRBN and RMSD calculation is conducted based on the backbone of BRD4<sup>BD1</sup>.”

      Although a crystal structure is available for the dBET70 complex, the PROTAC itself is not resolved in it, and we therefore still performed protein-protein docking for dBET70. No crystal structures of the ternary complexes are available for dBET1 and dBET57.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      a. And there are no crystal structures available for 1 and 57? If so, please clearly say that. Otherwise please report the RMSD.

      We thank the reviewer for the suggestion. We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      (11) Table 2 is referenced before Table 1.

      We thank the reviewer for catching the error! We switched the order for Table 1 and Table 2.

      (12) Figure S3 is not referenced in the main paper.

      We thank the reviewer for catching the error! We now refer to Figure S3 on page 7.

      (13) Minor comments on grammar and sentence structure:

      a. It should be "binding of a ternary complex"

      b. "Our shows the importance": word missing.

      c. "...providing insights into potential orientations for ubiquitination. observe whether the preferred conformations are pre-organized for ubiquitination." Word or words missing.

      We thank the reviewer for catching the errors! We corrected grammatical errors and unclear sentences throughout the paper and revised the sentences to make them easily understandable for non-expert readers.

    1. eLife Assessment

      This study provides a valuable approach to image and analyze in vivo metabolic flux through glucose turnover kinetics in glioblastoma tumor microenvironments. The evidence for the method's validity is convincing, which establishes the dynamic Deuterium Metabolic Imaging technique as an effective tool enabling non-invasive exploration of various tumors.

    2. Reviewer #1 (Public review):

      In the resubmission Simões et al. emphasize the efficacy of their novel, non-invasive imaging methodology in mapping glucose-kinetics to predict key tumor features in two commonly used syngeneic mouse models of glioblastoma. The authors highlight that DGE-DMI has the potential to capture metabolic fluxes with greater sensitivity and acknowledge that future validation of DGE-DMI in patient-derived and spontaneous GBM models, as well as in the context of genetic manipulation of metabolism, would strengthen its clinical application. To further demonstrate the ability of DGE-DMI to predict tumor features, they included an assessment of myeloid cell infiltration along with proliferation, peritumoral invasion, and distant migration. Overall, the authors offer a novel method to the scientific community that can be further tested and adapted for interrogating GBM heterogeneity.

    3. Reviewer #3 (Public review):

      Summary:

      Simoes et al enhanced dynamic glucose-enhanced (DGE) deuterium spectroscopy with Deuterium Metabolic Imaging (DMI) to characterize the kinetics of glucose conversion in two murine models of glioblastoma (GBM). The authors combined spectroscopic imaging and noise attenuation with histological analysis and showcased the efficacy of metabolic markers determined from DGE DMI to correlate with histological features of the tumors. This approach is also potent to differentiate the two models from GL261 and CT2A.

      Strengths:

      The primary strength of this study is to highlight the significance of DGE DMI to interrogate the metabolic flux from glucose. The authors focused on glutamine/glutamate and lactate. They attempted to correlate the imaging findings with in-depth histological analysis to depict the link between metabolic features and pathological characteristics such as cell density, infiltration, and distant migration.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes a convincingly validated non-invasive tool for in vivo metabolic phenotyping of aggressive brain tumors in mice brains. The analysis provides a valuable technique that tackles the unmet need for patient stratification and hence for early assessment of therapeutic efficacy. However, wider clinical applicability of the findings can be attained by expanding the work to include more diverse tumor models.

      We thank the Editors for their comments. This concern was also raised by Reviewer 1 in the Public Review, where we address it in more detail – please refer to comment PR-R1.C1. In brief, we agree that a more clinically relevant model should provide more translatable results to patients, and acknowledge this better in the revised manuscript: page 18 (lines 14-17), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways potentially relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”. However, we also believe that the potential of DGE-DMI for application to different glioblastoma models or patients is demonstrated clearly enough with the two immunocompetent models we chose, extensively reported in the literature as reliable models of glioblastoma.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work introduces a new imaging tool for profiling tumor microenvironments through glucose conversion kinetics. Using GL261 and CT2A intracranial mouse models, the authors demonstrated that tumor lactate turnover mimicked the glioblastoma phenotype, and differences in peritumoral glutamate-glutamine recycling correlated with tumor invasion capacity, aligning with histopathological characterization. This paper presents a novel method to image and quantify glucose metabolites, reducing background noise and improving the predictability of multiple tumor features. It is, therefore, a valuable tool for studying glioblastoma in mouse models and enhances the understanding of the metabolic heterogeneity of glioblastoma.

      Strengths:

      By combining novel spectroscopic imaging modalities and recent advances in noise attenuation, Simões et al. improve upon their previously published Dynamic Glucose-Enhanced deuterium metabolic imaging (DGE-DMI) method to resolve spatiotemporal glucose flux rates in two commonly used syngeneic GBM mouse models, CT2A and GL261. This method can be standardized and further enhanced by using tensor PCA for spectral denoising, which improves kinetic modeling performance. It enables the glioblastoma mouse model to be assessed and quantified with higher accuracy using imaging methods.

      The study also demonstrated the potential of DGE-DMI by providing spectroscopic imaging of glucose metabolic fluxes in both the tumor and tumor border regions. By comparing these results with histopathological characterization, the authors showed that DGE-DMI could be a powerful tool for analyzing multiple aspects of mouse glioblastoma, such as cell density and proliferation, peritumoral infiltration, and distant migration.

      Weaknesses:

      (1) Although the paper provides clear evidence that DGE-DMI is a potentially powerful tool for the mouse glioblastoma model, it fails to use this new method to discover novel features of tumors. The data presented mainly confirm tumor features that have been previously reported. While this demonstrates that DGE-DMI is a reliable imaging tool in such circumstances, it also diminishes the novelty of the study.

      PR-R1.C1 – We thank the Reviewer for the detailed analysis and reply below to each point. PR-R1.C1.1 - novelty: We thank the Reviewer for the comments and understand their perspective. While we acknowledge that our paper is more methodologically oriented, we also believe that significant methodological advances are critical for new discoveries. This was our main motivation and is demonstrated in the present work, showing the ability to map in vivo metabolic fluxes in mouse glioma, a “hot topic” and very desirable in the cancer field. 

      PR-R1.C1.2 – additional tumor features: To strengthen the biological relevance of this methodologic novelty, we have now included immune cell infiltration among the tumor features assessed, besides perfusion, histopathology, cellularity and cell proliferation. For this, we performed Iba-1 immunostaining for microglia/macrophages, now included in Fig. 2-B. These new results demonstrate significantly higher microglia/macrophage infiltration in CT2A tumors compared to GL261, particularly at the tumor border. This is very consistent with the respective tumor phenotypes, namely differences in cell density and cellularity between the 2 cohorts and across pooled cohorts, as we now report: page 9 (lines 10-18), “Such phenotype differences were reflected in the regional infiltration of microglia/macrophages: significantly higher at the CT2A peritumoral rim (PT-Rim) compared to GL261, and slightly higher in the tumor region as well (Fig 2B). Further quantitative regional analysis of Tumor-to-PT-Rim ROI ratios revealed: (i) 47% lower cell density (p=0.004) and 32% higher cell proliferation (p=0.026) in GL261 compared to CT2A (Fig 2C, Table S3); and (ii) strong negative correlations in pooled cohorts between microglia/macrophage infiltration and cellularity (R=-0.91, p<0.001) or cell density (R=-0.77, p=0.016), suggesting more circumscribed tumor growth with higher peripheral/peritumoral infiltration of immune cells.”; and page 16 (lines 13-19), “GL261 tumors were examined earlier after induction than CT2A (17±0 vs. 30±5 days, p = 0.032), displaying similar volumes (57±6 vs. 60±14, p = 0.813) but increased vascular permeability (8.5±1.1 vs 4.3±0.5 10<sup>3</sup>/min: +98%, p=0.001), more disrupted stromal-vascular phenotypes and infiltrative growth (5/5 vs 0/5), consistent with significantly lower tumor cell density (4.9±0.2 vs. 8.2±0.3 10<sup>-3</sup> cells/µm<sup>2</sup>: -40%, p<0.001) and lower peritumoral rim infiltration of microglia/macrophages (2.1±0.7 vs. 10.0±2.3 %: -77%, p=0.008)”.

      PR-R1.C1.3 – new tumor features and DGE-DMI: Importantly, such regional differences in cellularity/cell density and immune cell infiltration between the two cohorts were remarkably mirrored by the lactate turnover maps (Fig 3-C), as we now report in the manuscript: page 12 (lines 6-15), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral rim infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) – Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”. This is a novel, relevant feature compared to our previous work, as highlighted in our discussion: page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      We trust that these new features recovered from DGE-DMI (Fig 2-B and Fig 3-C) show its potential for new discoveries in glioblastoma.
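
      For readers checking the cohort comparisons, the percentages quoted alongside the lactate concentrations follow from a simple relative difference of the group means. The short sketch below is only an illustration with a hypothetical helper name; it takes CT2A as the reference cohort, which is an assumption inferred from the signs of the reported values, and uses the rounded means quoted above.

```python
def percent_difference(value, reference):
    """Relative difference of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


if __name__ == "__main__":
    # Mean lactate concentrations (mM) quoted above: GL261 vs CT2A.
    core = percent_difference(1.60, 2.91)    # tumor core
    margin = percent_difference(0.94, 1.46)  # peritumor margin
    print(f"core: {core:.0f}%, margin: {margin:.0f}%")
```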

      (2) When using DGE-DMI to quantitatively map glycolysis and mitochondrial oxidation fluxes, there is no comparison with other methods to directly identify the changes. This makes it difficult to assess how sensitive DGE-DMI is in detecting differences in glycolysis and mitochondrial oxidation fluxes, which undermines the claim of its potential for in vivo GBM phenotyping.

      PR-R1.C2: We thank the reviewer for raising this important point. The validity of the method for mapping specific metabolic kinetics in mouse glioma was reported in our previous work, using the same animal models, as specified in the introduction (page 4, lines 10-13): “we recently (…) propose[d] Dynamic Glucose-Enhanced (DGE) 2H-MRS [31], demonstrating its ability to quantify glucose fluxes through glycolysis and mitochondrial oxidation pathways in vivo in mouse GBM (…)”. Therefore, this was not reproduced in the present work. 

      In brief, our DGE-DMI results are very consistent with our previous study, where DGE single voxel deuterium spectroscopy was performed in the same tumor models with higher temporal resolution and SNR (as stated on page 16, lines 9-10: glycolytic lactate synthesis rate, 0.59±0.04 vs. 0.55±0.07 mM/min; glucose-derived glutamate-glutamine synthesis rate, 0.28±0.06 vs. 0.40±0.08 mM/min), which in turn matched well the values reported by others for glucose consumption rate through:

      (i) glycolysis, in different tumor models including mouse lymphoma in vivo (0.99 mM/min, by DGE-DMI; Kreis et al. 2020), rat breast carcinoma in situ (1.43 mM/min, using a biochemical assay; Kallinowski et al. 1988), and even perfused GBM cells (1.35 fmol min<sup>−1</sup> cell<sup>−1</sup>, according to hyperpolarized 13C-MRS; Jeong et al. 2017), very similar to our previous in vivo measurements in GL261 tumors: 0.50 ± 0.07 mM min<sup>−1</sup> = 1.25 ± 0.16 fmol min<sup>−1</sup> cell<sup>−1</sup> (Simoes et al. 2022);

      (ii) mitochondrial oxidation, very similar to previous in vivo measurements in mouse GBM xenografts (0.33 mM min<sup>−1</sup>, using 13C spectroscopy; Lai et al. 2018), and particularly to our in situ measurements in cell culture (GL261, 0.69 ± 0.09 fmol min<sup>−1</sup> cell<sup>−1</sup>; CT2A, 0.44 ± 0.08 fmol min<sup>−1</sup> cell<sup>−1</sup>), which are remarkably similar to the in vivo measurements in the respective tumors (GL261, 0.32 ± 0.10 mM min<sup>−1</sup> = 0.77 ± 0.23 fmol min<sup>−1</sup> cell<sup>−1</sup>; CT2A, 0.51 ± 0.11 mM min<sup>−1</sup> = 0.60 ± 0.12 fmol min<sup>−1</sup> cell<sup>−1</sup>) (Simoes et al. 2022).

      (3) The study only used intracranial injections of two mouse glioblastoma cell lines, which limits the application of DGE-DMI in detecting and characterizing de novo glioblastomas. A de novo mouse model can show tumor growth progression and is more heterogeneous than a cell line injection model. Demonstrating that DGE-DMI performs well in a more clinically relevant model would better support its claimed potential usage in patients.

      PR-R1.C3: We agree that a more clinically relevant model, such as the one suggested by the Reviewer, would in principle be better suited to provide more translatable results to patients. We however believe that the potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other types of brain tumors for that matter, is demonstrated clearly enough with the two syngeneic models we chose, given their robustness and general acceptance in the literature as reliable immunocompetent models of GBM, and for their different histologic and metabolic properties. This way we could fully focus on the novel metabolic imaging method, as compared to our previous single-voxel approach. While both tumor cohorts (GL261 and CT2A) were studied at more advanced stages of tumor progression, the metabolic differences depicted are consistent with the histopathologic features reported, as discussed in the manuscript; namely, the lower glucose oxidation rates. We have now modified the manuscript to highlight this point: page 18 (lines 12-14), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors attempt to noninvasively image metabolic aspects of the tumor microenvironment in vivo, in 2 mouse models of glioblastoma. The tumor lesion and its surrounding appearance are extensively characterized using histology to validate/support any observations made with the metabolic imaging approach. The metabolic imaging method builds on a previously used approach by the authors and others to measure the kinetics of deuterated glucose metabolism using dynamic 2H magnetic resonance spectroscopic imaging (MRSI), supported by de-noising methods.

      Strengths:

      Extensive histological evaluation and characterization.

      Measurement of the time course of isotope labeling to estimate absolute flux rates of glucose metabolism.

      Weaknesses:

      (1) The de-noising method appears essential to achieve the high spatial resolution of the in vivo imaging to be compatible with the dimensions of the tumor microenvironment, here defined as the immediately adjacent rim of the mouse brain tumors. There are a few challenges with this approach. Often denoising methods applied to MR spectroscopy data have merely a cosmetic effect but the actual quantification of the peaks in the spectra is not more accurate than when applied directly to original non-denoised data. It is not clear if this concern is applicable to the denoising technique applied here. However, even if this is not an issue, no denoising method can truly increase the original spatial resolution at which data were acquired. A quick calculation estimates that the spatial resolution of the 2H MRSI used here is 30-40 times too low to capture the much smaller tumor rim volume, and therefore there is concern that normal brain tissue and tumor tissue will be the dominant metabolic signal in so-called tumor rim voxels. This means that the conclusions on metabolic features of the (much larger) tumor are much more robust than the observations attributed to the (much smaller) tumor microenvironment/tumor rim.

      PR-R2.C1: We thank the Reviewer for the constructive comments regarding resolution and tumor rim, and denoising. These issues were raised more extensively in the section Recommendations For The Authors, where they are addressed in detail (RA-R2.C2). In summary, we agree with the Reviewer that no denoising method can increase the nominal resolution; nor was that our purpose. Thus, we clarify the relevance of spectral matrix interpolation in MRSI, and how our display resolution should in principle provide a better approximation to the ground truth than the nominal resolution, relevant for ROI analysis in the tumor margin. While we further show relevant correlations between metabolic maps and histologic features in tumor core and margin, we agree with the reviewer that our observations in the tumor core are more robust than those in the margin, and acknowledge this in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55].”

      (2) To achieve their goal of high-level metabolic characterization the authors set out to measure the deuterium labeling kinetics following an intravenous bolus of deuterated glucose, instead of the easier measurement of steady-state after the labeling has leveled off. These dynamic data are then used as input for a mathematical model of glucose metabolism to derive fluxes in absolute units. While this is conceptually a well-accepted approach there are concerns about the validity of the included assumptions in the metabolic model, and some of the model's equations and/or defining of fluxes, that seem different than those used by others.

      PR-R2.C2: These concerns about the metabolic model were also raised in more detail in the section Recommendations For The Authors, where they are addressed more extensively – please refer to RA-R2.C3 (glucose infusion protocol) and RA-R2.C4 (equations). In brief, we explain that the total volume injected (100 µL/25 g animal) is standard for i.v. administration in mice, and clarify this better in the manuscript (page 24, line 23). We also explain the differences between our kinetic model and the original one reported by Kreis et al. (Radiology 2020), who quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma, which is exclusively glycolytic, so that the maximum glucose flux rate was estimated from the lactate synthesis rate (Vmax = Vlac). Instead, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx) in mouse glioblastoma, where Vmax = Vlac + Vglx, also acknowledging its simplistic approach in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”).

      Reviewer #3 (Public Review):

      Summary:

      Simoes et al enhanced dynamic glucose-enhanced (DGE) deuterium spectroscopy with Deuterium Metabolic Imaging (DMI) to characterize the kinetics of glucose conversion in two murine models of glioblastoma (GBM). The authors combined spectroscopic imaging and noise attenuation with histological analysis and showcased the efficacy of metabolic markers determined from DGE DMI to correlate with histological features of the tumors. This approach is also potent to differentiate the two models from GL261 and CT2A.

      Strengths:

      The primary strength of this study is to highlight the significance of DGE DMI in interrogating the metabolic flux from glucose. The authors focused on glutamine/glutamate and lactate. They attempted to correlate the imaging findings with in-depth histological analysis to depict the link between metabolic features and pathological characteristics such as cell density, infiltration, and distant migration.

      Weaknesses:

      (1) A lack of genetic interrogation is a major weakness of this study. It was unclear what underlying genetic/epigenetic aberrations in GL261 and CT2A account for the metabolic difference observed with DGE DMI. A correlative metabolic confirmation using mass spectrometry of the two tumor specimens would give insight into the observed imaging findings.

      PR-R3.C1: We thank the Reviewer for the helpful comments, which we break down below.

      PR-R3.C1.1 - genetic interrogation/manipulation: While we did not have access to conditional models for the genetic manipulation of key enzymes in each metabolic pathway, we did assess the mitochondrial function in each cell line, showing a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simoes et al. NIMG:Clin 2022). This could drive e.g. more active recycling of lactate through mitochondrial metabolism in GL261 cells, aligned with our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].”

      PR-R3.C1.2 - correlation with post-mortem metabolic assessment: implementing this validation step would require additional equipment that was not accessible to us: a focalized irradiator, to instantly halt all metabolic reactions during animal sacrifice. We do believe that DGE-DMI could guide further studies of such nature, aimed at validating the spatio-temporal dynamics of regional metabolite concentrations in mouse brain tumors. Thus, the importance of end-point validation is now stressed more clearly in the manuscript (page 20, lines 13-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      These concerns and recommendations were also raised by the Reviewer in the Recommendations to Authors section, where we address them more extensively – please see RA-R1.C3 and RA-R1.C2, respectively.

      (2) A better depiction of the imaging features and tumor heterogeneity would support the authors' multimodal attempt.

      PR-R3.C2: We agree with the Reviewer that including more imaging features would improve the non-invasive characterization of each tumor. Due to the RF coil design and time constraints, we did not acquire additional data, such as diffusion MRI to assess tissue microstructure. Instead, our multi-modal protocol included two dynamic MRI studies on each animal, for multiparametric assessment of tumor volume, metabolism and vascular permeability, using 1H-MRI, 2H-spectroscopy during 2H-labelled glucose injection, and 1H-imaging during Gd-DOTA injection, respectively. Rather than aiming at tumor radiomics, we focused on the dynamic assessment of tumor metabolic turnover with heteronuclear spectroscopy, which is challenging per se and particularly in mouse brain tumors, given their very small size. For such multi-modal studies we used a previously developed dual-tuned RF coil: the deuterium coil (2H) was positioned on the mouse head, for optimal SNR; whereas the proton coil (1H) had suboptimal performance compared to a conventional single-tuned coil, and was used only for basic localization and adjustments, reference imaging and tumor volumetry (T2-weighted), and DCE-T1 MRI (T1-weighted). The latter was analyzed pixel-wise to assess spatial correlations between tumor permeability and metabolic metrics, as shown in Fig S3. The limited T2w MRI data collected, in contrast, were only analyzed for tumor volume assessment; no additional imaging features were extracted (e.g. kurtosis/skewness), since such assessment did not show any differences between the two tumor cohorts in our previous study (Simoes et al NIMG:Clin 2022).

      (3) Integration of the various cell types in the tumor microenvironment, as allowed with the resolution of DGE DMI, will explain the observed difference between GL261 and CT2A. Is there a higher percentage of infiltrative "other cells" observed in GL261 tumor?

      PR-R3.C3: While DGE-DMI resolution is far larger than brain and brain tumor cell sizes, we have now performed additional analysis to assess the percentage of microglia/macrophages in both cohorts. The results are now included in the manuscript, namely Fig. 2B, as previously explained in PR-R1.1. Interestingly though, we observed a lower percentage of infiltrative "other cells" in GL261 tumors compared to CT2A, which we discuss in the manuscript: pages 19-20 (lines 20-24 and 1-4), “Finally, our results are indicative of higher microglia/macrophage infiltration in CT2A than GL261 tumors, which is inconsistent with another study reporting higher immunogenicity of GL261 tumors than CT2A for microglia and macrophage populations [56]. Such discrepancy could be related to methodologic differences between the two studies, namely the endpoint-guided assessment of tumor growth (bioluminescence vs MRI, more precise volumetric estimations) and the stage when tumors were studied (GL261 at 23-28 vs 16-18 days post-injection, i.e. less time for immune cell to infiltration in our case), presence/absence of a cell transformation step (GFP-Fluc engineered vs we used original cell lines), or perhaps media conditioning effects during cell culture due to the different formulations used (DMEM vs RPMI).”

      (4) This underlying technology with DGE DMI is capable of identifying more heterogeneous GBM tumors. A validation cohort of additional in vivo models will offer additional support to the potential clinical impact of this study.

      PR-R3.C4: We agree with the Reviewer that applying DGE-DMI to more clinically-relevant models of human brain tumors will enhance its translational impact to patients, as also suggested by Reviewer 1 and addressed in PR-R1.C3. We also believe that the feasibility and potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other primary or secondary brain tumors, is clearly demonstrated in our work, using two reliable and well-described immunocompetent models of GBM. In any case, we have now modified the manuscript to better acknowledge this point: page 18 (lines 14-16), “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features (…)”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors utilize longitudinal MRI to track tumor volumes but perform DMI at endpoint with late-stage tumors. Their previous publication applied metabolic imaging in tumors before the presence of necrosis. It would be valuable to perform longitudinal DMI to examine the evolution of glucose flux metabolic profile over time in the same tumor.

      RA-R1.C1: We thank the Reviewer for the very useful comments on our manuscript. We agree – in this work, we aimed at “extending” our previous DGE-2H single-voxel methodology to multivoxel (DMI), thoroughly demonstrating (1) its in vivo application to the same immunocompetent models of glioblastoma and (2) the ability to depict their phenotypic differences, and therefore (3) the potential for the metabolic characterization of more advanced models of GBM and/or their progression stages. We believe these objectives were achieved. Our results indeed open several possibilities, from longitudinal assessment of the spatio-temporal metabolic changes during GBM progression (and treatment-response) to its application to other models recapitulating more closely the human disease. Now that we have comprehensively demonstrated a protocol for DGE-DMI acquisition, processing and analysis in mouse GBM (a very challenging methodology), and demonstrated it in different mouse GBM cell lines, new studies can be designed to tackle more specific questions, like the one suggested here by the Reviewer. We have modified the manuscript to make this point clearer: page 20 (lines 15-17), “This may be determinant for the longitudinal assessment of GBM progression, with end-point validation; and/or treatment-response, to help selecting among new therapeutic modalities targeting GBM metabolism (…)”; page 21 (lines 5-8), “(…) we report a DGE-DMI method for quantitative mapping of glycolysis and mitochondrial oxidation fluxes in mouse GBM, highlighting its importance for metabolic characterization and potential for in vivo GBM phenotyping in different models and progression stages.”.

      (2) The authors demonstrate a promising correlation between metabolic phenotypes in vivo and key histopathological features of GBM at the endpoint. Directly assessing metabolites involved in glucose fluxes on endpoint tumor samples would strengthen this correlation.

      RA-R1.C2: While we acknowledge the Reviewer’s point, there were two main limitations to implementing such validation step in our protocol: 

      (1) Since we performed dynamic experiments, at the end of each study most 2H-glucose-derived metabolites were already below their maximum concentration (or barely detectable in some cases), as depicted by the respective kinetic curves (Fig 1-D and Fig S7), and thus no longer detectable in the tissues. Importantly, DGE-DMI could guide further studies towards selecting the ideal time point for validating different metabolite concentrations in specific brain regions.

      (2) Such validation would require sacrificing the animals with a focalized irradiator (which we did not have), to instantly halt all metabolic reactions. Only then we could collect and analyze the metabolic profile of specific brain regions, either by in vitro MS or high-resolution NMR following extraction, or by ex vivo HRMAS analysis of the intact tissue, as reported previously by some of the authors for validation of glucose accumulation in different regions of mouse GL261 tumors (Simões et al. NMRB 2010: https://doi.org/10.1002/nbm.1421). Importantly, even if we did have access to a focalized irradiator, such protocols for metabolic characterization would compromise tissue integrity and thus the histopathologic analysis performed in this study. 

      We do agree with the importance of end-point validation and therefore stress it more clearly in the revised manuscript (page 20, lines 14-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      (3) Genetic manipulation of key players in the metabolic pathways studied in this paper (glycolysis and mitochondrial oxidation) would offer a strong validation for the sensitivity of DGE-DMI in accurately distinguishing metabolites (lactate, glutamate-glutamine) and their dynamics.

      RA-R1.C3: Thank you for this comment, we agree. This would be particularly relevant in the context of treatment-response monitoring. While such models were not available to us (conditional spatio-temporal manipulation of metabolic pathway fluxes), we believe our results can still demonstrate this point: We previously used in vivo DGE 2H-MRS to show evidence of decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simoes et al. NIMG:Clin 2022). In the present work, enhanced glycolysis in tumors vs peritumoral brain regions was clearly observed in all the animals studied, from both cohorts, as shown in Fig 1-B and Fig S4. Moreover, the spectral background (before glucose injection) is limited to a single peak in all the voxels: basal DHO, used as internal reference for spatio-temporal quantification of glucose, glutamine-glutamate, and lactate, all de novo and extensively characterized in healthy and glioma-bearing rodent brain (Lu et al. JCBFM 2018; Zhang et al. NMR Biomed 2024; de Feyter et al. SciAdv 2018; Batsios et al ClinCancerRes 2022; Simoes et al. NIMG:Clin 2022) and other rodent tumors (Kreis et al. Radiology 2020, Montrazi et al. SciRep 2023). We have modified the manuscript to clarify this point (page 18, lines 14-17): “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification (…)”.

      (4) Please explain more why DGE-DMI can distinguish different glucose metabolites and how accurate it is.

      RA-R1.C4: DGE-DMI is the imaging extension of our previous work based on single-voxel deuterium spectroscopy, therefore relying on the same fundamental technique and analysis pipeline but moving from a temporal analysis to a spatio-temporal analysis for each metabolite, and thus dealing with more data. Unlike conventional proton spectroscopy (1H), only metabolites carrying the deuterium label (2H) will be detected in this case, including the natural abundance DHO (~0.03%), the deuterated glucose injected and its metabolic derivatives, namely deuterated lactate and deuterated glutamate-glutamine. Due to their different molecular structures, the deuterium atoms will resonate at specific frequencies (chemical shifts, ppm) during a 2H magnetic resonance spectroscopy experiment, as illustrated in Fig 1-A. The method is fully reproducible and accurate, and has been extensively reported in the literature from high-resolution NMR spectroscopy to in vivo spectroscopic imaging of different nuclei, such as proton (1H), deuterium (2H), carbon (13C), phosphorus (31P), and fluorine (19F). Since the fundamental principles of DMI and its application to brain tumors have been very well described in the flagship article by de Feyter et al., we have now highlighted this in the manuscript: page 4 (lines 4-7), “Deuterium metabolic imaging (DMI) has been (…) demonstrated in GBM patients, with an extensive rationale of the technique and its clinical translation [18], and more recently in mouse models of patient-derived GBM subtypes (…)”.

      (5) When mapping glycolysis and mitochondrial oxidation fluxes, add a control method to compare the reliability of DGE-DMI.

      RA-R1.C5: This concern (“lack of a control method”) was also raised by the Reviewer in the Public Reviews section, where we already address it (PR-R1.2).

      (6) If using peritumoral glutamate-glutamine recycling as a marker of invasion capacity, what would be the correct rate of the presence of secondary brain lesions?

      RA-R1.C6: While our results suggest the potential of peritumoral glutamate-glutamine recycling as a marker for the presence of secondary brain lesions, this remains to be ascertained with higher sensitivity for glutamate-glutamine detection. Therefore, we cannot make further conclusions in this regard.  

      To make this point clear, we state in different sections of the discussion: page 19 (lines 1-2), “(…) recycling of the glutamate-glutamine pool may reflect a phenotype associated with secondary brain lesions.”; and page 19 (lines 6-10), “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase spatial resolution to correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55].”

      (7) There are duplicated Vlac in Figure S3 B.

      RA-R1.C7: This was a typo that has now been corrected. Thank you.

      (8) Figure 4, it would be better to add a metabolic map of a tumor without secondary brain lesions to compare.

      RA-R1.C8: We fully agree and have modified Fig 4 accordingly, together with its legend.

      Particularly, we have included tumors C4 (without secondary lesions) vs G4 (with) for this “comparison”, since details of their histology, including the secondary lesions, are provided in Fig 2.

      (9) Full name of SNR and FID should be listed when first mentioned.

      RA-R1.C9: Agreed and modified accordingly, on pages 6-7 (lines 22-1), “signal-to-noise-ratio (SNR)”, and page 19 (lines 5-6), “free induction decay (FID)”.

      (10) Page 2, Line 14: (59±7 mm3) is not needed in the abstract.

      RA-R1.C10: As requested we have removed this specification from the Abstract.

      (11) Page 4, Line 22: Closing out the Introduction section with a statement on broader implications of the present work would enhance the effectiveness of the section.

      RA-R1.C11: We have added an additional sentence in this regard – pages 4-5 (lines 24-2): “Since DMI is already performed in humans, including glioblastoma patients [18], DGE-DMI could be relevant to improve the metabolic mapping of the disease.”

      (12) Define all acronyms to facilitate comprehension. For example, principal component analysis (PCR) and signal-to-noise ratio (SNR).

      RA-R1.C12: Thank you for the comment. We have now defined all the acronyms when first used, including PCA (page 4 (line 11), “Marchenko-Pastur Principal Component Analysis (MP-PCA)”) and SNR (pages 6-7 (lines 22-1), as indicated above in comment RA-R1.C9).

      (13) Some elements within the figures have lower resolution, specifically bar graphs.

      RA-R1.C13: We apologize for this oversight. All the Figures have been revised accordingly, to correct this problem. Thank you.

      (14) Page 13, Line 8: "underly" should be spelled "underlie."

      RA-R1.C14: The typo has been corrected on page 15 (line 8), thank you.

      (15) Page 14, Line 13: "better vascular permeability" would be more effectively phrased as "increased vascular permeability."

      RA-R1.C15: This has also been corrected on page 16 (line 14), thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) I strongly suggest adding a scale bar in the histology figures.

      RA-R2.C1: Thank you for spotting our oversight! This has now been added as requested to Fig 2.

      (2) The 2H MRSI data were acquired at a nominal resolution of 2.25 x 2.27 x 2.25 mm^3, resulting in a nominal voxel volume of 11.5 uL. (In reality, this is larger due to the point spread function leading to signal bleeding from adjacent voxels.) If we estimate the volume of the tumor rim, as indicated by the histology slides, as (generously) ~ 50 um in width, 3.2 mm long (the diagonal of a 2.25 x 2.25 mm^2 square), and 2.27 mm high, we get a volume of 0.36 uL. Therefore the native spatial resolution of the 2H MRSI is at least 30 times larger than the volume occupied by the tumor rim/microenvironment. Normal tissue and tumor tissue will contribute the majority of the metabolic signal of that voxel. I feel an opposite approach could have been pursued: find out the spatial resolution needed to characterize the tumor rim based on the histology, then use a de-noising method to bring the SNR of those data to be acceptable. (This is just a thought experiment that assumes de-noising actually works to improve quantification for MRS data instead of merely cosmetically improving the data; so far the jury is still out on that, in my view.)

      RA-R2.C2 – We thank the Reviewer for the detailed analysis and reply below to each point.

      RA-R2.C2.1 – spatial resolution and tumor rim: Our nominal voxel volume was indeed 11.5 uL, defined in-plane by the PSF which explains signal bleeding effects, as in any other imaging modality. The DMI raw data were Fourier interpolated before reconstruction, rendering a final in-plane resolution of 0.56 mm (0.72 uL voxel volume). The tumor rim (margin) analyzed was roughly 0.1 mm in width (please note, not 0.05 mm), as explained in the methods section (page 28, line 16) and now more clearly defined with the scale bars in Fig 2. According to the Reviewer’s analysis, this would correspond to 0.1*3.2*2.27 = 0.73 uL, which we approximated with 1 voxel (0.72 uL), as displayed in Fig 3-A. Importantly, it has long been demonstrated that Fourier interpolation provides a better approximation to the ground truth compared to the nominal resolution, and even to more standard image interpolation performed after FT - see for instance Vikhoff-Baaz B et al. (MRI 2001. 19: 1227-1234), now cited in the Methods section: page 24, line 24 ([69]). While we do agree that both normal brain and tumor should contribute significantly to the metabolic signal in this relatively small region, we rely on extensive literature to maintain that despite its smoothing effect, the display resolution provides a better approximation to the ground truth and is therefore more suited than the nominal resolution for ROI analysis in this region. Still, we acknowledge this potential limitation in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]).”
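
      The volumes traded in this exchange can be checked with a few lines of arithmetic. The sketch below uses only the dimensions quoted by the Reviewer and the authors; it assumes that the 2.27 mm dimension is the slice thickness and is left untouched by the in-plane Fourier interpolation, which is an assumption made here for illustration rather than something stated in the reply. Variable and function names are hypothetical.

```python
# Volume bookkeeping for the voxel vs tumor-rim comparison (1 mm^3 == 1 uL).
# All dimensions are the ones quoted in the review exchange; treating 2.27 mm
# as an uninterpolated slice thickness is an assumption of this sketch.

def box_volume_ul(x_mm: float, y_mm: float, z_mm: float) -> float:
    """Volume of a rectangular box in microlitres (numerically equal to mm^3)."""
    return x_mm * y_mm * z_mm

# Nominal 2H MRSI voxel: 2.25 x 2.27 x 2.25 mm
nominal_voxel_ul = box_volume_ul(2.25, 2.27, 2.25)

# Displayed (Fourier-interpolated) voxel: 0.56 mm in-plane, 2.27 mm thick (assumed)
display_voxel_ul = box_volume_ul(0.56, 0.56, 2.27)

# Tumor rim modelled as a thin slab: width x 3.2 mm length x 2.27 mm height
rim_50um_ul = box_volume_ul(0.05, 3.2, 2.27)   # Reviewer's estimate (50 um wide)
rim_100um_ul = box_volume_ul(0.10, 3.2, 2.27)  # authors' estimate (100 um wide)

print(f"nominal voxel  : {nominal_voxel_ul:.2f} uL")
print(f"display voxel  : {display_voxel_ul:.2f} uL")
print(f"rim, 50 um     : {rim_50um_ul:.2f} uL "
      f"(nominal voxel is {nominal_voxel_ul / rim_50um_ul:.0f}x larger)")
print(f"rim, 100 um    : {rim_100um_ul:.2f} uL "
      f"(nominal voxel is {nominal_voxel_ul / rim_100um_ul:.0f}x larger)")
```

      Under these assumptions the 100 um rim and the interpolated voxel have nearly the same volume, which is the basis for the one-voxel approximation, while the nominal voxel remains more than an order of magnitude larger than either rim estimate.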

      RA-R2.C2.2 – metabolic and histologic features at the tumor rim: Furthermore, we also performed ROI analysis of lactate metabolic maps in tumor and peritumoral rim areas, which closely reflected regional differences in cellularity and cell density, and immune cell infiltration between the 2 tumor cohorts and across pooled cohorts, as explained in the Public Review section - PR-R1.1 – and now reported in the manuscript: page 12 (lines 6-16), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral margin infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) - Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”; page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      RA-R2.C2.3 – alternative method: Regarding the alternative method suggested by the Reviewer, we have tested a similar approach in another region (tumor) and it did not work, as explained in the Discussion section (page 19, lines 5-6) and Fig S11. Essentially, Tensor PCA performance improves with the number of voxels and therefore limiting it to a subregion hinders the results. In any case, if we understand correctly, the Reviewer suggests a method to further interpolate our data in the spatial dimension, which would deviate even more from the original nominal resolution and thus sounds counter-intuitive based on the Reviewer’s initial comment about the latter. More importantly, we would like to stress the importance of spectral denoising in this work, questioned by the Reviewer. There are several methods reported in the literature, most of them demonstrated only for MRI. We previously demonstrated how MP-PCA denoising objectively improved the quantification of DGE-2H MRS in mouse glioma by significantly reducing the CRLBs: 19% improved fitting precision. In the present study, Tensor PCA denoising was applied to DGE-DMI, which led to an objective 63% increase in pixel detection based on the quality criteria defined, unambiguously reflecting the improved quantification performance due to higher spectral quality.

      (3) Concerns re. the metabolic model: 2g/kg of glucose infused over 120 minutes already leads to hyperglycemia in plasma. Here this same amount is infused over 30 seconds... such a supraphysiological dose could lead to changes in metabolite pool sizes -which are assumed to not change since they are not measured, and also fractional enrichment which is not measured at all. Such assumptions seem incompatible with the used infusion protocol.

      RA-R2.C3: We understand the concern. However, the protocol was reproduced exactly as originally reported by Kreis et al (Radiology 2020), who performed the measurements in mice and measured the fraction of deuterium enrichment (f=0.6). Since we also worked with mice, we adopted the same value for our model. The total volume injected was 100uL/25g animal, adjusted for animal weight (96uL/24g average – Table S1), as we reported before (Simões et al. NIMG:Clin 2022), which is standard for i.v. bolus administration in mice as it corresponds to ~10% of the total blood volume. This volume is therefore easily diluted and not expected to introduce significant changes in the metabolic pool sizes. Continuous infusion protocols, on the other hand, administer higher volumes, easily approaching the mL range when performed over periods as long as 120 min. This would indeed be incompatible with our bolus infusion protocol. We have now clarified this in the manuscript – page 24 (line 23): “i.v. bolus of 6,6′-<sup>2</sup>H<sub>2</sub>-glucose (2 mg/g, 4 µL/g injected over 30 s (…)”.
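
      The weight-adjusted bolus described above follows directly from the two per-gram figures quoted from the manuscript (2 mg/g and 4 µL/g). The sketch below only rescales them to a given body weight; the implied solution concentration is derived here from those two numbers and is not stated in the reply, and the function name is hypothetical.

```python
# Weight-adjusted i.v. bolus, using the per-gram figures quoted in the reply:
# 2 mg glucose and 4 uL injected volume per gram of body weight.

def bolus_plan(body_weight_g: float,
               dose_mg_per_g: float = 2.0,
               volume_ul_per_g: float = 4.0) -> tuple[float, float, float]:
    """Return (injected volume in uL, glucose mass in mg, concentration in mg/uL)."""
    volume_ul = volume_ul_per_g * body_weight_g
    glucose_mg = dose_mg_per_g * body_weight_g
    # Implied solution concentration (derived, not stated in the source)
    concentration_mg_per_ul = glucose_mg / volume_ul
    return volume_ul, glucose_mg, concentration_mg_per_ul

for weight_g in (25.0, 24.0):  # reference animal and cohort average (Table S1)
    volume_ul, glucose_mg, conc = bolus_plan(weight_g)
    print(f"{weight_g:.0f} g mouse: {volume_ul:.0f} uL, "
          f"{glucose_mg:.0f} mg glucose, {conc:.1f} mg/uL")
```

      For the 25 g reference animal this reproduces the 100 uL volume quoted in the reply, and for the 24 g cohort average the 96 uL volume.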

      (4) Vmax = Vlac + Vglx. This is incorrect: Vmax = Vlac.

      RA-R2.C4: Thank you for raising this concern. As indicated in RA-R2.C3, our model (Simões et al. NIMG:Clin 2022) was adapted from the original model proposed by Kreis et al. (Radiology 2020), where the authors quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma that is exclusively glycolytic, and therefore estimated the maximum glucose flux rate from the lactate synthesis rate (Vmax = Vlac). However, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx), where Vmax = Vlac + Vglx, as explained in our 2022 paper. Our kinetic model is rather simplistic compared to others – reported by 13C-MRS under continuous glucose infusion in healthy mouse brain (Lai et al. JCBFM 2018) and mouse glioma (Lai et al. IJC 2018) – which we acknowledge in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”). Nevertheless, our Vlac and Vglx results are consistent with our previous DGE 2H-MRS findings in the same glioma models, and well aligned with the literature, as discussed in PR-R1.C2.1.

      (5) Some other items that need attention: 0.03 % is used as the value for the natural abundance of DHO. The natural abundance of 2H in water can vary somewhat regionally, but I have never seen this value reported. The highest seen is 0.015%.

      RA-R2.C5: The Reviewer is referring to the natural abundance of deuterium in hydrogen: 1 in ~6400 is D, i.e. 0.015 %. Since each water molecule has 2 hydrogen atoms, about 1 in ~3200 water molecules is DHO, i.e. 0.03%. Indeed the latter can have slight variations depending on the geographical region, as nicely reported by Ge et al (Front Oncol 2022), who showed a 16.35 mM natural abundance of DHO in the local tap water of St. Louis, MO, USA (16.35/55500 ≈ 1/3394 ≈ 0.03%).
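
      The factor of two between the abundance of deuterium among hydrogen atoms and the abundance of DHO among water molecules can be made explicit with a short calculation. The sketch below uses the figures given in the reply (1 D per ~6400 H, 55,500 mM water, 16.35 mM DHO in tap water) and neglects D2O, whose contribution is far smaller; that simplification and the variable names are assumptions of this sketch.

```python
# Natural abundance of DHO in water, from the atom-level abundance of deuterium.
# Figures are those quoted in the reply; D2O is neglected (assumption).

D_PER_H = 1 / 6400          # fraction of hydrogen atoms that are deuterium (~0.015 %)
WATER_MM = 55_500           # concentration of water in water, in mM (55.5 M)

# Each water molecule carries two hydrogen positions, so the chance that
# exactly one of them is D is approximately 2 * D_PER_H.
dho_fraction = 2 * D_PER_H
dho_mM = dho_fraction * WATER_MM

# Tap-water measurement quoted from Ge et al.: 16.35 mM DHO
tap_dho_mM = 16.35
tap_fraction = tap_dho_mM / WATER_MM
tap_percent = 100 * tap_fraction
one_in = 1 / tap_fraction

print(f"D among H atoms       : {100 * D_PER_H:.4f} %")
print(f"DHO among water       : {100 * dho_fraction:.4f} % (~{dho_mM:.1f} mM)")
print(f"tap water, 16.35 mM   : {tap_percent:.4f} % (1 in {one_in:.0f})")
```

      Both routes land at roughly 0.03% DHO, about twice the 0.015% atom-level abundance of deuterium cited by the Reviewer.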

      (6) Based on the color scale bar in Figure 1, the HDO concentration appears to go as high as 30 mM. Even if this number is off because of the previous concern (HDO), it appears to be a doubling of the HDO concentration. Is this real? What would be the origin of that? No study using [6,6'-2H2]-glucose that I'm aware of has reported such an increase in HDO.

      RA-R2.C6: As explained before (RA-R2.C3 and RA-R2.C4), we based our protocol and model on Kreis et al (Radiology 2020), who reported ~10 mM basal DHO levels rising to ~27 mM after 90 min, which are well within the ~30 mM range we report over a longer period (132 min).

      Similar DHO levels were mapped with DGE-DMI in mouse pancreatic tumors (Montrazi et al. SciRep 2023).

      (7) "...the central spectral matrix region selected (to discard noise regions outside the brain, as well as the olfactory bulb and cerebellum)". This reads as if k-space points correspond one-to-one with imaging pixels, which is not the case.

      RA-R2.C7: We rephrased the sentence to avoid such potential misinterpretation, specifically: page 25 (lines 19-21): “Each dataset was averaged to 12 min temporal resolution and the noise regions outside the brain, as well as the olfactory bulb and cerebellum, were discarded (…)”.

      (8) The use of the term "glutamate-glutamine recycling" is not really appropriate since these metabolites are not individually detected with 2H MRS, which is a requirement to measure this neurotransmitter cycling.

      RA-R2.C8: Thank you for this comment. To avoid this misinterpretation, we have now rephrased "glutamate-glutamine recycling" to “recycling of the glutamate-glutamine pool” in all the sentences, namely: page 2 (lines 14-15); page 15 (line 8); page 19 (line 1); page 21 (line 10).

      Reviewer #3 (Recommendations For The Authors):

      (1) One major issue is the lack of underlying genetics, and therefore it is hard for readers to put the observed difference between GL261 and CT2A into context. The authors might consider perturbing the genetic and regulatory pathways on glycolysis and glutamine metabolism, repeating DGE DMI measure, in order to enhance the robustness of their findings.

      RA-R3.C1: We thank the reviewer for the helpful revision and comments. The point made here is aligned with Reviewer 1’s, addressed in RA-R1.C3; and also with our previous reply to the Reviewer, PR-R3.C1. Thus, we agree that conditional spatio-temporal manipulation of metabolic pathway fluxes would be relevant to further demonstrate the robustness of DGE-DMI, particularly for treatment-response monitoring. While such models were not available to us, our previous findings seem compelling enough to demonstrate this point. Thus, we previously showed a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simoes et al. NIMG:Clin 2022), which could enhance lactate recycling through mitochondrial metabolism in GL261 cells and thus explain our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].” Moreover, we previously showed evidence of DGE-2H MRS’ ability to detect decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simoes et al. NIMG:Clin 2022).

      (2) Is increased resolution possible for DGE DMI to correlate with histological findings?

      RA-R3.C2: The resolution achieved with DGE DMI, or any other MRI method, is limited by the signal-to-noise ratio (SNR), which in turn depends on the equipment (magnetic field strength and radiofrequency coil), the pulse sequence used, and post-processing steps such as noise removal. Thus, increased resolution could be achieved with higher magnetic field strengths, more sensitive RF coils, more advanced DMI pulse sequences, and improved methods for spectral denoising if available. We have used the best configuration available to us and discussed such limitations in the manuscript, including now a few modifications to address the Reviewer’s point more clearly – page 19 (lines 6-10): “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]”.
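
      To make the SNR constraint on resolution concrete, the following is a minimal sketch of the usual scaling argument. It assumes, all else being equal, that SNR is proportional to voxel volume and to the square root of the number of signal averages; this relation is a general MR rule of thumb and is not taken from the manuscript. The function name and parameters are hypothetical.

```python
def averaging_factor(linear_refinement, dims=3):
    """Factor by which the number of averages (and hence scan time) must grow
    to keep SNR constant when each voxel edge is divided by `linear_refinement`
    along `dims` spatial dimensions.

    Assumption (not from the manuscript): SNR ~ voxel_volume * sqrt(averages).
    """
    if linear_refinement <= 0:
        raise ValueError("linear_refinement must be positive")
    # Voxel volume shrinks by this factor, so SNR per average drops by it too.
    volume_reduction = linear_refinement ** dims
    # Averaging recovers SNR only as a square root, hence the squared factor.
    return volume_reduction ** 2


# Halving the voxel edge in all three dimensions.
three_d = averaging_factor(2.0, dims=3)
# Halving the voxel edge in-plane only.
in_plane = averaging_factor(2.0, dims=2)
print(three_d, in_plane)
```

      Under this assumption, averaging alone quickly becomes impractical, which is consistent with the response pointing to field strength, coil sensitivity, pulse sequences, and denoising rather than longer acquisitions.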

      (3) The authors might consider measuring the contribution of stromal cells and infiltrative immune cells in the analysis of DGE DMI data, to construct a more comprehensive picture of the microenvironment.

      RA-R3.C3: Thank you for this important point. We now added additional Iba-1 stainings of infiltrating microglia/macrophages, for each tumor, as suggested by the Reviewer; stromal cells would be more difficult to detect and we did not have access to a validated staining method for doing so. Our new data and results - now included in Fig 2B – indicate significantly higher levels of Iba-1 positive cells in CT2A tumors compared to GL261, which are particularly noticeable in the periphery of CT2A tumors and consistent with their better-defined margins and lower infiltration in the brain parenchyma. This has been explained more extensively in PRR1.1.

      (4) Additional GBM models with improved understanding of the genetic markers would serve as an optimal validation cohort to support the potential clinical translation.

      RA-R3.C4: We agree with the Reviewer and direct again to RA-R1.3, where we already addressed this suggestion in detail and introduced modifications to the manuscript accordingly.

    1. eLife Assessment

      Seminal plasma is a crucial component of semen that can affect sperm capacitation. However, the role of seminal plasma components, including fatty acids, in sperm function and fertility is poorly understood. In this important study, the authors provide solid evidence of a testosterone-induced metabolic shift in the epithelial cells of the seminal vesicle that supports fatty acid synthesis, and also describe the potential effect of oleic acid on sperm motility.

    2. Reviewer #1 (Public review):

      Summary:

      In this revised report, Yamanaka and colleagues investigate a proposed mechanism by which testosterone modulates seminal plasma metabolites in mice. The authors identify oleic acid as a particularly important metabolite, derived from seminal vesicle epithelium, that stimulates linear progressive motility in isolated cauda epididymal sperm in vitro. The authors provide additional experimental evidence of a testosterone-dependent mechanism of oleic acid production by the seminal vesicle epithelium.

      Strengths:

      Often, reported epididymal sperm from mice have lower percent progressive motility compared with sperm retrieved from the uterus or by comparison with human ejaculated sperm. The findings in this report may improve in vitro conditions to overcome this problem, as well as add important physiological context to the role of reproductive tract glandular secretions in modulating sperm behaviors. The strongest observations are related to the sensitivity of seminal vesicle epithelial cells to testosterone. The revisions include addition of methodological detail, modified language to reflect the nuance of some of the measurements, as well as re-performed experiments with more appropriate control groups. The findings are likely to be of general interest to the field by providing context for follow-on studies regarding the relationship between fatty acid beta oxidation and sperm motility pattern.

      Weaknesses:

      Support for the proposed mechanism is stronger in this revised report than in the previous report, but there are many challenges in measuring sperm metabolism and its direct relationship with motility patterns. This study is no exception and largely relies on correlations between various experiments in lieu of direct testing. Additionally, the discussion is framed from a human pre-clinical perspective, and it should be noted that the reproductive physiology between mice and humans is very different.

    3. Reviewer #2 (Public review):

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels as well as isolated mouse and human seminal vesicle epithelial cells the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces a difference in gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes regulating cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid. The revised version strengthens the role of ACLY as the main regulator of seminal vesicle epithelial cell metabolic programming. 18:1 oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone, however, is small. Additional experiments should be included to further support that oleic acid positively affects sperm function.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, the authors investigated the effects of reproductive secretions on sperm function in mice. The authors attempt to weave together an interesting mechanism whereby a testosterone-dependent shift in metabolic flux patterns in the seminal vesicle epithelium supports fatty acid synthesis, which they suggest is an essential component of seminal plasma that modulates sperm function by supporting linear motility patterns.

      Strengths:

      The topic is interesting and of general interest to the field. The study employs an impressive array of approaches to explore the relationship between mouse endocrine physiology and sperm function mediated by seminal components from various glandular secretions of the male reproductive tract.

      Thank you for your positive evaluation of our study's topic and approach. We are pleased that you found our investigation into the effects of reproductive secretions on sperm function to be of general interest to the field. We appreciate your positive feedback on the diverse methods we employed to explore this complex relationship.

      Weaknesses:

      Unfortunately, support for the proposed mechanism is not convincingly supported by the data, and the experimental design and methodology need more rigor and details, and the presence of numerous (uncontrolled) confounding variables in almost every experimental group significantly reduce confidence in the overall conclusions of the study.

      The methodological detail as described is insufficient to support replication of the work. Many of the statistical analyses are not appropriate for the apparent designs (e.g. t-tests without corrections for multiple comparisons). This is important because the notion that different seminal secretions will affect sperm function would likely have a different conclusion if the correct controls were selected for post hoc comparison. In addition, the HTF condition was not adjusted to match the protein concentrations of the secretion-containing media, likely resulting in viscosity differences as a major confounding factor on sperm motility patterns.

      We appreciate you highlighting concerns regarding our weak points and apologize for our unclear description. We revised the manuscript to be as rigorous and detailed as possible. In addition, some experimental designs were changed to simpler direct comparisons, and additional experiments were conducted (New Figure 1A-F, lines 103-113). We have made our explanations more consistent with the provided data, which includes further experimentation with additional controls and larger sample sizes to increase the reliability of the findings.

      To address the multiple testing problem, a multiple testing correction was made by making the statistical tests more stringent (Please see Statistical analysis in the Methods section and the Figure legends). Based on different statistical methods, the analysis results did not require significant revisions of the previous conclusions.
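
      As an illustration of what a multiple testing correction does, the following is a minimal sketch of the Bonferroni and Holm procedures. The response does not state which correction was applied, so the choice of these two methods, the function names, and the p-values are assumptions made purely for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns rejection flags in the input order."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes as hypotheses are rejected: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every remaining (larger) p-value is also retained
    return reject


# Hypothetical p-values from four pairwise comparisons.
example_p = [0.010, 0.040, 0.030, 0.005]
uncorrected = [p <= 0.05 for p in example_p]
print(uncorrected)
print(bonferroni(example_p))
print(holm(example_p))
```

      In this hypothetical example all four comparisons pass an uncorrected 0.05 threshold, while only two survive either correction, which is the kind of change the reviewer was concerned about.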

      Because the experiments on mixing extracts from the seminal vesicles were exploratory, we planned to avoid correcting for multiple comparisons. Repeating the t-test could lead to a Type I error in some results, so we apologize for not interpreting and annotating them. In the revised version, we removed the dataset for experiments on mixing extracts from the seminal vesicles and prostate, and we changed the description to refer to the clearer dataset mentioned above.

      The viscosity of the secretion-containing medium was measured with a viscometer, confirming that secretions did not significantly affect the viscosity of the solution. In addition, as the reviewer pointed out, we addressed the issue that the HTF condition could not be used as a control because of the heterogeneity in protein concentration (New Fig.1G, lines 110-111).

      Overall, we concluded that seminal vesicle secretion improves the linear motility of sperm more than prostate secretion.

      There is ambiguity in many of the measurements due to the lack of normalization (e.g. all Seahorse Analyzer measurements are unnormalized, making cell mass and uniformity a major confounder in these measurements). This would be less of a concern if basal respiration rates were consistently similar across conditions and there were sufficient independent samples, but this was not the case in most of the experiments.

      We apologize for the many ambiguities in the first manuscript. Cell culture experiments in the paper, including the flux analysis, were performed under conditions normalized or fixed by the number of viable cells. The description has also been revised to emphasize that the measurement values are standardized by cell count (lines 183-185, 189-190, 194-197). We emphasize that testosterone affects metabolism under the same number of viable cells (New Fig.4). This change in basal respiration is thought to be due to the shift in the metabolic pathway of seminal vesicle epithelial cells to a “non-normal TCA cycle” in which testosterone suppresses mitochondrial oxygen consumption, even under aerobic conditions (New Figs.3, 4, 5).

      The observation that oleic acid is physiologically relevant to sperm function is not strongly supported. The cellular uptake of 10-100uM labeled oleic acid is presumably due to the detergent effects of the oleic acid, and the authors only show functional data for nM concentrations of exogenous oleic acid. In addition, the effect sizes in the supporting data were not large enough to provide a high degree of confidence given the small sample sizes and ambiguity of the design regarding the number of biological and technical replicates in the extracellular flux analysis experiments.

      Thank you for your important critique. As you noted, the excessively high oleic acid concentration did not reflect physiological conditions. Therefore, we redesigned and repeated the oleic acid uptake study. We added an in vitro fertilization experiment corresponding to the functional data of exogenous oleic acid at nM concentrations (New Fig.7J,K, Lines 274-282).

      For the flux data to determine the effect of oleic acid on sperm metabolism, we have indicated in the text that the data were obtained based on eight male mice and two technical replicates. Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the extracellular flux analyzers XFe24 or XFe96. The measurements were also repeated because the XF HS Mini was used in an 8-well plate (only a maximum of 6 samples at a run since 2 wells were used for calibration).

      Overall, the most confident conclusion of the study was that testosterone affects the distribution of metabolic fluxes in a cultured human seminal vesicle epithelial cell line, although the physiological relevance of this observation is not clear.

      We thank the reviewer for the comment that this finding is one of the more robust conclusions of our study. Below we have written our thoughts on the physiological relevance of the observation results and our proposed revisions. In the mouse experiments, when the action of androgens was inhibited by flutamide, oleic acid was no longer synthesized in the seminal vesicles. The results of the experiments using cultured seminal vesicle epithelial cells showed that oleic acid was not being synthesized because of a change in metabolism dependent on testosterone. We have also added IVF data on the effects of oleic acid on sperm function (New Fig.7 and Supplementary Fig. 5, lines 274-282).

      As you can see, we have obtained consistent data in vitro and in vivo in mice. Our data also showed that the effects of testosterone on metabolic fluxes in vitro are similar in mouse and human seminal vesicle epithelial cells (New Fig.9). Therefore, it can be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. However, the conclusion was overstated in the original manuscript, so we changed the wording as follows: It could be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. (lines 422-423)

      In the introduction, the authors suggest that their analyses "reveal the pathways by which seminal vesicles synthesize seminal plasma, ensure sperm fertility, and provide new therapeutic and preventive strategies for male infertility." These conclusions need stronger or more complete data to support them.

      We appreciate your comments about the suggestion presented in the introduction.

      We also removed our conclusions regarding treatment and prevention strategies for male infertility (lines 96-98). We wanted to discuss our findings not conclusively but as future applications that could result from further research based on our initial findings.

      The last sentence of the introduction has been revised to tone down these assertions as follows: These analyses revealed that testosterone promotes the synthesis of oleic acid in seminal vesicle epithelial cells and its secretion into seminal plasma, and the oleic acid ensures the linear motility and fertilization ability of sperm.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels, as well as isolated mouse and human seminal vesicle epithelial cells, the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces differential gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes that regulate cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid.

      Strength:

      Oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone is small but significant, suggesting that the authors have identified one of the fertilization-supporting factors in seminal plasma.

      Thank you for your positive comments regarding our work on the role of testosterone in regulating metabolic enzymes and the subsequent production of 18:1 oleic acid in seminal vesicle epithelial cells. We are pleased that the strength of our findings, particularly identifying oleic acid as a factor influencing sperm motility and mitochondrial respiration, has been recognized.

      Weaknesses:

      Further studies are required to investigate the effect of other seminal vesicle components on sperm capacitation to support the author's conclusions. The author's experiments focused on potential testosterone-induced changes in the rate of seminal vesicle epithelial cell glycolysis and oxphos, however, provide conflicting results and a potential correlation with seminal vesicle epithelial cell proliferation should be confirmed by additional experiments.

      Thank you very much for your valuable criticism. Although we fully agree with your comment, conducting experiments to investigate the effects of other seminal vesicle components on the fertilization potential of sperm would be a great challenge for us. This is because it has taken us the last three years to identify oleic acid as a key factor in seminal plasma. We are considering a follow-up study to explore the effect of other seminal vesicle components on sperm capacitation. Therefore, we have revised the Introduction and conclusions to tone down our assertions.

      The revised manuscript also includes additional data showing a correlation between changes in metabolic flux and the proliferation of seminal vesicle epithelial cells using shRNA. As a result, it was shown that cell proliferation is promoted when mitochondrial oxidative phosphorylation is promoted by ACLY knockdown (New Fig.8D, lines 303-305). This shows a close relationship between the metabolic shift in seminal vesicle epithelial cells and cell proliferation. The revised manuscript includes an interpretation and discussion of these results (lines 369-379).

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Male fertility depends on both sperm and seminal plasma, but the functional effect of seminal plasma on sperm has been relatively understudied. The authors investigate the testosterone-dependent synthesis of seminal plasma and identify oleic acid as a key factor in enhancing sperm fertility.

      Strengths:

      The evidence for changes in cell proliferation and metabolism of seminal vesicle epithelial cells and the identification of oleic acid as a key factor in seminal plasma is solid.

      Weaknesses:

      The evidence that oleic acids enhance sperm fertility in vivo needs more experimental support, as the main phenotypic effect in vitro provided by the authors remains simply as an increase in the linearity of sperm motility, which does not necessarily correlate with enhanced sperm fertility.

      We appreciate the positive feedback on the solid evidence of cell proliferation and metabolic changes in seminal vesicle epithelial cells and the identification of oleic acid as an important factor in seminal plasma. We fully agree with the assessment that the evidence linking oleic acid and increased sperm fertility in vivo needs further experimental support. To address this concern, we changed the experimental design of an oleic acid study and started again to be more physiological regarding the effect of oleic acid on fertility outcomes, increased the replicates of artificial insemination, and added in vitro fertilization assessments (New Fig.7 and supplementary Fig.5, lines 274-282). The revised manuscript describes these experiments and discusses the association between oleic acid and fertility.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Recommendations for the authors:

      Reviewing Editor's note:

      As you can see from the three reviewers' comments, the reviewers agree that this study can be potentially important if major concerns are adequately addressed. The major concern common to all the reviewers is the incomplete mechanistic link between the physiological androgen effect on the production of oleic acid and its effect on sperm function. Statistical analyses need more rigor and consideration of other important capacitation parameters are needed to address these concerns and to improve the manuscript to support the current conclusions.

      Thank you for summarizing the reviewers' feedback and for your insights regarding the major concerns raised. We appreciate the reviewers' understanding of the potential importance of our work and have addressed the issues highlighted to strengthen the manuscript. We believe these changes will improve the quality of the manuscript and provide a clearer and more complete understanding of the role of androgens and oleic acid in sperm function.

      Reviewer #1 (Recommendations For The Authors):

      The following comments are provided with the hope of aiding the authors in improving the alignment between the data and their interpretations.

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major Comments:

      (1) The methodological detail is not sufficient to reproduce the work. For example:

      a. Manufacturer protocols are referred to extensively. These protocols are neither curated nor version-controlled. Please consider describing the underlying components of the assays. If information is not available, please consider providing catalog numbers and lot numbers in the methods (if appropriate for journal style requirements).

      We appreciate this suggestion, which we believe is important to ensure reproducibility. We described the catalog number in our Methodology and included as much information as possible.

      b. Please consider describing the analyses in full, with consideration given to whether blinding was part of the design. For example- line 492: "apoptotic cells were quantified using ImageJ". How was this quantified? How were images pre-processed? Etc.

      Although blinding was not performed, experiments and analyses based on Fisher's three principles were conducted to eliminate bias (lines 549-552). In order to avoid false-positive or false-negative results, it is clearly stated that tissue sections treated with DNAse were used as positive controls, and tissue sections without TdT were used as negative controls for apoptosis. We have added detailed quantification methods (lines 544-546).

      c. Please consider providing versions of all acquisition and analysis software used.

      We have added software version information in Materials and Methods.

      (2) Please consider revisiting the statistical analyses. Many of the analyses don't seem appropriate for the design. For example, the use of a t-test with multiple comparisons for repeated measures design in Figure 2 and the use of t-test for two-factor design in Figure 8. etc.

      To address the multiple testing issues, the statistical methodology was changed to a more rigorous one. Details are given in the Statistical analysis in the Methods section and the Figure legends.

      (3) The increase in % LIN in Figure 1 may be confounded by differences in viscosity between HTF and the fluid secretion mixtures. For this reason, HTF may not be an appropriate control for the ANOVA post hoc analysis. HTF protein was not adjusted to the same concentration as the secretion mixtures, correct? Ultimately, it does not appear that there would be a significant statistical effect of the different fluid mixtures if appropriate statistical comparisons were made. This detracts from the notion that the secretions impact sperm function.

      (4) Figure 1, the statistical analysis in the legend suggests that the experiments were analyzed with a t-test. Were corrections made for multiple comparisons in B-D? An ANOVA would probably be more appropriate.

      We used a viscometer to measure the viscosity of a solution of prostate and seminal vesicle secretions adjusted to a protein concentration of 10 mg/mL. The results showed that the secretions did not cause any significant viscosity changes (New Fig.1G, Lines 110-111).

      As you pointed out, the protein levels in the HTF medium and the secretion mixture are not adjusted to the same concentration. In addition, the original manuscript was not a controlled experiment because the two factors, seminal vesicle and prostate extracts, were modified. Therefore, to investigate the effect of prostate and seminal vesicle secretions on sperm motility, we modified the experimental design to directly compare the effects of the two groups: seminal vesicle and prostate extracts (New Fig.1A-G, lines 101-113). To show the sperm quality used in this study, motility data from sperm cultured in the HTF medium are presented independently in New Supplemental Fig.1A.

      (5) Additionally in Figure 1, there is no baseline quality control data to show that there are no intrinsic differences between sperm sampled from the two treatment groups. So baseline differences in sperm quality/viability remain a potential confounder.

      We thank you for this important point. Epididymal sperm were collected from healthy mice. We recovered only the seminal vesicle secretions from the flutamide-treated mice to pursue its role in the accessory reproductive glands, since testosterone targets the testes and accessory reproductive organs. So, there was no qualitative difference between the epididymal sperm before treatment. Nevertheless, incubation with seminal vesicle secretion for one hour altered the sperm motility pattern and in vivo fertilization results. Sperm function was altered by seminal vesicle secretion in a short period of culture time. We apologize for the confusion, and we have revised the text and figure to carry a clearer message (lines 128-132).

      (6) Figure 1E, did the authors confirm that flutamide-treated mice had decreased serum androgens? How often were mice treated with flutamide? This is important because flutamide has a relatively short half-life and is rapidly metabolized to inert hydroxyflutamide.

      Serum testosterone levels were unchanged. Flutamide was administered every 24 hours for 7 consecutive days. Although there was no change in blood testosterone levels (New Supplemental Fig.1B), a decrease in the weight of the seminal vesicles, prostate, and epididymis was confirmed. This is thought to be due to the pharmacological activity of flutamide.

      (7) Figure 1H, the meaning of 'relative activity of mitochondria' isn't clear. JC-1 does not measure 'activity'. A decreased average voltage potential across the inner mitochondrial membrane may indicate that more of the sperm from the flutamide group were dead. Additionally, J-aggregates are slow to form, generally requiring long incubation periods of at least 90 minutes or more. Additional positive and negative controls for predictable mitochondrial transmembrane voltage potential polarization states would have improved the quality of this experiment.

      Thank you for pointing this out. We have replaced the relative activity of mitochondria with high mitochondrial membrane potential (New Fig.1M, lines 125-128). Actually, it is thought that the sperm cultured in seminal vesicle secretions from mice that had been administered flutamide died because the motility of the sperm was also significantly reduced. Since antimycin reduces mitochondrial membrane potential, we have added an experiment in which 10 µM antimycin-treated sperm were used as a control to confirm that the JC-1 reaction is sensitive to changes in membrane potential.

      (8) Figure 4, the extracellular flux data appear to be unnormalized. The Seahorse instruments are extremely sensitive to the mass and uniformity of the cells at the bottom of the well. This may be a significant confounder in these results. For example, all of the observed differences between groups could simply be a product of differential cell mass, which is in line with the reduced growth potential of testosterone-treated cells indicated by the authors in the results section.

      We thank you for this important point. After correcting for cell viability, we seeded the same number of viable cultured cells into wells between experimental groups before measuring them in the flux analyzer. There were no significant differences in survival rates in all experiments. As a result, an increase in glucose-induced ECAR and a suppression of mitochondrial respiration were observed. We would like to emphasize that this difference based on metabolic data does not imply a reduction in the growth potential of the cells due to testosterone treatment.

      We described that these measurements are normalized based on cell count and viability (lines 184, 190, 195).
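
      The effect of normalizing flux measurements by viable cell number can be sketched as follows. This is a minimal illustration only: the function name, the per-10,000-cell scale, and all numbers are hypothetical and are not taken from the manuscript or from any instrument software.

```python
def normalize_rate(raw_rate, viable_cells, per_cells=10_000):
    """Express a raw OCR or ECAR reading per `per_cells` viable cells.

    raw_rate     -- reading from one well (e.g. pmol O2/min), hypothetical
    viable_cells -- viable cell count in that well
    """
    if viable_cells <= 0:
        raise ValueError("viable_cells must be positive")
    return raw_rate * per_cells / viable_cells


# Two hypothetical wells: the first has the higher raw reading only because
# it contains twice as many viable cells.
well_a = normalize_rate(120.0, viable_cells=20_000)
well_b = normalize_rate(90.0, viable_cells=10_000)
print(well_a, well_b)
```

      With these made-up values the ranking of the two wells reverses after normalization, which is the confound the reviewer raised for unnormalized Seahorse data.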

      (9) How did the authors know that the isolated mouse primary cells were epithelial cells? Was this confirmed? What was the relative sample purity?

      The cells were labeled with multiple epithelial cell markers (cytokeratin) and confirmed using immunostaining and flow cytometry. The percentage of cells positive for epithelial cell markers was approximately 80%. A stromal cell marker (vimentin) was also used to confirm purity, but only a few percent of cells were positive. The contaminating cell type was considered to be mainly muscle cells because the gene expression levels of muscle cell markers verified by RNA-seq were relatively high.

      (10) It is misleading to include the lactate/pyruvate media measurements in the middle of the figure in Figure 4 D and E because it seems at first glance like these measurements were made in the seahorse media but they are completely unrelated. Additionally, these measures are not normalized and are sensitive to confounding differences in cell viability, seeding density, mass, etc.

      Thank you for pointing this out. We have placed the lactate and pyruvate measurement graphs after the flux data of ECAR. We noted that these measurements are normalized based on cell count and viability (lines 189-190). The doubling time of seminal vesicle epithelial cells was approximately 3 days, and testosterone inhibited cell proliferation. Therefore, the seeding concentration of cells was increased 4-fold in the testosterone-treated group compared to the control, and experiments were conducted to ensure that the confluency at the time of measurement after 7 days of culture was comparable between groups.
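
      The seeding arithmetic described above can be checked with a short calculation. This sketch assumes unconstrained exponential growth of the control cells over the whole culture period, which the response does not state; only the 3-day doubling time, the 7-day culture, and the 4-fold seeding ratio come from the text, and the function name is hypothetical.

```python
def fold_expansion(days, doubling_time_days):
    """Fold increase in cell number under simple exponential growth."""
    return 2 ** (days / doubling_time_days)


# Control cells: doubling time of about 3 days, cultured for 7 days.
control_fold = fold_expansion(7, 3)

# The testosterone group was seeded at 4 times the control density, so equal
# cell numbers at day 7 would need this much expansion in the treated group.
seeding_ratio = 4
treated_fold_needed = control_fold / seeding_ratio

print(round(control_fold, 2), round(treated_fold_needed, 2))
```

      Under this assumption the control cells expand roughly 5-fold in 7 days, so a 4-fold higher seeding density in the testosterone group leaves room for only a modest expansion of the treated cells if confluency is to match at the time of measurement.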

      (11) The flux analyzer assays sold by Agilent have many ambiguities and problems of interpretation. Unfortunately, Agilent's interest in marketing/sales has outpaced their interest in scientific rigor. Please consider revising some of the language regarding the measurements. For example, 'ATP production rate' is not directly measured. Rather, oligomycin-sensitive respiration rate is measured. The conversion of OCR to ATP production rate is an estimation that depends on complex assumptions often requiring additional testing and validation. The same is true for other ambiguous terms such as 'maximal respiration' referring to FCCP uncoupled respiration, and glycolytic rate- which is also not measured directly. If the authors are interested in a more detailed description of the problems with Agilent's interpretation of these assays please see the following reference (PMID: 34461088).

      Thank you for your critical comments and thoughtful advice, as well as for sharing the excellent reference. We agree with you on the flux analyzer ambiguities and data interpretation problems. The description of the measured values has been revised as follows.

      We have replaced the “ATP production rate” with the “oligomycin-sensitive respiratory rate.” Similarly, we have replaced “maximal respiration” with “FCCP-induced unbound respiration.” (lines 197-202) We chose not to deal with the conversion of OCR to ATP production rate because it is outside the scope of interest in our study.
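      To make the descriptive terms concrete, the following minimal sketch shows how such quantities are commonly derived from the phases of an OCR trace. The readings, the function name, and the use of a final rotenone/antimycin A phase to estimate non-mitochondrial OCR are assumptions made for illustration; the response does not give the authors' exact calculation.

```python
from statistics import mean

def mito_stress_summary(basal, after_oligomycin, after_fccp, after_rot_aa):
    """Summarize an OCR trace (pmol O2/min) using descriptive terms.

    Each argument is a list of OCR readings from one phase of the assay.
    """
    non_mito = mean(after_rot_aa)
    return {
        # Drop in OCR caused by oligomycin (not a direct ATP measurement).
        "oligomycin_sensitive_respiration": mean(basal) - mean(after_oligomycin),
        # OCR after FCCP, corrected for non-mitochondrial oxygen consumption.
        "fccp_uncoupled_respiration": mean(after_fccp) - non_mito,
        "non_mitochondrial_ocr": non_mito,
    }

# Hypothetical readings, three measurements per phase.
summary = mito_stress_summary(
    basal=[100, 102, 98],
    after_oligomycin=[40, 41, 39],
    after_fccp=[180, 178, 182],
    after_rot_aa=[20, 21, 19],
)
print(summary)
```

      Naming the outputs after what is measured, and not after an inferred quantity such as ATP production, keeps the reported values tied to the raw trace.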

      We now avoid the term "glycolytic capacity" and use “Oligomycin-sensitive ECAR” instead (line 186). We recognize that the ECARs measured in this study reflect experimental conditions and may not fully represent physiological glycolytic flux in vivo. The main section therefore includes a data set of glucose uptake studies to emphasize the significance of the changes obtained with the flux analyzer assay (New Fig.6, lines 230-254).

      Figure 6, it's not surprising to see the accumulation of labeled oleic acid in the cells; however, this does not mean that oleic acid is participating in normal metabolic processes. Oleic acid will have detergent effects at high (uM) concentrations. The observation that sperm 'take up' OA at 10-100 uM concentrations should also be validated against sperm function, as the health of the cells is very likely to be negatively impacted. Additionally, no apparent accumulation is noted in the fluorescence imaging at 1uM, but the authors insinuate that uptake occurs at low nM concentrations. The effects in Figure 6D-F are nominal at best and are likely a result of the small sample sizes.

      Thank you for your good suggestion. We agree with the reviewer that high concentrations of oleic acid have a detergent effect. To improve the consistency of functional data and observations, oleic acid uptake tests were performed under the same concentration range as the sperm motility tests (New Fig.7A-C). The oleic acid concentration used was chosen with reference to the oleic acid concentration in seminal fluid recovered from mice, as detected by GCMS, to reflect in vivo conditions.

      Epididymal sperm were incubated with fluorescently labeled oleic acid and observed after quenching of extracellular fluorescence. Fluorescent signals were detected selectively in the midpiece of the sperm. The fluorescence intensity of sperm quantified by flow cytometry increased significantly in a dose-dependent manner (New Fig.7A-C, lines 261-264).

      Furthermore, increasing the sample size did not change the trend of the sperm motility data. Although the effect size of oleic acid on sperm motility was small (New Fig.7D-G, lines 265-268), an improvement in fertilization ability was observed both in vitro (IVF) and in vivo (AI) (New Fig.7J-L, lines 274-282, 286-291). We conclude that the effect of oleic acid on sperm is of substantial significance. These data and interpretations have been revised in the text in the Results section.

      (12) Figure 6H, I applaud the authors for attempting intrauterine insemination experiments to test their previous findings. That said, there is no supporting data included to show that the sperm from the treatment groups had comparable starting viability/quality. Additionally, it is difficult to tell if the results are due to the small sample sizes and particularly the apparent outlier in the flutamide-only group.

      Thanks for the praise and comments for improvement. As noted in our response to your comment #5 above, the epididymal sperm was collected from healthy mice. Therefore, there is no qualitative difference in the epididymal sperm before treatment. This is described in the figure legend (lines 1130-1131). We apologize again for this complication. We also more than doubled the number of replications of the experiment. The impact of the outlier would have been minimal.

      (13) One final question related to Figure 6H: how did the authors know they were retrieving all of the possible 2-cell embryos from the uterus? Perhaps the authors could provide the raw counts of unfertilized eggs and 2-cell embryos so we can see if there were differences between the mice.

      We retrieved the pronuclear stage embryos from the fallopian tubes. It is not certain whether all embryos were recovered. Therefore, we added the number of embryos in the graph and in the supplementary data.

      (14) Figure 7 has the same seahorse assay normalization problem as mentioned earlier. Without normalization, it is difficult to tell if the effects are simply due to differences in cell mass. Were the replicates indicated in the graphs run on the same plate? If so, it would be much more convincing to see a nested design, with technical replicates within plates, and additional replicates run on separate plates.

      As noted in our response to your comment #8 above, these measurements were normalized based on sperm count. This is now stated in the text and the figure legend (lines 1123-1124).

      Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the XFe24 or XFe96 extracellular flux analyzers. Because the XF HS Mini uses an 8-well plate (at most 6 samples per run, since 2 wells were used for calibration), the measurements were repeated across runs.
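      The design described above (three wells per run, four runs, normalization to sperm count) can be summarized as in the sketch below, which normalizes each well and then averages technical replicates within a run so that each run contributes one value. All numbers, the function name, and the choice of reporting per 10^6 sperm are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev

def normalized_run_means(ocr_by_run, counts_by_run, per_cells=1e6):
    """Return one normalized OCR value per run.

    ocr_by_run:    list of runs, each a list of raw per-well OCR readings.
    counts_by_run: matching nested list of sperm counts per well.
    """
    run_means = []
    for wells, counts in zip(ocr_by_run, counts_by_run):
        # Normalize each technical replicate (well) to a fixed cell number.
        normalized = [ocr / n * per_cells for ocr, n in zip(wells, counts)]
        # Collapse wells so that the run, not the well, is the unit of replication.
        run_means.append(mean(normalized))
    return run_means

# Hypothetical data: 4 runs x 3 wells.
ocr = [[30, 33, 27], [36, 36, 36], [24, 27, 30], [33, 30, 36]]
counts = [[3e6, 3e6, 3e6]] * 4
per_run = normalized_run_means(ocr, counts)
print("per-run means:", per_run)
print("mean of runs:", mean(per_run), "SD:", stdev(per_run))
```

      Treating the run mean as the replicate avoids counting wells from the same plate as independent observations, which is the concern behind the request for a nested design.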

      (15) The statistical test in Figures 8E and F described in the legend is inappropriate (t-test), this appears to be a two-factor design.

      Thank you for pointing this out. Differences between groups were assessed using a two-way analysis of variance (ANOVA). When the two-way ANOVA was significant, differences among values were analyzed using Tukey's honest significant difference test for multiple comparisons.
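      For readers unfamiliar with the two-factor analysis mentioned above, here is a minimal sketch of how the F statistics of a balanced two-way ANOVA are computed. The data and factor labels are hypothetical, the function is written only for equal replicate numbers per cell, and it stops at the F statistics: a statistics package would normally supply the p-values and the Tukey post hoc comparisons.

```python
from statistics import mean

def two_way_anova(cells):
    """F statistics for a balanced two-way ANOVA with interaction.

    cells[i][j] is the list of replicates for level i of factor A
    and level j of factor B (same number of replicates in every cell).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(v for row in cells for cell in row for v in cell)
    a_means = [mean(v for cell in row for v in cell) for row in cells]
    b_means = [mean(v for row in cells for v in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (v - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for v in cells[i][j]
    )
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_A": (ss_a / (a - 1)) / ms_err,
        "F_B": (ss_b / (b - 1)) / ms_err,
        "F_AxB": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
        "df_error": df_err,
    }

# Hypothetical 2 x 2 design (factor A: hormone -/+, factor B: treatment -/+).
data = [
    [[10, 11, 12], [11, 12, 13]],
    [[14, 15, 16], [20, 21, 22]],
]
result = two_way_anova(data)
print(result)
```

      The interaction term is what a pair of t-tests cannot capture: it asks whether the effect of one factor depends on the level of the other.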

      (16) The data in Figure 8 are interesting, and the effects appear to be a little more consistent compared with the mouse primary cells, potentially due to cell uniformity. However, the data are unnormalized, causing significant ambiguity, and there are no measures of cell viability to determine if the effects are due to cell death (or at least relative cell mass).

      As noted in our responses to your comments #8 and #14 above, these measurements were normalized based on cell count and viability. This is now stated in the figure legend (lines 1185-1186).

      Minor Comments:

      (1) The section title indicating the beginning of the results section is missing.

      A section title has been added to indicate the beginning of the results section.

      (2) There were several typos and confusingly worded statements throughout. Please consider additional editing.

      We used a proofreading service and corrected as much as possible.

      (3) In the introduction, a brief description of seminal fluid physiology is provided, but the reference is directed toward human physiology. Given that the research is performed solely in the mouse, a brief comparative description of mouse physiology would be helpful. For example, what is the role of mouse seminal fluid in the formation of the mating plug? What are the implications of the relative size disparity in seminal vesicles in mice versus humans? Etc.

      The third paragraph of the introduction has been revised (lines 57-60).

      Reviewer #2 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      (1) The abstract is confusing and partly misleading and should be revised to more clearly and accurately summarize the study.

      The abstract was revised to be clearer and more accurate (lines 20-34).

      (2) The introduction should be revised to more accurately describe the sperm life cycle. Spermatogenesis, per definition, for example, exclusively takes place in the testis, sperm do not gain fertilization competence in the epididymis, sperm isolated from the epididymis cannot fertilize an oocyte unless in vitro capacitated, etc. In the last paragraph the connection between changes in fructose and citrate concentration, sperm metabolism and testicular-derived testosterone and AR remain unclear.

      The introduction was revised to be clearer and more accurate (lines 44-45).

      Citric acid and fructose are chemical components that are the subject of biochemical testing and are commonly used as semen testing items for humans and livestock. This is because the secretory function of the prostate and seminal vesicles is dependent on androgens. The measurement of citric acid and fructose concentrations in semen is routinely used to indicate testicular androgen production function (ISBN: 978-1-4471-1300-3, 978 92 4 0030787).

      (3) Throughout the manuscript the concept of (in vitro) capacitation is missing. Mixing sperm with seminal plasma is not the only way to achieve sperm that can fertilize the oocyte. Since media containing bicarbonate and albumin is the standard procedure in the field to capacitate epididymal mouse sperm in vitro, the manuscript would gain value from a comparison between the effect of seminal plasma and in vitro capacitating media. Interesting readouts in addition to motility would be, e.g., sAC activation, PKA-substrate phosphorylation, and acrosomal exocytosis.

      Thank you for raising this important point. As the reviewer points out, fertilization can be achieved in artificial insemination and in vitro fertilization using epididymal sperm which have not been exposed to seminal plasma. This has historically led to an underestimation of the role of accessory reproductive glands, such as the prostate and seminal vesicles. However, it has been reported that the removal of seminal vesicles in rodents decreases the fertilization rate after natural mating. This has been shown to be due to multiple factors affecting sperm motility rather than factors involved in plug formation (PMID: 3397934), but details of these factors and the whole picture of the role of the accessory glands were not known. This led us to become interested in the effects of seminal plasma on sperm other than fertilization and to begin research on the role of the accessory glands that synthesize seminal plasma.

      Early in our study, we found that simply exposing sperm to seminal vesicle extracts for 1 hour before IVF dramatically reduced fertilization rates, even in HTF medium containing bicarbonate and albumin. The experiment was designed on the assumption that seminal plasma contains factors that inhibit sperm from acquiring fertilizing ability. Therefore, we conducted experiments using modified HTF without albumin to avoid unintended motility patterns.

      However, we also respect the reviewer's opinion, and we have added our preliminary data related to IVF (New supplementary Fig.5).

      (4) In the introduction and throughout the manuscript it is unclear what the authors mean by "linear motility". An increase in VSL doesn't mean that the sperm swim in a more linear or straight way, or even that the sperm are 'straightened', it means that they swim faster from point A to point B. Do the authors mean progressive or hyperactivated motility? Please clarify.

      For all conditions tested the authors should follow the standard in the field and include the % of motile, progressively motile, and hyperactivated sperm.

      Thank you for pointing this out. We appreciate your feedback regarding the terminology. In our manuscript, "linear motility" refers to the degree to which sperm move in a straight line. We have clarified this by explaining that VSL (Straight-Line Velocity) and LIN (Linearity) are used to quantify and describe linear motility in sperm analysis: Higher VSL values indicate more direct, linear movement. A higher LIN value indicates a straighter path, thus representing greater linear motility. These terms have been standardized, and explanations have been added to the main text (lines 111-113).
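      The relationship between these parameters can be illustrated with the standard CASA definitions: VCL is the velocity along the actual head trajectory, VSL is the velocity along the straight line from the first to the last tracked position, and LIN is VSL/VCL expressed as a percentage. The sketch below uses hypothetical tracks and a hypothetical frame interval; it is not the authors' analysis pipeline.

```python
from math import hypot

def kinematics(track, frame_interval_s):
    """Return (VSL, VCL, LIN) for one sperm track.

    track: list of (x, y) head positions in micrometers, one per frame.
    frame_interval_s: time between consecutive frames in seconds.
    """
    duration = (len(track) - 1) * frame_interval_s
    # Curvilinear path: sum of frame-to-frame displacements.
    path = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(track, track[1:])
    )
    # Straight-line distance between first and last positions.
    straight = hypot(track[-1][0] - track[0][0], track[-1][1] - track[0][1])
    vcl = path / duration
    vsl = straight / duration
    lin = 100 * vsl / vcl if vcl else 0.0
    return vsl, vcl, lin

# Hypothetical tracks sampled every 0.5 s.
zigzag = [(0, 0), (3, 4), (6, 0)]
straight_line = [(0, 0), (3, 4), (6, 8)]
zigzag_result = kinematics(zigzag, 0.5)
straight_result = kinematics(straight_line, 0.5)
print("zigzag   VSL, VCL, LIN:", zigzag_result)
print("straight VSL, VCL, LIN:", straight_result)
```

      In this example the two tracks have the same VCL but different VSL and LIN, which shows why LIN, as a ratio, describes path straightness, whereas VSL alone also scales with how fast the cell travels.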

      In response to your suggestion, we have included the percentage of motility and progressive motility for all conditions tested. However, since the experiment was performed using modified HTF without albumin, we have decided not to report the percentage of hyperactivation to avoid confusion.

      (5) Did the authors confirm that the injection of flutamide decreases androgen levels? That control needs to be included in the experiment to validate the conclusion.

      Injection of flutamide did not reduce androgen levels (see reviewer #1, comment 6). This is because flutamide's mechanism of action is based on antagonizing androgen and inhibiting its binding to the androgen receptor (New Fig.2A).

      (6) The role of mitochondrial activity in sperm progressive motility is still under investigation. PMID: 37440924, for example, showed that inhibition of the ETC affects hyperactivated but not progressive motility. The authors should either include additional experiments to confirm the correlation between mitochondrial activity and sperm progressive motility or tone down that conclusion.

      We have previously shown that treatment with D-chloramphenicol, an inhibitor of mitochondrial translation, significantly reduced sperm mitochondrial membrane potential, ATP levels, and linear motility (PMID: 31212063). Also, in the previous manuscript, we did not address progressive motility or hyperactivated motility in our analysis. We have chosen to discuss the effect of mitochondrial activity on linear motility rather than on progressive motility and hyperactivation of sperm.

      Was mitochondrial activity also altered in epididymal sperm incubated with and without seminal plasma or in aged mice?

      The mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract (SV) was higher than that of epididymal sperm cultured without seminal vesicle extract (without SV: 67.3 ± 0.8%, with SV: 83.4 ± 1.8%). On the other hand, the mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract recovered from aged mice was decreased (SV from aged: 60.3 ± 2.7%). It should be noted that the epididymal spermatozoa used in these experiments were from healthy individuals, different from those from which seminal vesicle extracts were collected. (See also the response to reviewer 1's comment #5.)

      (7) The quality of the provided images showing AR, Ki67, and TUNEL staining should be improved or additional images should be included. Especially the AR staining is hard to detect in the provided images. The authors should also include a co-staining between AR and vesicle epithelial cells. That epithelial cells are multilayered does not come across in the pictures provided.

      We apologize for any inconvenience caused. The image has been replaced with one of higher resolution, in which the multilayered structure of the epithelial cells is also visible.

      For the 12-month-old mice, an age-matched control should be included to support the authors' conclusion.

      To clarify the seminal vesicle changes associated with aging, we included images of 3-month-old mice as controls (New Supplementary Fig.2D).

      Overall, the rationale for the experiment does not become clear. How are the amount of seminal vesicle epithelial cells, testosterone, and AR expression connected to seminal plasma secretions? Why is it a disadvantage to have proliferating seminal vesicle epithelial cells? How is proliferation connected to the proposed switch in metabolic pathway activity?

      We have added some explanations and supporting data to the manuscript (New Fig.8D, lines 303-305, 315-319, 369-379). Cell proliferation stopped when the metabolic shift occurred, redirecting glucose toward fatty acid synthesis. Fatty acid synthesis is an important function of the seminal vesicle, and in the presence of testosterone, fatty acid synthesis enhancement and arrest of proliferation occur simultaneously. The connection between metabolism and cell proliferation was further demonstrated when ACLY was knocked down by shRNA, which stopped fatty acid synthesis and released the proliferative arrest induced by testosterone, allowing the cells to proliferate again. However, we do not know what effects occur when cell proliferation is stopped.

      (8) The experiments provided for glycolysis and oxphos are inconsistent and insufficient to support the authors' conclusion that testosterone shifts glycolytic and oxphos activity of seminal vesicle epithelial cells. Multiple groups (PMID 37440924, 37655160, 32823893) have shown that the increased flux through central carbon metabolism during capacitation is accompanied by an accumulation of intracellular lactate and increased secretion of lactate into the surrounding media. How do the authors explain that they see an increase in glucose uptake and ECAR but not in lactate and a decrease in pyruvate? Did the authors additionally quantify intracellular pyruvate and lactate? Since pyruvate and lactate are in constant equilibrium, it is odd that one metabolite is changing and the other one is not.

      Thank you for pointing this out. Since ECAR is often used as an alternative to lactate production but does not directly measure lactate levels, we measured changes in lactate and pyruvate concentrations in the culture medium. Under our experimental conditions, glucose appeared to be directed primarily towards anabolic processes, such as fatty acid synthesis, rather than the OXPHOS pathway, which may explain the lack of lactate production. The observed decrease in pyruvate might indicate its conversion to acetyl-CoA in the mitochondria, supporting both fatty acid synthesis and the TCA cycle. This shift would be consistent with the metabolic reprogramming toward anabolic activity.

      What do the authors mean by "the glycolytic pathway was not enhanced despite the activation of glycolysis"? Seahorse, especially using a series of pathway inhibitors, only provides an indirect measurement of glycolysis and oxphos since the instrument does not provide a distinction from which pathways the detected protons are originating. The authors should consider a more optimized experimental design, e.g. the authors could monitor ECAR and OCR in the presence of glucose over time with and without the addition of testosterone. That would be less invasive since the sperm are not starved at the beginning of the experiment and would provide a more direct read-out. Did the authors normalize cell numbers in their experiment? Alternatively, the authors could consider performing metabolomics experiments.

      I agree with the reviewer. Buzzwords such as “glycolytic capacity” simply do not make sense, so we have removed them from the phrases noted by the reviewer. Please refer to the response to some of reviewer 1's points regarding the ambiguity of the data measured by the flux analyzer. Nevertheless, the assay design of the flux analysis could be used as a good “starting point” and provide information on the glycolytic system and respiratory control. Therefore, the interpretation of the flux analysis is supported by subsequent data sets.

      (9) The authors would strengthen their results by confirming their gene expression data by quantifying the expression of the respective proteins.

      Does testosterone treatment increase GLUT4 protein levels in isolated seminal vesicle epithelial cells? Or does it change the localization of the transporter? Are GLUT4 gene and protein levels altered in flutamide-treated cells? How do the authors explain that testosterone increases glucose uptake without changing Glut gene expression?

      We performed Western blot analysis to measure GLUT4 protein levels in seminal vesicle epithelial cells after testosterone treatment. The results showed that testosterone does not alter the expression of GLUT4 protein but simply changes its subcellular localization (New Fig.6C,D, lines 238-244).

      The discussion includes the interpretation of the observation that testosterone increases glucose uptake by altering localization without altering GLUT4 gene expression, a phenomenon commonly seen in other cells, such as cardiomyocytes (lines 362-364). The revised main figure also includes a data set of changes in GLUT4 localization, including flutamide-treated data. See also Reviewer 3's main comment #1.

      (10) Considering that the authors claim that SV secretions are crucial for sperm fertilization capacity, how do they explain that fertilization rates are still at 40 % when sperm are treated with flutamide?

      The fertilization rate is actually about 50% with HTF alone, because fertilization can occur without SV. Considering this baseline, we found that seminal vesicle secretions positively affect sperm in vivo fertilization. On the other hand, seminal plasma from flutamide-treated mice reduced the fertilization ability of healthy sperm. These points are described in the text (lines 283-294).

      (11) It would be beneficial for the reader to include a schematic summarizing the results.

      Thank you for your advice from the reader's point of view. We have visualized the summaries of this study and added them to the manuscript (New Fig.10).

      Minor comments:

      Line 38: Male fertility, no article, please revise.

      I have changed “The male fertility” to “Male fertility” and added some references (lines 42-43).

      Line 55: Seminal plasma or TGFb? Please clarify.

      Corrected as follows. “TGFβ, a component of seminal plasma, increases antigen-specific Treg cells in the uterus of mice and humans, which induces immune tolerance, resulting in pregnancy.” (lines 60-62)

      Line 63: Why do the authors find it surprising that blood and seminal plasma have different compositions?

      This is because seminal plasma contains unique biochemical components that are not normally found in blood or only in small quantities. The intention was to emphasize the unique function of seminal plasma in supporting the physiological functions of sperm and to highlight its complex role by comparing it to blood. We clarified these intentions and reflected them in the revised text (lines 62-67).

      Line 94: The headline causes confusion. Seminal plasma does not induce sperm motility, it increases progressive sperm motility.

      Corrected as follows. “The effect of androgen-dependent changes in mouse seminal vesicle secretions on the linear motility of sperm” (lines 101-102)

      Reviewer #3 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major:

      Figure 4 and Figure 5: The trend shows that GLUT3 is up-regulated and GLUT4 is downregulated although both of them are not statistically significant. However, GLUT4 is picked for all the following experiments based on protein localization. Providing other evidence/discussion why not to further consider other GLUTs will help to justify. Also, this reviewer suggests including GLUT4 localization data in the main figure as it is important data for the logical flow to link the following figures.

      We focused on GLUT4 because it was known that testosterone increases glucose uptake by changing the localization of GLUT4 without changing its expression (lines 230-231). In the revised manuscript, the increasing trend in Glut3 gene expression was also mentioned in the discussion, in addition to GLUT4 (lines 360-362). In any case, the results showed that testosterone increased glucose uptake by regulating the function of glucose transporters.

      Immunostaining of GLUT1~4 was performed to compare seminal vesicles from flutamide-treated mice with controls, and localization changes were observed only in GLUT4. Therefore, we hypothesized that GLUT4 is regulated by testosterone and performed the experiment. Fortunately, we were able to obtain a GLUT4-specific inhibitor, which dramatically inhibited the testosterone-dependent glucose uptake and subsequent lipid synthesis in seminal epithelial cells, leading us to believe that GLUT4 is a major glucose transporter.

      Increasing sperm linearity by oleic acid is observed and interpreted as enhanced sperm fertilizing potential. It is not clear why and how sperm linearity can be a determinant factor for enhancing sperm fertility in vivo. Providing an explanation of the effect of oleic acid on another key motility parameter more proven to be directly correlated with fertility (i.e., hyperactivation), and more direct evidence of oleic acid on enhancing sperm linearity indeed increasing sperm fertilization using IVF, is strongly recommended to support the author's main conclusion.

      Thank you for pointing this out. It is known that proteins derived from the seminal vesicles inhibit the hyperactivation of sperm and the acrosome reaction. Therefore, we conducted an experiment to add oleic acid, focusing on fatty acid synthesis caused by the metabolic shift of the seminal vesicles, which had not been known until now.

      Sperm were pretreated with an oleic acid-containing medium before IVF, and oleic acid enhanced sperm linearity. When the sperm number was sufficient, there was no change in the cleavage rate after in vitro fertilization, but when the sperm count was reduced to one-tenth of the normal, the cleavage rate increased compared to the control (lines 274-282). In other words, the physiological role of oleic acid is to increase the probability of fertilization by keeping the sperm motility pattern linear or progressive. This increases the likelihood of the sperm passing through the female reproductive tract and environments that are unfavorable to sperm survival. Our research has uncovered significant insights into the role of seminal vesicle fluid and oleic acid in sperm fertilization. Due to the strong effect of the decapacitation factor, we found that seminal vesicle fluid reduces the fertilization rate in IVF. However, it does not interfere with the fertilization rate in vivo during artificial insemination. This emphasizes the importance of oleic acid, along with other protein components of seminal plasma, in ensuring the in vivo fertilization ability of sperm.

      Minor:

      Please correct a typo in Line 173: sifts to shifts

      All typographical errors have been corrected.

    1. eLife Assessment

      This important study addresses the idea that defective lysosomal clearance might be causal to renal dysfunction in cystinosis. With mostly solid data, the authors observe that restoring expression of vATPase subunits and treatment with Astaxanthin ameliorate mitochondrial function in a model of renal epithelial cells, opening opportunities for translational application to humans.

    2. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Sur and colleagues present insights into the potential pathways and mechanisms underlying the pathogenesis of cystinosis - a prototypical lysosomal storage disorder caused by the loss of the cystine transporter cystinosin (CTNS). This deficiency results in early dysfunction of proximal tubule (PT) cells and proximal tubulopathy, which progresses to chronic kidney disease and multisystem complications later in life. The authors utilize patient-derived cell lines and knockout (KO) strategies in immortalized PT cell systems, alongside transcriptomics and pathway enrichment analyses, to demonstrate that the loss of CTNS function reduces V-ATPase subunits (specifically V-ATP6V0A1), impairing autophagy and mitochondrial homeostasis. These findings are consistent with their prior work and follow-up studies conducted in preclinical models (mouse, rat, and zebrafish) of cystinosis and CTNS deficiency.

      Importantly, the authors highlight rescue strategies that involve correcting V-ATP6V0A1 expression or modulating redox dyshomeostasis through ATX treatment. These interventions restore cellular homeostasis in patient-derived cells, providing actionable therapeutic targets for patients in need of novel causal therapies.

      Strengths:

      The implications for health, disease, and therapeutic discovery are considerable, given the central role of autophagy and lysosome-related pathways in regulating critical cellular processes and physiological functions.

      Weaknesses:

      Despite these promising findings, further experimental research is required to strengthen the study's framework and conclusions. This includes characterizing the physiological properties of the PT cellular systems used, performing appropriate control or sentinel experiments in lysosome function assays, and further delineating disease phenotypes associated with cystinosis. Follow-up investigations into lysosome abnormalities and autophagy dysfunctions are also needed, along with a detailed exploration of the molecular mechanisms underlying the rescue of lysosomal, autophagic, and mitochondrial phenotypes through ATX treatment.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cystinosis is a rare hereditary disease caused by biallelic loss of the CTNS gene, encoding two cystinosin protein isoforms; the main isoform is expressed in lysosomal membranes where it mediates cystine efflux whereas the minor isoform is expressed at the plasma membrane and in other subcellular organelles. Sur et al proceed from the assumption that the pathways driving the cystinosis phenotype in the kidney might be identified by comparing the transcriptome profiles of normal vs CTNS-mutant proximal tubular cell lines. They argue that key transcriptional disturbances in mutant kidney cells might not be present in non-renal cells such as CTNS-mutant fibroblasts.

      Using cluster analysis of the transcriptomes, the authors selected a single vacuolar H+ATPase (ATP6V0A1) for further study, asserting that it was the "most significantly downregulated" vacuolar H+ATPase (about 58% of control) among a group of similarly downregulated H+ATPases. They then showed that exogenous ATP6V0A1 improved CTNS(-/-) RPTEC mitochondrial respiratory chain function and decreased autophagosome LC3-II accumulation, characteristic of cystinosis. The authors then treated mutant RPTECs with 3 "antioxidant" drugs, cysteamine, vitamin E, and astaxanthin (ATX). ATX (but not the other two antioxidant drugs) appeared to improve ATP6V0A1 expression, LC3-II accumulation, and mitochondrial membrane potential. Respiratory chain function was not studied. RPTEC cystine accumulation was not studied.

      In this manuscript, as an initial step, we have studied the first step in respiratory chain function by performing the Seahorse Mito Stress Test to demonstrate that the genetic manipulation (knocking out the CTNS gene and plasmid-mediated expression correction of ATP6V0A1) impacts mitochondrial energetics. We did not investigate the respirometry-based assays that can identify locations of electron transport deficiency, which we plan to address in a follow-up paper.

      We would like to draw attention to Figure 3D, where cystine accumulation has been studied. This figure demonstrates an increased intracellular accumulation of cystine.

      The major strengths of this manuscript reside in its two primary findings.

      (1) Plasmid expression of exogenous ATP6V0A1 improves mitochondrial integrity and reduces aberrant autophagosome accumulation.

      (2) Astaxanthin partially restores suboptimal endogenous ATP6V0A1 expression.

      Taken together, these observations suggest that astaxanthin might constitute a novel therapeutic strategy to ameliorate defective mitochondrial function and lysosomal clearance of autophagosomes in the cystinotic kidney. This might act synergistically with the current therapy (oral cysteamine) which facilitates defective cystine efflux from the lysosome.

      There are, however, several weaknesses in the manuscript.

      (1) The reductive approach that led from transcriptional profiling to focus on ATP6V0A1 is not transparent and weakens the argument that potential therapies should focus on correction of this one molecule vs the other H+ ATPase transcripts that were equally reduced - or transcripts among the 1925 belonging to at least 11 pathways disturbed in mutant RPTECs.

      The transcriptional profiling studies on ATP6V0A1 have been fully discussed and publicly shared. Table 2 lists the v-ATPase transcripts that are significantly downregulated in cystinosis RPTECs. We have also clarified and justified the choice of further studies on ATP6V0A1, where we state the following: "The most significantly perturbed member of the V-ATPase gene family found to be downregulated in cystinosis RPTECs is ATP6V0A1 (Table 2). Therefore, further attention was focused on characterizing the role of this particular gene in a human in vitro model of cystinosis."

      (2) A precise description of primary results is missing -- the Results section is preceded by or mixed with extensive speculation. This makes it difficult to dissect valid conclusions from those derived from less informative experiments (eg data on CDME loading, data on whole-cell pH instead of lysosomal pH, etc).

      We appreciate the reviewer highlighting areas for further improving the manuscript's readership. In our resubmission, we have revised the results section to provide a more precise description of the primary findings and restrict the inferences to the discussion section only.

      (3) Data on experimental approaches that turned out to be uninformative (eg CDME loading, or data on whole-cell pH assessment with BCECF).

      We have provided the data regardless of whether they were informative or uninformative. Although measuring lysosome-specific pH would be important, it was not possible in our cells because they were very sick and the assay did not work. We therefore provide pH data obtained with BCECF, which reports the overall pH of the cytoplasm and organelles and is thus still informative about whole-cell pH.

      (4) The rationale for the study of ATX is unclear and the mechanism by which it improves mitochondrial integrity and autophagosome accumulation is not explored (but does not appear to depend on its anti-oxidant properties).

      We have provided the rationale for studying ATX in the Introduction and Results sections, where we state the following: “correction of ATP6V0A1 in CTNS-/- RPTECs and treatment with antioxidants specifically, astaxanthin (ATX) increased the production of cellular ATP6V0A1, identified from a custom FDA-drug database generated by our group, partially rescued the nephropathic RPTEC phenotype. ATX is a xanthophyll carotenoid occurring in a wide variety of organisms. ATX is reported to have the highest known antioxidant activity and has proven to have various anti-inflammatory, anti-tumoral, immunomodulatory, anti-cancer, and cytoprotective activities both in vivo and in vitro”.

      We are still investigating the mechanism by which ATX improves mitochondrial integrity, and this will be the focus of a follow-on manuscript.

      (5) Thoughtful discussion on the lack of effect of ATP6V0A1 correction on cystine efflux from the lysosome is warranted, since this is presumably sensitive to intralysosomal pH.

      In the revised manuscript, we have included a detailed discussion on the plausible reasons why ATP6V0A1 correction has no effect on cystine efflux from the lysosome. We have now added to the Discussion – “However, correcting ATP6V0A1 had no effect on cellular cystine levels, likely because cystinosin is known to have multiple roles beyond cystine transport. Cystinosin is demonstrated to be crucial for activating mTORC1 signaling by directly interacting with v-ATPases and other mTORC1 activators. Cystine depletion using cysteamine does not affect mTORC1 signaling. Our data, along with these observations, further supports that cystinosin has multiple functions and that its cystine transport activity is not mediated by ATP6V0A1.”

      (6) Comparisons between RPTECs and fibroblasts cannot take into account the effects of immortalization on cell phenotype (not performed in fibroblasts).

      The purpose of examining different tissue sources of primary cells in nephropathic cystinosis was to assess if any of the changes in these cells were tissue source specific. We used primary cells isolated from patients with nephropathic cystinosis—RPTECs from patients' urine and fibroblasts from patients' skin—these cells are not immortalized and can therefore be compared. This is noted in the results section - “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We next utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (7) This work will be of interest to the research community but is self-described as a pilot study. It remains to be clarified whether transient transfection of RPTECs with other H+ATPases could achieve results comparable to ATP6V0A1. Some insight into the mechanism by which ATX exerts its effects on RPTECs is needed to understand its potential for the treatment of cystinosis.

      In future studies, we will further investigate the effect of ATX on RPTECs for the treatment of cystinosis; this will require Phase 1 and Phase 2 clinical studies, which are beyond the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Sur and colleagues investigate the role of ATP6V0A1 in mitochondrial function in cystinotic proximal tubule cells. They propose that loss of cystinosin downregulates ATP6V0A1 resulting in acidic lysosomal pH loss, and adversely modulates mitochondrial function and lifespan in cystinotic RPTECs. They further investigate the use of a novel therapeutic Astaxanthin (ATX) to upregulate ATP6V0A1 that may improve mitochondrial function in cystinotic proximal tubules.

      The new information regarding the specific proximal tubular injuries in cystinosis identifies potential molecular targets for treatment. As such, the authors are advancing the field in an experimental model for potential translational application to humans.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There is a lack of care with precise wording and punctuation, which negatively affects the text. Importantly, the manuscript lacks a clear description of experimental Results. This section begins with speculation, then wanders through experimentation that didn't work (could be deleted). Figure 1A and lines 94-102 could be deleted. Data from CDME loading was found to be a "poor surrogate" for cystinosis and could be deleted from the manuscript or mentioned as a minor point in the discussion. The number of individual patient cell lines used for experimentation is unclear - 8 patients are mentioned on line 109, Figure 2B shows 6 normal fibroblasts, 3 CDME-loaded fibroblasts, and an indeterminate number of normal vs CDME-loaded cells (both colored red). Cluster analysis refers to two large gene clusters - data supporting this key conclusion is not shown. It is unclear why ATP6V0A1 was selected as the most significantly reduced H+ATPase from Table II. Thus, the focus on this particular gene appears to be largely "a hunch".

      In this study, we aim to establish a new concept by using multiple cell types and various assays tailored to each affected organelle, which might be confusing. Therefore, we believe Figure 1a provides a roadmap and helps clarify what to expect from this paper.

      This study was started a decade back, when CDME-mediated lysosomal loading was regularly used as a surrogate in vitro model to study cystinosis tissue injury. That was the reason to include CDME in the study design. Since we already had the CDME-treated data and in this article we are talking about another superior in vitro cystinosis model, we would like to include it.

      In the Results and Methods sections, we mentioned “8 patients” with nephropathic cystinosis from whom we collected the RPTECs and fibroblasts. These cystinotic cells are shown as blue and purple dots, respectively, in Figure 2B. Normal RPTECs and fibroblasts were purchased commercially, and these cells were then treated with CDME to artificially load lysosomes with cystine. Details on the cell types and their procurement can be found in the Methods section under “Study design and Samples”. Normal and CDME-loaded RPTECs are shown as red and orange dots, whereas normal and CDME-loaded fibroblasts are shown as green and yellow dots, respectively, in Figure 2B.

      We removed this figure from the manuscript because the data is already detailed in Tables 1 and 2. As a sub-figure, the string pathway analysis output was illegible and did not add any new information. However, for your reference, we have now provided this data below.

      Author response image 1.

      STRING pathway analysis using the microarray transcriptomic data from normal vs. cystinotic RPTECs. Using K-means clustering on the genes in these significantly enriched pathways, we identified 2 distinct clusters, shown as red and green nodes. Red nodes are enriched in nucleus-encoded mitochondrial genes and the v-ATPase family, which are crucial for lysosomes and kidney tubular acid secretion. ATP6V0A1, the topmost v-ATPase in our cystinotic transcriptome dataset, is highlighted in cyan. Green nodes are enriched in genes needed for DNA replication.

      (2) It was decided to use transcriptional profiling of CTNS mutant vs wildtype renal proximal tubular cells (RPTECs) as a way to uncover defective secondary molecular pathways that might be upstream drivers of the cystinosis phenotype. Since the kidneys are the first organs to deteriorate in cystinosis, it is postulated that transcriptome differences might be more obvious in kidney cells than in non-renal tissues, such as fibroblasts. A potential pitfall is that the RPTECs were transformed cell lines whereas fibroblasts were not.

      Transcriptional profiling was done on primary cells isolated from patients with nephropathic cystinosis—RPTECs from patients' urine and fibroblasts from patients' skin—these cells are not immortalized and can therefore be compared. This is noted in the results section - “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (3) The authors wanted to study intralysosomal pH but could not, so used a pH-sensitive dye that reflects whole cell pH. It would be incorrect to take this measurement as support for their hypothesis that intralysosomal pH is increased. Since these experiments cannot be interpreted, they should be deleted from the manuscript.

      We have now corrected the term to "intracellular pH." Although measuring lysosome-specific pH would be important, it was not feasible in our cells because knocking out the cystinosin gene made them fragile, rendering the assay ineffective. Therefore, we provide data on pH assessment using BCECF, which measures the overall pH of the cytoplasm and organelles. This information is still valuable for understanding the whole-cell pH, encompassing both organelle and cytoplasmic pH. We have mentioned this as one of our limitations in the Discussion section.

      (4) The choice of ATX as a potential therapy is puzzling. Its antioxidant properties seem to be irrelevant since two other antioxidants had no effect. The mechanism by which it appears to correct some aspects of the cystinosis phenotype remains unknown and this should be pointed out. A key experiment to assess whether ATX reduces lysosomal cystine accumulation is missing. While the impact of ATX on cystinosis is interesting, the mechanism is unexplored.

      A detailed study on the mechanism by which ATX corrects certain aspects of the cystinosis phenotype is currently underway and will be presented in a follow-up paper. We have measured the effect of ATX and cysteamine, both individually and combined, on cystine accumulation using HPLC, as shown in the figure below. Our results indicate a significant increase in cystine levels with ATX treatment alone, while the combined ATX and cysteamine treatment significantly reduced cystine accumulation to the normal level. This suggests that ATX addresses specific aspects of the cystinosis phenotype through a different mechanism, not by reducing the accumulated cystine levels. When co-administered with cysteamine, they have the potential to complement each other's shortcomings. We believe that the increase in cystine with ATX alone may be due to interactions between ATX's ketone or hydroxyl groups and cystine's amine or carboxylic groups. Further research on this interaction is ongoing.

      We have now added to the Discussion – “We noticed a significant increase in cystine levels with ATX treatment alone (data not shown in the manuscript), while the combined ATX and cysteamine treatment significantly reduced cystine accumulation to the normal level. This may suggest that when co-administered with cysteamine, they have the potential to complement each other's shortcomings. We believe that the increase in cystine with ATX alone could be due to interactions between ATX's ketone or hydroxyl groups and cystine's amine or carboxylic groups. Further research on this interaction is ongoing.”

      Author response image 2.

      (5) The effects of exogenous ATP6V0A1 are interesting but had no effect on lysosomal cystine efflux, a hallmark of the cystinosis cellular phenotype. A discussion of this issue would be important.

      In the revised manuscript, we have included a detailed discussion on the plausible reasons why ATP6V0A1 correction has no effect on cystine efflux from the lysosome. We have added to the Discussion – “However, correcting ATP6V0A1 had no effect on cellular cystine levels (Figure 7C), likely because cystinosin is known to have multiple roles beyond cystine transport. Cystinosin is demonstrated to be crucial for activating mTORC1 signaling by directly interacting with v-ATPases and other mTORC1 activators. Cystine depletion using cysteamine does not affect mTORC1 signaling (47). Our data, along with these observations, further supports that cystinosin has multiple functions and that its cystine transport activity is not mediated by ATP6V0A1.”

      (6) The arguments on lines 260-273 are not comprehensible. The authors confirm that RPTC LC3-II levels are increased, a marker of active processing of autophagosome cargo, prior to delivery to lysosomes. Discussion of bafilomycin (not used), mTORC activity, and endocytosis are not directly relevant and wander from interpretation of the LC3-II observation. One possibility is that the 50% decrease in ATP6V0A1 transcript is sufficient to slow the transfer of LC3-II-tagged cargo from autophagosome to lysosome - however, it would be important to offer a plausible explanation for why decreased ATP6V0A1 expression alone does not appear to be the key limitation on lysosomal cystine efflux.

      We have now rephrased our explanation in the Discussion section – “Cystinotic cells are known to have an increased autophagy or reduced autophagosome turnover rate. Autophagic flux in a cell is typically assessed by examining the accumulation of the autophagosome or autophagy-lysosome marker LC3B-II. This accumulation can be artificially induced using bafilomycin, which targets the V-ATPase, thereby inhibiting lysosomal acidification and degradation of its contents. Taken together, the observed innate increase in LC3B-II in cystinotic RPTECs (Figure 5A) without bafilomycin treatment suggests dysfunctional lysosomal acidification and thus could be linked to inhibited v-ATPase activity”.

    1. eLife Assessment

      This is an important and convincing study of the morphological properties of Purkinje cell dendrites and dendritic spines in adult humans and mice, and the anatomical determinants of multi-innervation by climbing fibers. The data will provide a helpful resource for the field of cerebellar computation.

    2. Reviewer #1 (Public review):

      Summary:

      Busch and Hansel present a morphological and histological comparison between mouse and human Purkinje cells (PCs) in the cerebellum. The study reveals species-specific differences that have not previously been reported despite numerous observations of these species. While mouse PCs show morphological heterogeneity and occasional multi-innervation by climbing fibers (CFs), human PCs exhibit a widespread, multi-dendritic structure that exceeds expectations based on allometric scaling. Specifically, human PCs are significantly larger, and exhibit increased spine density, with a unique cluster-like morphology not found in mice.

      Strengths:

      The manuscript provides an exceptionally detailed analysis of PC morphology across species, surpassing any prior publication. Major strengths include a systematic and thorough methodology, rigorous data analysis, and clear presentation of results. This work is likely to become the go-to resource for quantitation in this field. The authors have largely achieved their aims, with the results effectively supporting their conclusions.

      Weaknesses:

      There are a few concerns that need to be addressed, specifically related to details of the methodology as well as data interpretation based on the limits of some experimental approaches. Overall, these weaknesses are minor.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript aims to follow up on a previously published paper (Busch and Hansel 2023) which proposed that the morphological variation of dendritic bifurcation in Purkinje cells in mice and humans is indicative of the number of climbing fiber inputs, with dendritic bifurcation at the level of the soma resulting in a proportion of these neurons being multi-innervated. The functional and anatomical climbing fiber data was obtained solely from mice since all human tissue was embalmed and fixed, and the extension of these findings to human Purkinje cells was indirect. The current comparative anatomy study aims to resolve this question in human tissue more directly and to further analyse in detail the properties of adult human Purkinje cell dendritic morphology.

      Strengths:

      The authors have carried out a meticulous anatomical quantification of human Purkinje cell dendrites, in tissue preparations with a better signal-to-noise ratio than their previous study, comparing them with those from mice. Importantly, they now present immunolabelling results that trace climbing fiber axons innervating human PCs. As well as providing detailed analyses of spine properties and interesting new findings of human PC dendritic length and spine types, the work confirms that human PCs that have two clearly distinct dendritic branches have an approximately x% chance of receiving more than one CF input, segregated across the two branches. Albeit entirely observational, the data will be of widespread interest to the cerebellar field, in particular, those building computational models of Purkinje cells.

      Weaknesses:

      The work is, by necessity, purely anatomical. It remains to be seen whether there are any functional differences in ion channel expression or functional mapping of granule inputs to human PCs compared with the mouse that might mitigate the major differences in electronic properties suggested.

    4. Author response:

      We plan to submit a revised version of our manuscript eLife-RP-RA-2024-105013, in which we address all comments raised by the two expert reviewers.

      Below we describe what we would like to address in this revision. We understand that the provisional response is not meant to be a point-by-point reply. Therefore, our revision plan more generally summarizes the comments of the reviewers and how we plan to address them.

      Reviewer #1:

      This reviewer is overall very positive and states that our ‘work is likely to become the go-to resource for quantification in this field’. This reviewer raises a few weaknesses of the manuscript that are explicitly described as minor.

      Microscopic resolution sufficient to support quantitative spine assessments?

      In the detailed revision, we will provide quantification of microscopic resolution and will relate this to the spine comparisons offered. Where needed, we will add caveats discussing measurement limits.

      Age of the human tissue.

      Most of the analysis is based on three brains from elderly individuals. For the analysis of dendritic spines, we added measures from a younger brain (37 years old). We will make clearer which datasets contained these measures and what the results of our comparative analysis were.

      Genetic diversity contributing to species differences?

      We provide an updated discussion on this interesting topic.

      Reviewer #2:

      This reviewer also expresses a largely positive view of the manuscript, noting that ‘..the data will be of widespread interest to the cerebellar field…’. 

      Microscopic resolution:

      see above.

      Figure panels / Fig. 3:

      We will make sure that the figures are readable and will provide a clarification of gray scales used in Fig. 3.

      Vertical vs horizontal dendrite orientation:

      This is a point that requires clarification. Per our definition, all dendrites fall either into the vertical or horizontal category. We will make sure that this is defined sufficiently well.

    1. eLife Assessment

      Combining experiments in microfluidic devices and computer simulation, this study provides a valuable analysis of the relevant parameters that determine the motility of (multicellular) magnetotactic bacteria in sediment-like environments. Despite the limitations imposed by the specific experimental design of the pores, the study presents convincing evidence that there is an optimum in the biological parameters for motile life under such conditions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors track the motion of multiple consortia of Multicellular Magnetotactic Bacteria moving through an artificial network of pores and report a discovery of a simple strategy for such consortia to move fast through the network: an optimum drift speed is attained for consortia that swim a distance comparable to the pore size in the time it takes to align with an external magnetic field. The authors rationalize their observations using dimensional analysis and numerical simulations. Finally, they argue that the proposed strategy could generalize to other species by demonstrating the positive correlation between the swimming speed and alignment time based on parameters derived from literature.

      Strengths:

      The underlying dimensional analysis and model convincingly rationalize the experimental observation of an optimal drift velocity: the optimum balances the competition between the trapping in pores at large magnetic fields and random pore exploration for weak magnetic fields.
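
      The competition described above can be made concrete with a minimal sketch of a dimensionless "Scattering number". The form used here is an assumption based only on the reviewers' wording (distance swum during one alignment time, divided by the pore size); the manuscript's exact definition and prefactors are not reproduced in this review. The function names, the torque-balance estimate of the alignment time, and all numerical values are hypothetical.

```python
import math


def alignment_time(rot_drag, magnetic_moment, field):
    """Time scale for a swimmer to rotate into alignment with the field.

    Assumption: magnetic torque (~ m * B) is balanced by viscous rotational
    drag, so the time scale is rot_drag / (m * B). SI units throughout.
    """
    return rot_drag / (magnetic_moment * field)


def scattering_number(speed, rot_drag, magnetic_moment, field, pore_size):
    """Distance swum during one alignment time, in units of the pore size.

    Hypothetical form: Sc = U * tau_align / r.
    """
    return speed * alignment_time(rot_drag, magnetic_moment, field) / pore_size


# Illustrative (made-up) parameters for a micron-scale swimmer.
viscosity = 1e-3                                   # Pa s, water
radius = 3e-6                                      # m, hypothetical swimmer radius
rot_drag = 8 * math.pi * viscosity * radius ** 3   # rotational drag of a sphere
speed = 100e-6                                     # m/s, hypothetical
moment = 1e-14                                     # A m^2, hypothetical
pore = 50e-6                                       # m, hypothetical

for field in (25e-6, 50e-6, 100e-6, 200e-6):       # tesla
    sc = scattering_number(speed, rot_drag, moment, field, pore)
    print(f"B = {field * 1e6:6.1f} uT  ->  Sc = {sc:.2f}")
```

      Under this assumed form, a strong field gives a small Scattering number (fast alignment, hence long trapping against walls), while a weak field gives a large one (slow alignment, hence random pore exploration); the reported optimum lies where the value is of order one.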

      Weaknesses:

      The convex pore geometry studied here creates convex traps for cells, which I expect enhances their trapping. The more natural concave geometries, resulting from random packing of spheres, would create no such traps. In this case, whether a non-monotonic dependence of the drift velocity on the Scattering number would persist is unclear.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have made microfluidic arrays of pores and obstacles with a complex shape and studied the swimming of multicellular magnetotactic bacteria through this system. They provide a comprehensive discussion of the relevant parameters of this system and identify one dimensionless parameter, which they call the scattering number and which depends on the swimming speed and magnetic moment of the bacteria as well as the magnetic field and the size of the pores, as the most relevant. They measure the effective speed through the array of pores and obstacles as a function of that parameter, both in their microfluidic experiments and in simulations, and find an optimal scattering number, which they estimate to reflect the parameters of the studied multicellular bacteria in their natural environment. They finally use this knowledge to compare different species to test the generality of this idea.

      Strengths:

      This is a beautiful experimental approach and the observation of an optimal scattering number (likely reflecting an optimal magnetic moment) is very convincing. The results here improve on similar previous work in two respects: On the one hand, the tracking of bacteria does not have the limitations of previous work, and on the other hand, the effective motility is quantified. Both features are enabled by choices of the experimental system: the use of multicellular bacteria, which are larger than the usual single-celled magnetotactic bacteria, and the design of the obstacle array, which allows the quantification of transition rates due to the regular organization as well as the controlled release of bacteria into this array through a clever mechanism.

      Weaknesses:

      Some of the reported results are not as new as the authors suggest, specifically trapping by obstacles and the detrimental effect of a strong magnetic field have been reported before as has the hypothesis that the magnetic moment may be optimized for swimming in a sediment environment where there is a competition of directed swimming and trapping. Other than that, some of the key experimental choices on which the strength of the approach is based also come at a price and impose some limitations, namely the use of a non-culturable organism and the regular, somewhat unrealistic artificial obstacle array.

    4. Author response:

      Response to Referee 1

      We agree that convex walls increase the time that consortia remain trapped in pores at high magnetic fields. Since the non-monotonic behavior of the drift velocity with the Scattering number arises largely due to these long trapping times, we agree that experiments using concave pores are likely to show a peak drift velocity that is diminished or erased.

      However, we disagree that a random packing of spheres or similar particles provides an appropriate model for natural sediment, which is not composed exclusively of hard particles in a pure fluid. Pore geometry is also influenced by clogging. Biofilms growing within a network of convex pillars in two-dimensional microfluidic devices have been observed to connect neighboring pillars, thereby forming convex pores. Similar pore structures appear in simulations of biofilm growth between spherical particles in three dimensions. Moreover, the salt marsh sediment in which MMB live is more complex than simple sand grains, as cohesive organic particles are abundant. Experiments in microfluidic channels show that cohesive particles clog narrow passageways and form pores similar to those analyzed here. Thus, we expect convex pores to be present and even common in natural sediment where clogging plays a role.

      The concentration of convex pores in the experiments presented here is almost certainly much higher than in nature. Nonetheless, since magnetotactic bacteria continuously swim through the pore space, they are likely to regularly encounter such convexities. Efficient navigation of the pore space thus requires that magnetotactic bacteria be able to escape these traps. In the original version of this manuscript, this reasoning was reduced to only one or two sentences. That was a mistake, and we thank the reviewer for prompting us to expand on this point. As the reviewer notes, this reasoning is central to the analysis and should have been featured more prominently. In the final version, we will devote considerable space to this hypothesis and provide references to support the claims made above.

      The reviewer suggests that the generality of this work depends on our finding a "positive correlation between the swimming speed and alignment [rate] based on parameters derived from literature." We wish to emphasize that, in addition to predicting this correlation, our theory also predicts the function that describes it. The black line in Figure 3 is not fitted to the parameters found in the literature review; it is a pure prediction.

      Response to Referee 2

      In the "Recommendations for the Authors," this reviewer drew our attention to a manuscript that absolutely should have been prominently cited. As the reviewer notes, our manuscript meaningfully expands upon this work. We are pleased to learn that the phenomena discussed here are more general than we initially understood. It was an oversight not to have found this paper earlier. The final version will better contextualize our work and give due credit to the authors. We sincerely appreciate the reviewer for bringing this work to our attention.

      We disagree that the use of non-culturable organisms and our unrealistic array should be considered serious weaknesses. While any methodological choice comes with trade-offs, we believe these choices best advance our aims. First, the goal of our research, both within and beyond this manuscript, is to understand the phenotypes of magnetotactic bacteria in nature. While using pure cultures enables many useful techniques, phenotypic traits may drift as strains undergo domestication. We therefore prioritize studying environmental enrichments.

      Clearly, an array of obstacles does not fully represent natural heterogeneity. However, using regular pore shapes allows us to average over enough consortium-wall collisions to enable a parameter-free comparison between theory and experiment. Conducting an analysis like this with randomly arranged obstacles would require averaging over an ensemble of random environments, which is practically challenging given the experimental constraints. Since we find good agreement between theory and experiment in simple geometries, we are now in a position to justify extending our theory to more realistic geometries. Additionally, we note that a microfluidic device composed of a random arrangement of obstacles would also be a poor representation of environmental heterogeneity, as pore shape and network topology differ between two and three dimensions.

    1. eLife Assessment

      This valuable study assesses epigenetic clocks across ancestries, including in the context of accelerated aging in Alzheimer's Disease patients. It provides convincing evidence for population differences in age estimation accuracy across a variety of epigenetic clocks, but the degree to which these differences reflect continuous variation in ancestry, and/or are confounded by environmental or power differences is not entirely clear; consequently, the evidence that reduced portability is rooted in genetics is incomplete. Given the accelerating use of epigenetic clocks across fields, this study is nevertheless likely to be of interest to researchers working on human genetic and epigenetic variation or who apply epigenetic clocks to diverse human populations.

    2. Reviewer #1 (Public review):

      Summary:

      Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group.

      Strengths:

      This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.

      Weaknesses:

      While the authors tackle an important question using a diverse cohort, the current manuscript lacks some detail, which may diminish the potential impact of this paper. For example:

      (1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).

      (2) The authors compare correlations between chronological age and epigenetic age in sub-groups to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

      (3) The correlation between chronological age and epigenetic age, while helpful is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.

      (4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.

      (5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers.

      (6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).

    3. Reviewer #2 (Public review):

      Summary:

      This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.

      Strengths:

      The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

      Weaknesses:

      One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort.

      The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e. "disruptive variants"), and genetic variants influencing methylation sites (i.e. meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries.

      It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability.

      The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit.

      Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.

    4. Reviewer #3 (Public review):

      This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.

      The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.

      The manuscript is clear and well-written, however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and most importantly several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group. 

      Thank you for this summary.

      Strengths:  

      This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.  

      Weaknesses:  

      While the authors tackle an important question using a diverse cohort, the current manuscript lacks some detail, which may diminish the potential impact of this paper. For example:

      (1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).  

      Thank you for pointing out this omission. The age ranges are similar across cohorts. No individuals under 60 were considered, and the average ages per cohort ranged from 72 to 76. Neither average age nor age range was consistently higher or lower in the admixed cohorts for which the clocks had lower performance compared to the White cohort. We will report the age distributions in supplementary material in the revision.

      (2) The authors compare correlations between chronological age and epigenetic age in sub-groups to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

      Our conclusions about the reduced accuracy of the clocks in admixed individuals are based on comparisons within the MAGENTA cohorts, not on the comparisons to previous reports. We show significantly reduced accuracy on African American and Puerto Rican cohorts in MAGENTA compared to the White MAGENTA cohort. The reviewer is correct that the lower correlation in each of the cohorts compared to those in the Horvath study is due to the older age range of our cohort. Indeed, other studies applying the Horvath clock have seen similar correlations to those observed on the White MAGENTA cohort (Marioni et al., 2015, Horvath 2013, and Shireby et al., 2020). Following the suggestion to increase sample size, we conducted the chronological age vs. epigenetic age correlation analysis with the inclusion of AD cases. The significantly lower performance of the clock on Puerto Ricans and African Americans relative to White individuals remains after including all individuals in each cohort. We will include these results on the full cohorts in MAGENTA in the revision.
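      The effect of age range on the correlation can be illustrated with a minimal sketch. It assumes, purely for illustration, that chronological ages are uniformly distributed over the sampled range and that predicted age equals chronological age plus independent error with a fixed standard deviation. The 5-year error and the function name `expected_correlation` are hypothetical and are not taken from the manuscript.

```python
import math


def expected_correlation(age_min: float, age_max: float, error_sd: float) -> float:
    """Expected Pearson r between chronological and predicted age.

    Assumes ages are uniform on [age_min, age_max] and that
    predicted age = chronological age + independent noise with
    standard deviation error_sd (unit slope, no bias).
    """
    age_var = (age_max - age_min) ** 2 / 12.0  # variance of a uniform distribution
    # cov(age, predicted) = age_var and var(predicted) = age_var + error_sd**2
    return math.sqrt(age_var / (age_var + error_sd ** 2))


# error_sd = 5 years is a hypothetical value chosen for illustration only.
r_wide = expected_correlation(0, 100, error_sd=5.0)
r_narrow = expected_correlation(60, 90, error_sd=5.0)
print(f"0-100 yrs: r = {r_wide:.3f}")
print(f"60-90 yrs: r = {r_narrow:.3f}")
```

      Under these assumptions the same prediction error yields a lower correlation in the narrower age range, which is why within-study comparisons between cohorts with similar age ranges are more informative than comparisons to the original Horvath report.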

      (3) The correlation between chronological age and epigenetic age, while helpful is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.  

      We used correlation because this is commonly used to evaluate the performance of epigenetic age clocks, but we agree that direct error quantification provides a complementary perspective. We confirm that the African American and Puerto Rican cohorts have higher error than the White cohort, and we will report these comparisons in the revision.
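      The distinction between correlation and direct error can be sketched as follows. The ages below are toy values invented for illustration, not MAGENTA data, and the helper names are hypothetical. The point is only that a clock with a constant offset keeps the same correlation while its median absolute error grows.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_absolute_error(chronological, predicted):
    """Median of |predicted - chronological| across individuals."""
    return statistics.median(abs(p - c) for c, p in zip(chronological, predicted))


# Toy data (hypothetical): one well-calibrated clock, one shifted by 8 years.
chronological = [62, 68, 74, 80, 86]
calibrated = [63, 67, 75, 79, 87]
shifted = [p + 8 for p in calibrated]

for name, predicted in [("calibrated", calibrated), ("shifted", shifted)]:
    r = pearson_r(chronological, predicted)
    mae = median_absolute_error(chronological, predicted)
    print(f"{name}: r = {r:.3f}, MAE = {mae} yrs")
```

      Reporting both metrics per cohort would therefore separate loss of rank-order agreement from systematic over- or under-prediction.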

      (4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.

      Thank you for pointing out the need for additional context on data generation. All omics data from the MAGENTA study were generated using protocols that aim to minimize technical artifacts and batch effects. We will add detailed protocol information in the revision. We also thank the reviewer for their suggestion on applying the principal component clock to account for potential technical variation. We are planning to perform these analyses and include them in the revision.

      (5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers. 

      We agree that previous links between DNAm Age and AD/cognitive function have been small in magnitude. For example, the PhenoAge paper (Levine et al., 2018) and a study using the Horvath clock (Levine et al., 2015) found age acceleration of less than a year in AD patients relative to non-demented individuals. These effects have been detected in studies with relatively small sample sizes (e.g., 700 for Levine et al. 2015 and 604 for Levine et al. 2018). Our study is of similar size, but the cohort-specific analyses have lower power. Nonetheless, we replicate the modest, but significant association with AD in the white MAGENTA cohort. We have performed power calculations and find that we have 26% power to detect an effect of this size in the Cubans, 46% for the Peruvians, 66% for the Whites, 74% for the Puerto Ricans, and 84% for the African Americans. Given the relatively high power in the Puerto Rican and African American cohorts, we suggest that the reduced accuracy of the clocks contributes to the lack of association. We will also add caveats about power and the small sample size in the revision.
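      For readers unfamiliar with how power depends on group size, the following is a minimal sketch using a normal approximation to a two-sided, two-sample comparison. It is not the calculation used for the percentages above; the function name, the standardized effect size of 0.3, and the group sizes are hypothetical values chosen only to show the trend.

```python
import math
from statistics import NormalDist


def two_sample_power(effect_size, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d);
    the normal approximation ignores the small-sample t correction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(
        n_cases * n_controls / (n_cases + n_controls)
    )
    # Probability of rejecting in either tail under the alternative.
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(
        -noncentrality - z_crit
    )


# Hypothetical per-group sizes and effect size, for illustration only.
for n_per_group in (20, 50, 100):
    power = two_sample_power(0.3, n_per_group, n_per_group)
    print(f"n = {n_per_group} per group: power = {power:.2f}")
```

      With a fixed effect size, power rises with the number of cases and controls, so cohort-specific null results are most informative in the larger cohorts.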

      (6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).

      Thank you for this excellent suggestion. We will add this analysis in the revision. This will enable us to test for further evidence for our hypothesis about the role of ancestry-specific meQTL on clock accuracy.

      Reviewer #2 (Public review):

      Summary:  

      This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.  

      Strengths:  

      The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

      Thank you for this summary.

      Weaknesses:  

      One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort. 

      Thank you for this excellent suggestion. We agree that modeling portability across genetic ancestry as a spectrum would help support our conclusions. We will add this to the revision.
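      One simple form such a continuous analysis could take is an ordinary least-squares regression of each individual's absolute prediction error on their ancestry fraction. The sketch below is illustrative only: the data are synthetic, the helper name is hypothetical, and a real analysis would likely adjust for covariates such as age, sex, and batch.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values (not study data): ancestry fraction per individual and
# the absolute difference between predicted and chronological age in years.
ancestry_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
absolute_error = [3.0, 3.5, 4.0, 4.5, 5.0]

slope, intercept = ols_slope_intercept(ancestry_fraction, absolute_error)
print(f"error = {intercept:.2f} + {slope:.2f} * ancestry fraction")
```

      A positive slope would indicate that error increases along the ancestry spectrum, which is a more direct test than comparing overall correlations between cohorts.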

      The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e., "disruptive variants") and genetic variants influencing methylation sites (i.e., meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries. 

      Thank you for this question. The allele frequencies were so low that even if they all occurred in individuals of non-European ancestries, they would still be incredibly rare. Nonetheless, in the revision, we will make this clear by reporting ancestry-specific allele frequencies.

      It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability. 

      We agree that the meQTL likely influence the clocks in different ways and that the ascertainment of the meQTLs in different populations makes direct comparisons challenging. To provide mechanistic insights into the ways that meQTL influence the methylation clocks, we plan to leverage the individual-level genetic data generated for the MAGENTA individuals. This will allow us to explore whether the individuals who have the specified clock-influencing meQTL receive less accurate predictions from the methylation clocks. In addition, the new analysis of whether individuals from different cohorts have different methylation levels at clock CpGs with ancestry-variable meQTLs will help establish the differences between groups (see response to Reviewer #1 point 6). Finally, to resolve potential bias due to ascertaining some of the meQTL in African Americans, we will conduct the same analyses from the manuscript, holding out the set of meQTL from African Americans. These results will be included in the revision.

      The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit. 

      We agree that the signal is not particularly strong in the white cohort, but the effect size is in line with previous studies. We will add power calculations and discussion to help the interpretation of these results (see response to Reviewer #1 point 5).  

      Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.  

      We entirely agree about the importance of discussing environmental exposures. We did not intend to discount them in our manuscript. We will clarify their potential role and the scope of our analyses in the revision. We expect that environmental factors certainly contribute to differences between groups. The revisions outlined above may help us better quantify the genetic contribution.

      Reviewer #3 (Public review):

      This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.

      The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.  

      The manuscript is clear and well-written, however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and most importantly several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.  

      Thank you for these suggestions. As noted in our response to reviewer #2, we will analyze ancestry as a continuous variable in the revision. We will also add details on the training of previous clocks and previous work on clock accuracy.

    1. eLife Assessment

      Using several hundreds of samples and cutting-edge genomic methods, including BioNano, PacBio HiFi, and advanced bioinformatic pipelines, the authors identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids. This important study provides compelling evidence for the presence of these six inversions, their differential distribution among populations, and the association of chromosome 10 inversion with a sex-determination locus. This work also provides a starting point for further investigating the role of these inversions with respect to local adaptation, speciation, sex determination, hybridization, and incomplete lineage sorting in cichlids, which represent ~5% of the extant vertebrate species and are one of the most prominent examples of adaptive radiations.

    2. Reviewer #1 (Public review):

      Summary:

      Using high-quality genomic data (long-reads, optical maps, short-reads) and advanced bioinformatic analysis, the authors aimed to document chromosomal rearrangements across a recent radiation (Lake Malawi Cichlids). Working on 11 species, they achieved a high-resolution inversion detection and then investigated how inversions are distributed within populations (using a complementary dataset of short-reads), associated with sex, and shared or fixed among lineages. The history and ancestry of the inversions is also explored.

      On one hand, I am very enthusiastic about the global finding (many inversions well-characterized in a highly diverse group!) and impressed by the amount of work put into this study. On the other hand, I have struggled so much to read the manuscript that I am unsure about how much the data supports some claims. I'm afraid most readers may feel the same and really need a deep reorganisation of the text, figures, and tables. I reckon this is difficult given the complexity brought by different inversions/different species/different datasets but it is highly needed to make this study accessible.

      The methods of comparing optical maps, and looking at inversions at macro-evolutionary scales can be useful for the community. For cichlids, it is a first assessment that will allow further tests about the role of inversions in speciation and ecological specialisation. However, the current version of the manuscript is hardly accessible to non-specialists and the methods are not fully reproducible.

      Strengths:

      (1) Evidence for the presence of inversion is well-supported by optical mapping (very nice analysis and figure!).

      (2) The link between sex determination and inversion in chr 10 in one species is very clearly demonstrated by the proportion in each sex and additional crosses. This section is also the easiest to read in the manuscript and I recommend trying to rewrite other result sections in the same way.

      (3) A new high-quality reference genome is provided for Metriaclima zebra (and possibly other assemblies? - unclear).

      (4) The sample size is great (31 individuals with optical maps if I understand well?).

      (5) Ancestry at those inversions is explored with outgroups.

      (6) Polymorphism for all inversions is quantified using a complementary dataset.

      Weaknesses:

      (1) Lack of clarity in the paper: As it currently reads, it is very hard to follow the different species, ecotypes, samples, inversions, etc. It would be useful to provide a phylogeny explicitly positioning the samples used for assembly and the habitat preference. Then the text would benefit from being organised either by variant or by subgroups rather than by successive steps of analysis.

      (2) Lack of information for reproducibility: I couldn't find clearly the filters and parameters used for the different genomic analyses for example. This is just one example and I think the methods need to be re-worked to be reproducible. Including the codes inside the methods makes it hard to follow, so why not put the scripts in an indexed repository?

      (3) Further confirmation of inversions and their breakpoints would be valuable. I don't understand why the long-reads (that were available and used for genome assembly) were not also used for SV detection and breakpoint refinement.

      (4) Lack of statistical testing for the hypothesis of introgression: Although cichlids are known for high levels of hybridization, inversions can also remain balanced for a long time. What could allow us to differentiate introgression from incomplete lineage sorting?

      (5) The sample size is unclear: possibly 31 for Bionano, 297 for short-reads, how many for long-reads or assemblies? How is this sample size split across species? This would deserve a table.

      (6) The short-read data combine several datasets, but batch effects are not tested.

      (7) It is unclear how ancestry is determined because the synteny with outgroups is not shown.

      (8) The level of polymorphism for the different inversions is difficult to interpret because it is unclear whether replicates are different species within an eco-group or different individuals from the same species. How could it be that homozygous references are so spread across the PCA? I guess the species-specific polymorphism is stronger than the ancestral order, but in such a case, wouldn't it be worth redoing the PCA on a subset?
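
      The question raised in point (4) is often framed with Patterson's D (ABBA-BABA) statistic, which compares two site patterns across a four-taxon tree ((P1, P2), P3, outgroup). Incomplete lineage sorting alone is expected to produce ABBA and BABA sites in roughly equal numbers, whereas an excess of one pattern is consistent with gene flow involving P3. The following is a minimal sketch of that counting step only, not the authors' analysis; the function name `patterson_d` and the toy alleles are hypothetical, and a real analysis would also need a significance test (for example a block jackknife) and careful handling of inverted regions.

```python
from typing import Iterable, Tuple

Site = Tuple[str, str, str, str]  # alleles for (P1, P2, P3, outgroup)


def patterson_d(sites: Iterable[Site]) -> Tuple[float, int, int]:
    """Return (D, n_abba, n_baba) for single-sequence, biallelic sites.

    The outgroup allele is treated as ancestral ("A"); any other allele
    at that site is treated as derived ("B").
    """
    abba = 0
    baba = 0
    for p1, p2, p3, out in sites:
        # ABBA: P2 and P3 share the derived allele, P1 is ancestral.
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1
        # BABA: P1 and P3 share the derived allele, P2 is ancestral.
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1
        # All other patterns are uninformative for D and are skipped.
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total, abba, baba


# Toy data (hypothetical): three ABBA sites, one BABA site, one uninformative site.
toy_sites = [
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("G", "A", "A", "G"),
    ("T", "C", "T", "C"),
    ("A", "A", "G", "A"),
]
d_value, n_abba, n_baba = patterson_d(toy_sites)
print(d_value, n_abba, n_baba)
```

      A D value near zero would be compatible with incomplete lineage sorting, while a value significantly different from zero would point towards introgression; the toy data above are chosen only to exercise both branches of the counter.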

    3. Reviewer #2 (Public review):

      Summary:

      Chromosomal inversions have been predicted to play a role in adaptive evolution and speciation because of their ability to "lock" together adaptive alleles in genomic regions of low recombination. In this study, the authors use a combination of cutting-edge genomic methods, including BioNano and PacBio HiFi sequencing, to identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids, a classic example of adaptive radiation and rapid speciation. By examining the frequencies of these inversions present in species from six different lineages, the authors show that there is an association between the presence of specific inversions and specific lineages/habitats. Using a combination of phylogenetic analyses and sequencing data, they demonstrate that three of the inversions have been introduced to one lineage via hybridization. Finally, genotyping of wild individuals as well as laboratory crosses suggests that three inversions are associated with XY sex determination systems in a subset of species. The data add to a growing number of systems in which inversions have been associated with adaptation to divergent environments. However, like most of the other recent studies in the field, this study does not go beyond describing the presence of the inversions to demonstrate that the inversions are under sexual or natural selection or that they contribute to adaptation or speciation in this system.

      Strengths:

      All analyses are very well done, and the conclusions about the presence of the six inversions in Lake Malawi cichlids, the frequencies of the inversions in different species, and the presence of three inversions in the benthic lineages due to hybridization are well-supported. Genotyping of 48 individuals resulting from laboratory crosses provides strong support that the chromosome 10 inversion is associated with a sex-determination locus.

      Weaknesses:

      The evidence supporting a role for the chromosome 11 inversion and the chromosome 9 inversion in sex determination is based on relatively few individuals and therefore remains suggestive. The authors are mostly cautious in their interpretations of the data. However, there are a few places where they state that the inversions are favored by selection, but they provide no evidence that this is the case and there is no consideration of alternative hypotheses (i.e. that the inversions might have been fixed via drift).

    4. Reviewer #3 (Public review):

      This is a very interesting paper bringing truly fascinating insight into the genomic processes underlying the famous adaptive radiation seen in cichlid fishes from Lake Malawi. The authors use structural and sequence information from species belonging to distinct ecotypic categories, representing subclades of the radiation, to document structural variation across the evolutionary tree, infer introgression of inversions among branches of the clade, and even suggest that certain rearrangements constitute new sex-determining loci. The insight is intriguing and is likely to make a substantial contribution to the field and to seed new hypotheses about the ecological processes and adaptive traits involved in this radiation.

      I think the paper could be clarified in its prose, and that the discussion could be more informative regarding the putative roles of the inversions in adaptation to each ecotypic niche. Identifying key, large inversions shared in various ways across the different taxa is really a great step forward. However, the population genomics analysis requires further work to describe and decipher in a more systematic way the evolutionary forces at play and their consequences on the various inversions identified.

      The model of evolution involving multiple inversions putatively linking together co-adapted "cassettes" could be better spelled out since it is not entirely clear how the existing theory on the recruitment of inversions in local adaptation (e.g. Kirkpatrick and Barton) operates on multiple unlinked inversions. How such loci correspond to distinct suites of integrated traits, or not, is not very easy to envision in the current state of the manuscript.

      The role of one inversion in sex determination is apparent and truly intriguing. However, the implication of such a locus for ecological adaptation is somewhat puzzling. Also, whether sex determination loci can flow across species via introgression seems quite important as a route to chromosomal sex determination, so this could be discussed further.

    5. Author response:

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution in ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve/include (1) the nomenclature to understand the inversions in different lineages, (2) improved descriptions for various genomic approaches, (3) a figure to document the samples and technologies used for each ecogroup, and (4) integration of LR sequences to identify inversion breakpoints to the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces can be at play. Additionally, there were concerns with our model that the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, and that incomplete lineage sorting or balancing selection (via sex determination) could be at play instead. Overall, we agree with the reviewers, with the following caveats. 1. Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than with incomplete lineage sorting or balancing selection. 2. This question of selection is much more complicated in the context of the Lake Malawi cichlid radiation, with ~800 different species. We believe the role of these inversions must be considered in a species- and time-specific way. In other words, the evolutionary forces acting on these inversions at the time of their formation are likely different from the evolutionary forces acting now. Further, the role of these inversions is likely different in different species. For example, the inversions on chromosomes 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions, both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

    1. eLife Assessment

      This important work advances our understanding of intraflagellar transport, ciliogenesis, and ciliary-based signaling, by identifying the interactions of IFT172 with IFT-A components, ubiquitin-binding, and ubiquitination, mediated by IFT172 C-terminus and its role in ciliogenesis and ciliary signaling. The results of the structural analysis of the IFT172 C-terminus and the evidence for the interaction between IFT172 and IFT-A components are convincing. However, the analysis of ubiquitin-binding and ubiquitination mediated by IFT172 is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Zacharia and colleagues investigate the role of the C-terminus of IFT172 (IFT172c), a component of the IFT-B subcomplex. IFT172 is required for proper ciliary trafficking and mutations in its C-terminus are associated with skeletal ciliopathies. The authors begin by performing a pull-down to identify binding partners of His-tagged CrIFT172(968-C) in Chlamydomonas reinhardtii flagella. Interactions with three candidates (IFT140, IFT144, and a UBX-domain containing protein) are validated by AlphaFold Multimer with the IFT140 and IFT144 predictions in agreement with published cryo-ET structures of anterograde and retrograde IFT trains. They present a crystal structure of IFT172c and find that a part of the C-terminal domain of IFT172 resembles the fold of a non-canonical U-box domain. As U-box domains typically function to bind ubiquitin-loaded E2 enzymes, this discovery stimulates the authors to investigate the ubiquitin-binding and ubiquitination properties of IFT172c. Using in vitro ubiquitination assays with truncated IFT172c constructs, the authors demonstrate partial ubiquitination of IFT172c in the presence of the E2 enzyme UBCH5A. The authors also show a direct interaction of IFT172c with ubiquitin chains in vitro. Finally, the authors demonstrate that deletion of the U-box-like subdomain of IFT172 impairs ciliogenesis and TGFbeta signaling in RPE1 cells.

      However, some of the conclusions of this paper are only partially supported by the data, and presented analyses are potentially governed by in vitro artifacts. In particular, the data supporting autoubiquitination and ubiquitin-binding are inconclusive. Without further evidence supporting a ubiquitin-binding role for the C-terminus, the title is potentially misleading.

      Strengths:

      (1) The pull-down with IFT172 C-terminus from C. reinhardtii cilia lysates is well performed and provides valuable insights into its potential roles.

      (2) The crystal structure of the IFT172 C-terminus is of high quality.

      (3) The presented AlphaFold-multimer predictions of IFT172c:IFT140 and IFT172c:IFT144 are convincing and agree with experimental cryo-ET data.

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but does not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

    3. Reviewer #2 (Public review):

      Summary:

      Cilia are antenna-like extensions projecting from the surface of most vertebrate cells. Protein transport along the ciliary axoneme is enabled by motor protein complexes with multimeric so-called IFT-A and IFT-B complexes attached. While the components of these IFT complexes have been known for a while, precise interactions between different complex members, especially how IFT-A and IFT-B subcomplexes interact, are still not entirely clear. Likewise, the precise underlying molecular mechanism in human ciliopathies resulting from IFT dysfunction has remained elusive.

      Here, the authors investigated the structure and putative function of the to-date poorly characterised C-terminus of IFT-B complex member IFT172 using AlphaFold predictions, crystallography and biochemical analyses including proteomics analyses followed by mass spectrometry, pull-down assays, and TGFbeta signalling analyses using Chlamydomonas flagella and RPE cells. The authors thereby provide novel insights into the crystal structure of IFT172 and identify novel interaction sites between IFT172 and the IFT-A complex members IFT140/IFT144. They suggest a U-box-like domain within the IFT172 C-terminus could play a role in IFT172 auto-ubiquitination as well as in TGFbeta signalling regulation.

      As a number of disease-causing IFT172 sequence variants resulting in mammalian ciliopathy phenotypes have previously been identified in the IFT172 C-terminus, the authors also investigate the effects of such variants on auto-ubiquitination. This revealed no mutational effect on mono-ubiquitination, which the authors suggest could be independent of the U-box-like domain, but reduced overall IFT172 ubiquitination.

      Strengths:

      The manuscript is clear and well written and experimental data is of high quality. The findings provide novel insights into IFT172 function, IFT complex-A and B interactions, and they offer novel potential mechanisms that could contribute to the phenotypes associated with IFT172 C-terminal ciliopathy variants.

      Weaknesses:

      Some suggestions/questions are included in the comments to the authors below.

    4. Reviewer #3 (Public review):

      Summary:

      Zacharia et al report on the molecular function of the C-terminal domain of the intraflagellar transport IFT-B complex component IFT172 by structure determination and biochemical in vitro and cell culture-based assays. The authors identify an IFT-A binding site that mediates a mutually exclusive interaction to two different IFT-A subunits, IFT144 and IFT140, consistent with interactions suggested in anterograde and retrograde IFT trains by previous cryo-electron tomography studies. Additionally, the authors identify a U-box-like domain that binds ubiquitin and conveys ubiquitin conjugation activity in the presence of the UbcH5a E2 enzyme in vitro. RPE1 cell lines that lack the U-box domain show a reduction in ciliation rate with shorter cilia, and heterozygous cells manifest TGF-beta signaling defects, suggesting an involvement of the U-box domain in cilium-dependent signaling.

      Strengths:

      (1) The structural analyses of the C-terminal domain of IFT172 combine crystallography with structure prediction using state-of-the-art algorithms, which gives high confidence in the presented protein structures. The structure-based predictions of protein interactions are validated by further biochemical experiments to assess the specific binding of the IFT172 C-terminal domains with other proteins.

      (2) The finding that the IFT172 C-terminus interactions with the IFT-A components IFT140 and IFT144 appear mutually exclusive confirms a suggested role in mediating the binding of IFT-B to IFT-A in anterograde and retrograde IFT trains, which is of very high scientific value.

      (3) The suggested molecular mechanism of IFT train coordination explains previous findings in Chlamydomonas IFT172 mutants, in particular an IFT172 mutant that appeared defective in retrograde IFT, as well as mutations identified in ciliopathy patients.

      (4) The identification of other IFT172 interactors by unbiased mass spectrometry-based proteomics is very exciting. Analysis of stoichiometries between IFT components suggests that these interactors could be part of IFT trains, either as cargos or additional components that may fulfill interesting functions in cilia and flagella.

      (5) The authors unexpectedly identify a U-box-like fold in the IFT172 C-terminus and thoroughly dissect it by sequence and mutational analyses to reveal unexpected ubiquitin binding and potential intrinsic ubiquitination activity.

      (6) The overall data quality is very high. The use of IFT172 proteins from different organisms suggests a conserved function.

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      Overall, the authors succeeded in characterizing an understudied protein domain of the ciliary intraflagellar transport machinery and gained important molecular insights into its role in primary cilia biology, beyond IFT. By identifying an unexpected functional protein domain and novel interaction partners, the work makes an important contribution to furthering our understanding of how ciliary processes might be regulated by ubiquitination on a molecular level. Based on this work, it will be important for future studies in the cilia community to consider direct ubiquitin binding by IFT complexes.

      Conceptually, the study highlights that protein transport complexes can exhibit additional intrinsic structural features for potential auto-regulatory processes. Moreover, the study adds to the functional diversity of small U-box and ubiquitin-binding domains, which will be of interest to a broader cell biology and structural biology audience.

      Additional comments:

      The authors investigate the consequences of the U-box deletion on ciliary TGF-beta signaling. While a cilium-dependent effect of TGF-beta signaling on the phosphorylation of SMAD2 has been demonstrated, the precise function of cilia in AKT signaling has not been fully established in the field. Therefore, the relevance of this finding is somewhat unclear. It may help to discuss relevant literature on the topic, such as Shim et al., PNAS, 2020.

    5. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a U-box-like region, which are admittedly tightly bound to each other. While there is a possibility that this region functions and exists as one domain, below are the reasons why we chose to classify these regions as two different domains.

      (1) The TPR and U-box-like regions belong to two different structural classes.

      (2) The TPR region is connected to the U-box-like region via a long linker, which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations map to the interface between the TPR region and the U-box-like region, hinting at a regulatory mechanism governed by this interface.

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this comment. We are currently working on identifying the substrates of IFT172 to reveal the physiological relevance of its ubiquitination activity.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in a revised version of the manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in the conclusion on pg. 11.

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in the figure below, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, we should note that the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      These results align with the reviewer's suggestion that ubiquitination occurs within the TPR-containing region. However, given the technical limitations of the MS analysis and the potential for E3-independent ubiquitination by UBCH5A, we have taken a conservative approach in interpreting these findings.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We do acknowledge in the manuscript that the conjugation of ubiquitin to IFT172C is weak (Page 16). Future experiments identifying potential substrates and their implications for ciliary regulation will provide further context for our in vitro ubiquitination experiments.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We agree that these mutations could provide insights into ubiquitin binding by IFT172. We are currently pursuing further mutagenesis studies on the IFT172-Ub interface based on the AF model. We have, however, evaluated the ubiquitin-binding activity of the F1715A mutant using similar pulldowns, which showed no significant impact of the mutation on the ubiquitin-binding activity of IFT172. We have yet to evaluate the impact of alternative amino acid substitutions at these positions. The I1688 mutants we cloned could not be expressed in soluble form and thus could not be tested in ubiquitination activity or ubiquitin-binding assays.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      It is likely that IFT172 binds ubiquitin only with low affinity, as indicated by our in vitro pulldowns and the AF interface. In our pull-down experiment performed using the Chlamydomonas flagella extracts, we used extensive washes to remove non-specific interactors. This might also have excluded weak but bona fide interactors of IFT172 from identification. Additionally, we did not use any ubiquitination-preserving reagents such as NEM in our pulldown buffers, exposing the cellular ubiquitinated proteins to DUB-mediated proteolysis and further preventing their identification in our pulldown/MS experiment.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but do not demonstrate that the effect on the TGFβ pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot distinguish whether the TGFβ pathway defects arise from general protein instability or from specific loss of ubiquitin-related functions. Our experiments demonstrate that the U-box-like region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between these phenomena would require additional evidence. We will revise our discussion to more clearly acknowledge this limitation in our current understanding of the relationship between IFT172's U-box region and TGFβ pathway regulation.

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We will revise our text to more precisely state that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a potential direct IFT172 interactor" and will discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction. This more accurately reflects the current state of our understanding of this potential interaction.

      Reviewer #3:

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised manuscript, we will provide additional characterization of the cell lines, including detailed sequencing information and validation data for the IFT172 mutants. This will bring the documentation of our cell-based experiments up to the same standard as other aspects of our work.

    1. eLife Assessment

      This study presents an important finding that has identified 27 differentially methylated regions as a signature for non-invasive early cancer detection and predicting prognosis for colorectal cancer. The findings demonstrate promising clinical potential, particularly for improving cancer screening and patient monitoring. However, the evidence supporting the claims of the authors is incomplete due to a small sample size and some methodological concerns. The work will be of interest to researchers interested in cancer diagnosis or colorectal cancer monitoring.

    2. Reviewer #1 (Public review):

      Summary:

      Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths. Colonoscopy and fecal immunochemical testing are among the early diagnostic tools that have significantly enhanced patient survival rates in CRC. Methylation dysregulation has been identified in the earliest stages of CRC, offering a promising avenue for screening, prediction, and diagnosis. The manuscript entitled "Early Diagnosis and Prognostic Prediction of Colorectal Cancer through Plasma Methylation Regions" by Zhu et al. presents a panel of genes with methylation patterns derived from cfDNA (27 DMRs) as a noninvasive detection method for early diagnosis and prognosis of CRC.

      Strengths:

      The authors provided evidence that the 27-DMR pattern worked well in predicting CRC distant metastasis, and that the methylation score increased markedly in stage III-IV disease.

      Weaknesses:

      The major concerns are the design of the DMR screening, the relatively low sensitivity of this DMR pattern in detecting early-stage CRC, the limited size of the cohorts, and the lack of comparison with traditional diagnostic tests.

    3. Reviewer #2 (Public review):

      This work presents a 27-region DMR model for early diagnosis and prognostic prediction of colorectal cancer using plasma methylation markers. While this non-invasive diagnostic and prognostic tool could interest a broad readership, several critical issues require attention.

      Major Concerns:

      (1) Inconsistencies and clarity issues in data presentation

      a) Sample size discrepancies
      - The abstract mentions screening 119 CRC tissue samples, while Figure 1 shows 136 tissues. Please clarify if this represents 119 CRC and 17 normal samples.
      - The plasma sample numbers vary across sections: the abstract cites 161 samples, Figure 1 shows 116 samples, and the Supplementary Methods mentions 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC).

      b) Methodological inconsistencies
      - The Supplementary Material reports 477 hypermethylated sites from TCGA data analysis (Δβ>0.20, FDR<0.05), but Figure 1 indicates 499 sites.
      - The manuscript states that analyzing TCGA data across six cancer types identified 499 CRC-specific methylation sites, yet Figure 1 shows 477. Please also explain the rationale for selecting these specific cancer types from TCGA.
      - The main text mentions "404 CRC-specific DMRs" while Figure 1 shows "404 MCBs"; the authors need to clarify whether these terms are interchangeable and how MCBs are defined.

      (2) Methodological documentation

      - The Results section requires a more detailed description of marker identification procedures and justification of methodological choices.
      - Figure 3 panels need reordering for sequential citation.

      (3) Quality control and data transparency

      - No quality control metrics are presented for the in-house sequencing data (e.g., sequencing quality, alignment rate, BS conversion rate, coverage, PCA plots for each cohort).
      - The analysis code should be publicly available through GitHub or Zenodo.
      - At a minimum, processed data should be made publicly accessible to ensure reproducibility.

    4. Reviewer #3 (Public review):

      Summary:

      This article provides a model for early diagnosis and prognostic prediction of colorectal cancer and demonstrates its accuracy and usability. However, some minor issues still need to be addressed.

      Strengths:

      A large number of external datasets were used for verification, demonstrating robustness and accuracy. In addition, various influencing factors across multiple samples were taken into account, supporting usability.

      Weaknesses:

      There are notable language issues that hinder readability, and some key conclusions are missing.

    1. eLife Assessment

      This study presents a valuable and simplified classification system for predicting clinical outcomes in RPLS patients. The evidence supporting the claims of the authors is solid, although the elaboration of the marker selection process would have strengthened the study. The work will be of interest to scientists working in the field of retroperitoneal liposarcoma.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Xiao et al. classified retroperitoneal liposarcoma (RPLS) patients into two subgroups based on whole transcriptome sequencing of 88 patients. The G1 group was characterized by active metabolism, while the G2 group exhibited high scores in cell cycle regulation and DNA damage repair. The G2 group also displayed more aggressive molecular features and had worse clinical outcomes compared to G1. Using a machine learning model, the authors simplified the classification system, identifying LEP and PTTG1 as the key molecular markers distinguishing the two RPLS subgroups. Finally, they validated these markers in a larger cohort of 241 RPLS patients using immunohistochemistry. Overall, the manuscript is clear and well-organized, with its significance rooted in the large sample size and the development of a classification method.

      Weakness:

      (1) While the authors suggest that LEP and PTTG1 serve as molecular markers for the two RPLS groups, the process through which these genes were selected remains unclear. The authors should provide a detailed explanation of the selection process.

      (2) To ensure the broader applicability of LEP and PTTG1 as classification markers, the authors should validate their findings in one or two external datasets.

      (3) Since molecular subtyping is often used to guide personalized treatment strategies, it is recommended that the authors evaluate therapeutic responses in the two distinct groups. Additionally, they should validate these predictions using cell lines or primary cells.

    3. Reviewer #2 (Public review):

      Surgical resection remains the most effective treatment for retroperitoneal liposarcoma. However, postoperative recurrence is very common and is considered the main cause of disease-related death. Considering the importance and effectiveness of precision medicine, the identification of molecular characteristics is particularly important for the prognosis assessment and individualized treatment of RPLS. In this work, the authors described the gene expression map of RPLS and illustrated an innovative strategy of molecular classification. Through the pathway enrichment of differentially expressed genes, characteristic abnormal biological processes were identified, and RPLS patients were simply categorized based on the two major abnormal biological processes. Subsequently, the classification strategy was further simplified through nonnegative matrix factorization. The authors finally narrowed the classification indicators to two characteristic molecules, LEP and PTTG1, and constructed novel molecular prognosis models that achieved a high area under the curve. A relatively interpretable logistic regression model was selected to obtain the risk scoring formula, and its clinical relevance and prognostic evaluation efficiency were verified by immunohistochemistry. Recently, prognostic model construction has been a hot topic in the field of oncology. The interesting point of this study is that it effectively screened characteristic molecules and simplified the typing strategy in a practical way while maintaining high clinical relevance. Overall, the study is well-designed and will serve as a valuable resource for RPLS research.

    1. eLife Assessment

      This work presents a valuable extension of qFit-ligand, a computational method for modeling conformational heterogeneity of ligands in X-ray crystallography and cryo-EM density maps. The evidence presented for improved capabilities through careful validation against the previous version, notably in expanding ligand sampling within the conformational space, is solid yet still incomplete. The enhanced methodology demonstrates practical utility for challenging applications, including macrocyclic compound modeling and crystallographic drug fragment screening.

    2. Reviewer #1 (Public review):

      Summary:

      Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall, the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit and yielding more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

      The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.

      To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.

    1. eLife Assessment

      This important study uses recently developed EEG analysis methods to investigate spatial distractor suppression in a combined visual search/working memory task. The reported results are compelling, although they are open to multiple interpretations. The study will be of interest to cognitive neuroscientists and psychologists working on visual attention and memory.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tested whether learning to suppress (ignore) salient distractors (e.g., a lone colored nontarget item) via statistical regularities (e.g., the distractor is more likely to appear in one location than any other) was proactive (prior to paying attention to the distractor) or reactive (only after first attending the distractor) in nature. To test between proactive and reactive suppression the authors relied on a recently developed and novel technique designed to "ping" the brain's hidden priority map using EEG inverted encoding models. Essentially, a neutral stimulus is presented to stimulate the brain, resulting in activity on a priority map which can be decoded and used to argue when this stimulation occurred (prior to or after attending a distracting item). The authors found evidence that despite learning to suppress the high probability distractor location, the suppression was reactive, not proactive in nature.

      Overall, the manuscript was well-written, tests a timely question, and provides novel insight into a long-standing debate concerning distractor suppression.

      The authors provided a thorough rebuttal and addressed the previous critiques and concerns.

      Strengths (in no particular order):

      (1) The manuscript is well-written, clear, and concise (especially given the complexities of the method and analyses).

      (2) The presentation of the logic and results is clear and relatively easy to digest.

      (3) This question concerning whether location-based distractor suppression is proactive or reactive in nature is a timely question.

      (4) The use of the novel "pinging" technique is interesting and provides new insight into this particularly thorny debate over the mechanisms of distractor suppression.

      Weaknesses (in no particular order):

      After revision, the prior weaknesses have been largely addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the mechanisms supporting learning to suppress distractors at predictable locations, focusing on proactive suppression mechanisms manifesting before the onset of a distractor. They used EEG and inverted encoding models (IEM). The experimental paradigm alternates between a visual search task and a spatial memory task, followed by a placeholder screen acting as a 'ping' stimulus, i.e., a stimulus to reveal how learned distractor suppression affects hidden priority maps. Behaviorally, their results align with the effects of statistical learning on distractor suppression. Contrary to the proactive suppression hypothesis, which predicts reduced memory-specific tuning of neural representations at the expected distractor location, their IEM results indicate increased tuning at the high-probability distractor location following the placeholder and prior to the onset of the search display.

      Strengths:

      Overall, the manuscript is well-written and clear, and the research question is relevant and timely, given the ongoing debate on the roles of proactive and reactive components in distractor processing. The use of a secondary task and EEG/IEM to provide a direct assessment of hidden priority maps in anticipation of a distractor is, in principle, a clever approach. The study also provides behavioral results supporting prior literature on distractor suppression at high-probability locations.

      Weaknesses:

      In response to my comments during the first review, the authors have clarified and further discussed several methodological aspects, limitations, and alternative interpretations, tempering some of their claims and, overall, improving the manuscript. These involved mostly broadening the introduction and discussion of the putative mechanisms in distractor suppression, evaluating alternative explanations due to the dual-task design, clarifying methodological details regarding the inverted encoding model, and discussing the possibility that proactive suppression might actually require enhanced tuning toward the expected feature. While, to some degree, the results may still remain open to alternative explanations, the study, in its current form, presents an interesting paradigm and promising findings that will undoubtedly be useful for future research. I therefore have no major remaining comments.

    4. Reviewer #3 (Public review):

      Summary:

      In this experiment, the authors use a probe method along with time-frequency analyses to ascertain the attentional priority map prior to a visual search display in which one location is more likely to contain a salient distractor. The main finding is that neural responses to the probe indicate that the high probability location is attended, rather than suppressed, prior to the search display onset. The authors conclude that suppression of distractors at high probability locations is a result of reactive, rather than proactive, suppression.

      Strengths:

      This was a creative approach to a difficult and important question about attention. The use of this "pinging" method to assess the attentional priority map has a lot of potential value for a number of questions related to attention and visual search. Here as well, the authors have used it to address a question about distractor suppression that has been the subject of competing theories for many years in the field. The authors have also conducted additional behavioral analyses to examine the relationship between memory and search. The paper is well-written, and the authors have done a good job placing their data in the larger context of recent findings in the field.

      Weaknesses:

      The authors addressed a number of weaknesses in a thorough revision during the review process. The present study raises important questions for future research - this is not a weakness, since one study cannot answer all questions, but points to the importance of the questions raised by this study and the value of additional future research in the area.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors tested whether learning to suppress (ignore) salient distractors (e.g., a lone colored nontarget item) via statistical regularities (e.g., the distractor is more likely to appear in one location than any other) was proactive (prior to paying attention to the distractor) or reactive (only after first attending the distractor) in nature. To test between proactive and reactive suppression the authors relied on a recently developed and novel technique designed to "ping" the brain's hidden priority map using EEG inverted encoding models. Essentially, a neutral stimulus is presented to stimulate the brain, resulting in activity on a priority map which can be decoded and used to argue when this stimulation occurred (prior to or after attending to a distracting item). The authors found evidence that despite learning to suppress the high probability distractor location, the suppression was reactive, not proactive in nature.

      Overall, the manuscript is well-written, tests a timely question, and provides novel insight into a long-standing debate concerning distractor suppression.

      Strengths (in no particular order):

      (1) The manuscript is well-written, clear, and concise (especially given the complexities of the method and analyses).

      (2) The presentation of the logic and results is mostly clear and relatively easy to digest.

      (3) This question concerning whether location-based distractor suppression is proactive or reactive in nature is a timely question.

      (4) The use of the novel "pinging" technique is interesting and provides new insight into this particularly thorny debate over the mechanisms of distractor suppression.

      Weaknesses (in no particular order):

      (1) The authors tend to make overly bold claims without either A) mentioning the opposing claim(s) or B) citing the opposing theoretical positions. Further, the authors have neglected relevant findings regarding this specific debate between proactive and reactive suppression.

      (2) The authors should be more careful in setting up the debate by clearly defining the terms, especially proactive and reactive suppression which have recently been defined and were more ambiguously defined here.

      (3) There were some methodological choices that should be further justified, such as the choice of stimuli (e.g., sizes, colors, etc.).

      (4) The figures are often difficult to process. For example, the time courses are so far zoomed out (i.e., 0, 500, 100 ms with no other tick marks) that it makes it difficult to assess the timing of many of the patterns of data. Also, there is a lot of baseline period noise which complicates the interpretations of the data of interest.

      (5) Sometimes the authors fail to connect to the extant literature (e.g., by connecting to the ERP components, such as the N2pc and PD components, used to argue for or against proactive suppression) or when they do, overreach with claims (e.g., arguing suppression is reactive or feature-blind more generally).

      We thank the reviewer for their insightful feedback and have made several adjustments to address the concerns raised. To provide a balanced discussion, we tempered our claims about suppression mechanisms and incorporated additional references to opposing theoretical positions, including the signal suppression hypothesis, while clarifying the definitions of proactive and reactive suppression based on recent terminology (Liesefeld et al., 2024). We justified methodological choices, such as the slight size differences between stimuli to achieve perceptual equivalence and the randomization of target and distractor colors to mitigate potential luminance biases. We have revised our figures to enhance their clarity. Lastly, while our counterbalanced design precluded reliable ERP assessments (e.g., N2pc, PD), we discussed their potential relevance for future research and ensured consistency with the broader literature on suppression mechanisms.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the mechanisms supporting learning to suppress distractors at predictable locations, focusing on proactive suppression mechanisms manifesting before the onset of a distractor. They used EEG and inverted encoding models (IEM). The experimental paradigm alternates between a visual search task and a spatial memory task, followed by a placeholder screen acting as a 'ping' stimulus, i.e., a stimulus to reveal how learned distractor suppression affects hidden priority maps. Behaviorally, their results align with the effects of statistical learning on distractor suppression. Contrary to the proactive suppression hypothesis, which predicts reduced memory-specific tuning of neural representations at the expected distractor location, their IEM results indicate increased tuning at the high-probability distractor location following the placeholder and prior to the onset of the search display.

      Strengths:

      Overall, the manuscript is well-written and clear, and the research question is relevant and timely, given the ongoing debate on the roles of proactive and reactive components in distractor processing. The use of a secondary task and EEG/IEM to provide a direct assessment of hidden priority maps in anticipation of a distractor is, in principle, a clever approach. The study also provides behavioral results supporting prior literature on distractor suppression at high-probability locations.

      Weaknesses:

      (1) At a conceptual level, I understand the debate and opposing views, but I wonder whether it might be more comprehensive to present also the possibility that both proactive and reactive stages contribute to distractor suppression. For instance, anticipatory mechanisms (proactive) may involve expectations and signals that anticipate the expected distractor features, whereas reactive mechanisms contribute to the suppression and disengagement of attention.

      This is an excellent point. Indeed, while many studies, including our own, have tried to dissociate between proactive and reactive mechanisms, as if it is one or the other, the overall picture is arguably more nuanced. We have added a paragraph to the discussion on page 19 to address this. At the same time (for more details, see our responses to your comments 3 and 5), we have added a paragraph in which we provide an alternative explanation of the current data in light of the dual-task nature of our experiment.

      (2) The authors focus on hidden priority maps in pre-distractor time windows, arguing that the results challenge a simple proactive view of distractor suppression. However, they do not provide evidence that reactive mechanisms are at play or related to the pinging effects found in the present paradigm. Is there a relationship between the tuning strength of CTF at the high-probability distractor location and the actual ability to suppress the distractor (e.g., behavioral performance)? Is there a relationship between CTF tuning and post-distractor ERP measures of distractor processing? While these may not be the original research questions, they emerge naturally and I believe should be discussed or noted as limitations.

      Thank you for raising these important points. While CTF slopes have been shown to provide spatially and temporally resolved tracking of covert spatial attention and memory representations at the group level, to the best of our knowledge, no study to date has found a reliable correlation between CTFs and behavior. Moreover, the predictive value of the learned suppression effect, while also highly reliable at the group level, has proven to be limited when it comes to individual-level performance (Ivanov et al., 2024; Hedge et al., 2018). Nevertheless, based on your suggestion, we explored whether there was a correlation between the averaged gradient slope within the time window where the placeholder revived the memory representation and the average distance slope in reaction times for the learned suppression effect. This correlation was not significant (r = .236, p = 0.267), which, considering our sample size and the reasons mentioned earlier, is not particularly surprising. Given that our sample size was chosen to measure group-level effects, we decided not to include this individual-differences analysis in the manuscript.
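
      As an illustration only, a brain-behavior correlation of the kind described above could be sketched as follows. This is not the authors' analysis code: the data are synthetic, the sample size of 24 is an assumption, the names `ctf_slopes`, `rt_slopes`, `pearson_r`, and `permutation_p_value` are hypothetical, and a permutation test is used here in place of whatever significance test was actually applied.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any pairing between x and y (the null hypothesis).
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Synthetic per-participant values (assumed n = 24), for illustration only.
rng = random.Random(1)
ctf_slopes = [rng.gauss(0.0, 1.0) for _ in range(24)]          # neural tuning slope
rt_slopes = [0.2 * s + rng.gauss(0.0, 1.0) for s in ctf_slopes]  # behavioral slope

r = pearson_r(ctf_slopes, rt_slopes)
p = permutation_p_value(ctf_slopes, rt_slopes)
print(f"r = {r:.3f}, p = {p:.3f}")
```

      With a weak true relationship and a sample sized for group-level effects, a sketch like this will often return a non-significant p-value, which is consistent with the caution expressed above about individual-differences analyses.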

      Regarding the potential link between the CTF tuning profile and post-distractor ERP measures like N2pc and Pd, our experimental design presented a specific challenge. To reliably assess lateralized ERP components like N2pc or Pd, the high probability location must be restricted to static lateralized positions (e.g., on the horizontal midline). Our counterbalanced design (see also our response to comment 9 by reviewer 1), which was crucial to avoid bias in spatial encoding models, precluded such a targeted ERP analysis.

      (3) How do the authors ensure that the increased tuning (which appears more as a half-split or hemifield effect rather than gradual fine-grained tuning, as shown in Figure 5) is not a byproduct of the dual-task paradigm used, rather than a general characteristic of learned attentional suppression? For example, the additional memory task and the repeated experience with the high-probability distractor at the specific location might have led to longer-lasting and more finely-tuned traces for memory items at that location compared to others.

      Thank you for raising these important points. Indeed, a unique aspect of our study that sets it apart from other studies is that the effects of learned suppression were not measured directly via an index of distractor processing, but rather inferred indirectly via tuning towards a location in memory. The critical assumption here, which we now make explicit on page 18, is that various sources of attentional control jointly determine the priority landscape, and that this priority landscape can be read out by neutral ping displays. An alternative, however, as suggested by the reviewer, is that memory representations may have been sharper when the remembered location was at the high probability distractor location. We believe this is unlikely for two reasons. First, at the behavioral level there was no evidence that memory performance differed for positions overlapping high and low probability distractor locations (also see our response to reviewer 3 minor comment 4). Second, there was no hint whatsoever that the memory representation already differed during encoding or maintenance (this is now explicitly indicated in the revised manuscript on page 14), which would have been expected if the spatial distractor imbalance modulated the spatial memory representations.

      Nevertheless, as discussed in more detail in response to comment 5, there is an alternative explanation for the observed gradient modulation that may be specific to the dual nature of our experiment.

      (4) It is unclear how IEM was performed on total vs. evoked power, compared to typical approaches of running it on single trials or pseudo-trials.

      Thank you for pointing out that our methods were not clear. We did not run our analysis on single trials because we were interested in separately examining the spatial selectivity of both evoked alpha power (phase-locked activity aligned with stimulus onset) and total alpha power (all activity regardless of signal phase). It is only possible to calculate evoked and total power when averaging across trials. Thus, when we partitioned the data into sets for the IEM analysis, we averaged trials for each condition/stimulus location to obtain a measurement of evoked and total power for each condition in each set. This is the same approach used in previous work (e.g., Foster et al., 2016; van Moorselaar et al., 2018).
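      The distinction between the two measures can be illustrated with a minimal sketch. It assumes that each trial contributes one complex analytic-signal value per electrode and time point (for example, after band-pass filtering and a Hilbert transform, a detail this response does not spell out). The function names and the simulated values are hypothetical and are not taken from the analysis code.

```python
import cmath
import math
import random


def evoked_power(trials):
    """Average the complex values across trials first, then square the magnitude.

    Only activity with a consistent phase across trials survives the average.
    """
    mean_value = sum(trials) / len(trials)
    return abs(mean_value) ** 2


def total_power(trials):
    """Square the magnitude of every trial first, then average.

    Phase is discarded, so all activity contributes regardless of phase locking.
    """
    return sum(abs(z) ** 2 for z in trials) / len(trials)


rng = random.Random(1)
# Hypothetical data: unit-amplitude alpha activity on 200 trials.
phase_locked = [cmath.rect(1.0, 0.3) for _ in range(200)]
phase_jittered = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(200)]

print(evoked_power(phase_locked), total_power(phase_locked))
print(evoked_power(phase_jittered), total_power(phase_jittered))
```

      With phase-locked activity the two measures agree, whereas with random phases the evoked estimate collapses towards zero while total power is unchanged. Both are defined over a set of trials, which is why neither can be computed on a single trial.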

      We reviewed our Methods section and can see why this was unclear. In places, we had incorrectly described the dimensions of training and test data as electrodes x trials. To address this, we’ve rewritten the “Time frequency analysis” and “Inverted encoding model” sections and added a new “Training and test data” section. We hope that these sections are easier to follow.

      (5) Following on point 1. What is the rationale for relating decreased (but not increased) tuning of CTF to proactive suppression? Could it be that proactive suppression requires anticipatory tuning towards the expected feature to implement suppression? In other terms, better 'tuning' does not necessarily imply a higher signal amplitude and could be observable even under signal suppression. The authors should comment on this and clarify.

      We appreciate your highlighting of these highly relevant alternative explanations. In response, we have revised a paragraph in the General Discussion on page 18 to explicitly outline our rationale for associating decreased tuning with proactive suppression. However, in doing so, we now also consider the alternative perspective that proactive suppression might actually require enhanced tuning towards the expected feature to implement suppression effectively.

      It's important to note that both of these interpretations – decreased tuning as a sign of suppression and increased tuning as a preparatory mechanism for suppression – diverge significantly from the commonly held model (including our own initial assumptions) wherein weights at the to-be-suppressed location are simply downregulated.

      Minor:

      (1) In the Word file I reviewed, there are minor formatting issues, such as missing spaces, which should be double-checked.

      Thank you! We have now reviewed the text thoroughly and tried our best to avoid formatting issues.

      (2) Would the authors predict that proactive mechanisms are not involved in other forms of attention learning involving distractor suppression, such as habituation?

      Habituation is a form of non-associative learning where the response to a repetitive stimulus decreases over time. As such, we would not characterize these changes as “proactive”, as they only occur following (repeated) exposure to the stimulus.

      (3) A clear description in the Methods section of how individual CTFs for each location were derived would help in understanding the procedure.

      Thank you. We have now added several sentences on page 27 to clarify how individual CTFs in Figure 3 and distance CTFs in Figure 5 are calculated.

      “The derived channel responses (8 channels × 8 location bins) were then used for the following analyses: (a) calculating individual Channel Tuning Functions (CTFs) based on each of the eight physical location bins (e.g., Figure 3C and 3D); (b) grouping responses according to the distance between each physical location and the high-probability distractor location to calculate distance CTFs (e.g., Figure 5); and (c) averaging across location bins to represent the general strength of spatial selectivity in tracking the memory cue, irrespective of its specific location (e.g., Figure 3A and 3B).”
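      Step (a) of the quoted passage can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes eight channels arranged on a circle, that each response profile is re-centered on the channel tuned to the stimulus location, and that the slope is obtained by a linear fit after collapsing channels at equal distance from the center. The function names are hypothetical.

```python
def center_on_location(channel_responses, location_bin):
    """Circularly shift a response profile so the channel tuned to
    `location_bin` ends up at the middle index (len // 2)."""
    n = len(channel_responses)
    shift = n // 2 - location_bin
    return [channel_responses[(i - shift) % n] for i in range(n)]


def ctf_slope(centered):
    """Collapse channels at equal circular distance from the center and fit a
    least-squares line of response against closeness (minus distance).
    A positive slope means responses increase towards the tuned channel."""
    n = len(centered)
    center = n // 2
    by_distance = {}
    for i, response in enumerate(centered):
        offset = abs(i - center)
        distance = min(offset, n - offset)
        by_distance.setdefault(distance, []).append(response)
    distances = sorted(by_distance)
    ys = [sum(by_distance[d]) / len(by_distance[d]) for d in distances]
    xs = [-d for d in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical profile peaking at the channel tuned to location bin 2.
responses = [2, 3, 4, 3, 2, 1, 0, 1]
centered = center_on_location(responses, location_bin=2)
print(centered, ctf_slope(centered))
```

      Steps (b) and (c) would then group or average such centered profiles, by distance from the high-probability distractor location or across all location bins, before the slope is computed.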

      (4) Why specifically 1024 resampling iterations?

      Thank you for your question. The statistical analysis was conducted using the permutation_cluster_1samp_test function within the MNE package in Python. We have clarified this on page 25. The choice of 1024 permutations reflects the default setting of the function, which is generally considered sufficient for robust non-parametric statistical testing. This number provides a balance between computational efficiency and the precision of p-value estimation in the context of our analyses.
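      To make the role of the permutation count concrete, the sketch below implements a simplified one-sample sign-flip test on a single mean. It is not the cluster-level procedure in MNE, and the function name, seed, and example values are hypothetical. It only illustrates that the number of permutations bounds how small an estimated p-value can be.

```python
import random


def sign_flip_p_value(values, n_permutations=1024, seed=0):
    """Two-sided one-sample permutation test on the mean.

    Under the null hypothesis the sign of each observation is exchangeable,
    so the null distribution is built by randomly flipping signs.
    """
    rng = random.Random(seed)
    n = len(values)
    observed = abs(sum(values) / n)
    exceed = 0
    for _ in range(n_permutations):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(sum(flipped) / n) >= observed:
            exceed += 1
    # Add-one correction: the smallest attainable value is 1 / (n_permutations + 1).
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical per-participant slopes that are consistently positive.
slopes = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.95, 1.05, 0.85, 1.15, 0.9]
print(sign_flip_p_value(slopes))
print(1 / (1024 + 1))  # resolution floor with 1024 permutations
```

      With 1024 permutations and this correction, the estimate cannot fall below roughly .001, which is ample for tests evaluated at conventional alpha levels.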

      Reviewer #3 (Public Review):

      Summary:

      In this experiment, the authors use a probe method along with time-frequency analyses to ascertain the attentional priority map prior to a visual search display in which one location is more likely to contain a salient distractor.  The main finding is that neural responses to the probe indicate that the high probability location is attended, rather than suppressed, prior to the search display onset.  The authors conclude that suppression of distractors at high-probability locations is a result of reactive, rather than proactive, suppression.

      Strengths:

      This was a creative approach to a difficult and important question about attention.  The use of this "pinging" method to assess the attentional priority map has a lot of potential value for a number of questions related to attention and visual search. Here as well, the authors have used it to address a question about distractor suppression that has been the subject of competing theories for many years in the field. The paper is well-written, and the authors have done a good job placing their data in the larger context of recent findings in the field.

      Weaknesses:

      The link between the memory task and the search task could be explored in greater detail. For example, how might attentional priority maps change because of the need to hold a location in working memory? This might limit the generalizability of these findings. There could be more analysis of behavioral data to address this question. In addition, the authors could explore the role that intertrial repetition plays in the attentional priority map as these factors necessarily differ between conditions in the current design. Finally, the explanation of the CTF analyses in the results could be written more clearly for readers who are less familiar with this specific approach (which has not been used in this field much previously).

      We appreciate the reviewer's valuable feedback and have made significant revisions to address the concerns raised. To clarify the connection between the memory and search tasks, we conducted additional analyses to explore the effects of spatial distance between the memory cue location and the high-probability distractor location on behavioral performance. We also investigated the potential influence of intertrial repetition effects on the observed results by removing trials with location repetitions. To enhance clarity, we revised the explanation of the CTF analyses in the Results section and improved figure annotations to ensure accessibility for readers unfamiliar with this approach. Collectively, these updates clarify how the pattern of CTF slopes reflects the interplay between memory and search tasks while addressing key methodological and interpretative considerations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions/Critiques (in no particular order)

      (1) The authors discuss the tripartite model (bottom-up, top-down, and selection history) but neglect recent and important discussions of why this trichotomy might be unnecessarily complicated (e.g., Anderson, 2024: Trichotomy revisited: A monolithic theory of attentional control). Simply put, one of the 3 pillars (i.e., selection history) likely does not fall into a unitary construct or "box"; instead, it likely contains many subcomponents (e.g., reward associations, stimulus-response habit learning, statistical learning, etc.). Since the focus of the current study is learned distractor suppression based on the statistical regularities of the distractor, the authors should comment on which aspects of selection history are relevant, perhaps by using this monolithic framework.

      We appreciate the reviewer's insightful suggestion regarding theoretical frameworks of attentional control. While Anderson (2024) proposes a monolithic theory that challenges the traditional tripartite model, our study deliberately maintains a pragmatic approach. The main purpose of our experiment is to investigate empirically the mechanisms of learned distractor suppression, rather than to adjudicate between competing theoretical models.

      We agree that selection history is not a unitary construct but comprises multiple subcomponents, including reward associations, stimulus-response habit learning, and statistical learning. In this context, our study specifically focuses on statistical learning as a key mechanism of distractor suppression. By explicitly acknowledging the multifaceted nature of selection history and referencing Anderson's monolithic perspective, we invite readers to consider the theoretical implications while maintaining our research's primary focus on empirical investigation. To this end, we have modified the manuscript to read (see page 3):

      "The present study investigates the mechanisms underlying statistical learning, specifically learned distractor suppression, which represents one critical subcomponent of selection history. While theoretical models like the tripartite framework and the recent monolithic theory (Anderson, 2024) offer complementary perspectives on attentional control, our investigation focuses on empirically characterizing the statistical learning mechanisms underlying learned distractor suppression."

      (2) The authors discuss previous demonstrations of location-based and feature-based learned distractor suppression. The authors admit that there have been a large number of studies but seem to mainly cite those that were conducted by the authors themselves (with the exception being Vatterott & Vecera, 2012). For example, there are other studies investigating location-based suppression (Feldmann-Wüstefeld et al., 2021; Sauter et al., 2021), feature-based suppression (Gaspelin & Luck, 2018a; Stilwell et al., 2022; Stilwell & Gaspelin, 2021; Vatterott et al., 2018), or both (Stilwell et al., 2019). The authors do not cite Gaspelin and colleagues at all in the manuscript, despite claiming that singleton-based suppression is not proactive.

      We appreciate your pointing out the need for a more comprehensive citation of the literature on learned distractor suppression, particularly with respect to location-based and feature-based suppression. In response to your comment, we have now expanded the reference list on page 4 to include relevant studies that further support our discussion of both location-based and feature-based suppression mechanisms.

      (3) The authors use the terms "proactive" and "reactive" suppression without taking into consideration the recent terminology paper, which one of the current authors, Theeuwes, helped to write (Liesefeld et al., 2024, see Figure 8). The terms proactive and reactive suppression need to be defined relative to a time point. The authors need to be careful in defining proactive suppression as prior to the first shift of attention, but after the stimuli appear and reactive suppression as after the first shift of attention and after the stimuli appear. Thus, the critical time point is the first shift of attention. Does suppression occur before or after the first shift of attention? The authors could alleviate this by using the term "stimulus-triggered suppression" to refer to "suppression that occurs after the distractor appears and before it captures attention" (Liesefeld et al., 2024).

      Thank you for pointing out that this was insufficiently clear in the previous version. In the revised version we specifically refer to the recent terminology paper on page 5 to make clear that suppression could theoretically occur at three distinct moments in time, and that the present paper was designed to dissociate between suppression before or after the first shift of attention.

      (4) Could the authors justify why the circle stimulus (2° in diameter) was smaller than the diamonds (2.3° x 2.3°)? Are the stimuli equated for the area? Or, for width and height? Doesn't this create a size singleton target on half of all trials (whenever the target is a circle) in addition to the lone circle being a shape singleton? Along these lines, could the authors justify why the colors were used and not equiluminant? This version of red is much brighter than this version of green if assessed by a spectrophotometer. Thus, there are sensory imbalances between the colors. Further, the grey used as the ping is likely not equiluminant to both colors. Thus, the grey "ping" is likely dimmer for red items but brighter for green items. Is this a fair "ping"?

      Thank you for raising these important points. We chose, as is customary in this experimental paradigm (e.g., Huang et al., 2023; Duncan et al., 2023), to make the diamond slightly larger (2.3° x 2.3°) than the circle (2° in diameter) to ensure a better visual match in overall size appearance. If the circle and diamond stimuli were equated strictly in terms of size (both at 2°), the diamond would appear visually smaller due to the differences in geometric shape. By adjusting the dimensions slightly, we aimed to minimize any unintentional differences in perceptual salience.

      As for the colors used in the experiment, the reviewer is right that there might be sensory imbalances between the red and green stimuli, with red appearing brighter than green based on measurements such as spectrophotometry. To ensure that any effects couldn’t be explained by sensory imbalance in the displays, we randomized target and distractor colors across trials, meaning that roughly half the trials had a red distractor and half had a green distractor. This randomization should have mitigated any systematic biases caused by color differences.

      We appreciate your feedback and have clarified these points in the Methods section of the revised manuscript on page 22:

      "Please note that although the colors were not equiluminant, the target and distractor colors were randomized across trials such that roughly half the trials had a red distractor, and half had a green distractor. This randomization process should help mitigate any systematic biases this may cause."

      (5) For the eye movement artifact rejection, the authors use a relatively liberal rejection routine (i.e., allowing for eye movements up to 1.2° visual angle and a threshold of 15 μV). Given that every 3.2 μV deviation in HEOG corresponds to ~ ± 0.1° of visual angle (Lins, et al., 1993), the current oculomotor rejection allows for eye movements between 0.5° and 1.2° visual angle to remain which might allow for microsaccades (e.g., Poletti, 2023) to contaminate the EEG signal (e.g., Woodman & Luck, 2003).
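      The conversion behind the lower bound in this comment can be written out as a short worked example. It assumes only the linear scaling quoted above (3.2 μV of HEOG per 0.1° of visual angle); the function name is hypothetical.

```python
def heog_uv_to_degrees(microvolts, uv_per_tenth_degree=3.2):
    """Convert an HEOG deflection in microvolts to degrees of visual angle,
    assuming a linear relation of 3.2 microvolts per 0.1 degree."""
    return microvolts / uv_per_tenth_degree * 0.1


# A 15 microvolt rejection threshold corresponds to just under half a degree.
print(round(heog_uv_to_degrees(15), 2))
```

      Under this scaling, the 15 μV threshold corresponds to approximately 0.47°, which is the source of the "0.5°" figure in the comment.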

      The reviewer correctly points out that our eye-movement rejection procedure, which is the same as in our previous work (e.g., Duncan et al., 2023), still allows for small but systematic biases in eye position towards the remembered location and potentially towards or away from the high probability distractor location. While we cannot definitively exclude this possibility, we believe this is unlikely for the following reasons. First, although there is a link between microsaccades and covert attention, it has been demonstrated that subtle biases in eye position cannot explain the link between alpha activity and the content of spatial WM (Foster et al., 2016, 2017). Specifically, Foster et al. (2017) found no evidence for a gaze-position-related CTF, while an analysis of those same data yielded clear target-related CTFs. Similarly, within the present data set there was no evidence that the observed revival induced by the ping display could be attributed to systematic changes in gaze position, as a multivariate cross-session decoding analysis with x,y positions from the tracker did not yield reliable above-chance decoding of the location in memory.

      Author response image 1.

      (6) The authors claim that "If the statistically learned suppression was spatial-based and feature-blind, one would also expect impaired target processing at the high-probability location." (p. 7, lines 194-195). Why is it important that suppression is feature-blind here? Further, is this a fair test of whether suppression is feature-blind? What about inter-trial priming of the previous trial? If the previous trial's singleton color repeated RTs might be faster than if it switched. In other words, the more catastrophic the interference (the target shape, target color, distractor shape, distractor color) change between trials, the more RTs might slow (compared with consistencies between trials, such that the target and distractor shapes repeat and the target and distractor colors repeat). Lastly, given the variability across both the shape and color dimensions, the claim that this type of suppression is feature-blind might be an artifact of the design promoting location-based instead of feature-based suppression.

      Thank you for raising this point. In the past we have used the finding that learned suppression was not specific to distractors, but also generalized to targets, to argue in favor of proactive (or stimulus-triggered) suppression. However, we agree that given the current experimental parameters it may be an oversimplification to conclude that the effect was feature-blind based on the impaired target processing as observed here. As this argument is also not relevant to our main findings, we have removed this interpretation and simply report that the effect was observed for both distractors and targets. Nevertheless, we would like to point out that while inter-trial priming could influence reaction times, the features of both target and distractors (shape and color) were randomly assigned on each trial. This should mitigate consistent feature-repetition effects. Additionally, previous research has demonstrated that suppression effects persist even when immediate feature repetitions are controlled for or statistically accounted for (e.g., Wang & Theeuwes 2018 JEP:HPP; Huang et al., 2021 PB&R).

      (7) The authors should temper claims such as "suppression occurs only following attentional enhancement, indicating a reactive suppression mechanism rather than proactive suppression." (p. 15, lines 353-353). Perhaps this claim may be true in the current context, but this claim is too generalized and not supported, at least yet. Further, "Within the realm of learned distractor suppression, an ongoing debate centers around the question of whether, and precisely when, visual distractors can be proactively suppressed. As noted, the idea that learned spatial distractor suppression is applied proactively is largely based on the finding that the behavioral benefit observed when distractors appear with a higher probability at a given location is accompanied by a probe detection cost (measured via dot offset detection) at the high probability distractor location (Huang et al., 2022, 2023; Huang, Vilotijević, et al., 2021)." (p. 15, lines 355-361). Again, the authors should either cite more of the opposing side of the debate (e.g., the signal suppression hypothesis, Gaspelin & Luck, 2019 or Luck et al., 2021, and the many lines of converging evidence of proactive suppression) or temper the claims.

      Thank you for your constructive feedback regarding our statements on suppression mechanisms. We acknowledge that our original claim was intended to reflect our specific findings within the context of this study and was not meant to generalize across all research in the field. To prevent any misunderstanding, we have tempered our claims to avoid overgeneralization by clarifying that our findings suggest a tendency toward reactive suppression within the specific experimental conditions we investigated (see page 17).

      Furthermore, learned distractor suppression is multifaceted, encompassing both feature-based suppression (as proposed by the signal suppression hypothesis) and spatial-based suppression (as examined in the current study). Work on the signal suppression hypothesis provides evidence for proactive suppression of specific feature values (Gaspelin et al., 2019; Gaspelin & Luck, 2018b; Stilwell et al., 2019). We have incorporated references to these studies to offer a more comprehensive perspective on the ongoing debate at a broader level (see page 17).

      (8) "These studies however, mainly failed to find evidence in support of active preparatory inhibition (van Moorselaar et al., 2020, 2021; van Moorselaar & Slagter, 2019), with only one study observing increased preparatory alpha contralateral to the high probability distractor location (Wang et al., 2019)." (p. 15, lines 367-370). This is an odd phrasing to say "many studies" have shown one pattern (citing 3 studies) and "only" one showing the opposite, especially given these were all from the current authors' labs.

      Agreed. We have rewritten this text on page 17.

      “These studies however, failed to find evidence in support of active preparatory inhibition as indexed via increased alpha power contralateral to the high probability distractor location  (van Moorselaar et al., 2020, 2021; van Moorselaar & Slagter, 2019; but see Wang et al., 2019).”

      (9) Could the authors comment on why total power was significantly above baseline immediately (without clearer timing marks, ~10-50 ms) after the onset of the cue (Figure 3)? Is this an artifact of smearing? Further, it appears that there is significant activity (as strong as the evoked power of interest) in the baseline period of the evoked power when the memory item is presented on the vertical midline in the upper visual field (this is also true, albeit weaker, for the memory cue item presented on the horizontal midline to the right). This concern again appears in Figure 4 where the Alpha CTF slope was significantly below or above the baseline prior to the onset of the memory cue. Evoked Alpha was already significantly higher than baseline in the baseline period. In Figure 5, evoked power is already higher and different for the hpl than the lpls even at the memory cue (and before the memory cue onsets). There are often periods of differential overlap during the baseline period, or significant activity in the baseline period or at the onset of the critical, time-locked stimulus array. The authors should explain why this might be (e.g., smearing).

      Thank you for pointing this out. As suggested by the reviewer, this ‘unexpected’ pre-stimulus decoding is indeed the result of temporal smearing induced by our 5th order Butterworth filter. The immediate onset of reliable tuning (sometimes even before stimulus onset) is therefore a typical feature of studies that track tuning profiles across time in lower frequency bands such as alpha (van Moorselaar & Slagter, 2019; van Moorselaar et al., 2020; Foster et al., 2016).
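      The smearing effect can be reproduced with a toy example. The sketch below does not use the 5th order Butterworth filter from the analysis; it substitutes a simple first-order low-pass filter and assumes that filtering is applied forward and backward (zero-phase), a detail this response does not state. All names and parameter values are hypothetical.

```python
def one_pole_lowpass(signal, alpha=0.2):
    """Causal first-order low-pass filter: each output depends only on the
    current and past inputs."""
    output, state = [], 0.0
    for sample in signal:
        state = alpha * sample + (1.0 - alpha) * state
        output.append(state)
    return output


def zero_phase_lowpass(signal, alpha=0.2):
    """Apply the same filter forward and then backward in time. This removes
    the phase delay but lets later samples influence earlier ones."""
    forward = one_pole_lowpass(signal, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


onset = 50
# Hypothetical response: nothing before stimulus onset, a sustained signal after.
step = [0.0] * onset + [1.0] * 50

causal = one_pole_lowpass(step)
acausal = zero_phase_lowpass(step)

print(causal[onset - 1], acausal[onset - 1])
```

      The causal output is exactly zero on the last pre-onset sample, whereas the forward-backward output is already well above zero there. In this toy case, a response that truly starts at onset appears to begin earlier.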

      Indeed, visual inspection also suggests that evoked activity tracked items at the top of the screen, an effect that is unlikely to result from temporal smearing as it is temporally interrupted around display onset. However, it is important to note that CTFs by location are based on far fewer trials, making them inherently noisier. The by-location plots primarily serve to show that the observed pattern is generally consistent across locations. In any case, given that the high probability distractor location was counterbalanced across participants it did not systematically influence our results.

      (10) Given that EEG was measured, perhaps the authors could show data to connect with the extant literature. For example, by showing the ERP N2pc and PD components. A strong prediction here is that there should be an N2pc component followed by a PD component if there is the first selection of the singleton before it is suppressed.

      Thank you for your great suggestion regarding the analysis of ERP components such as N2pc and Pd. To reliably assess lateralized ERP components like N2pc or Pd, the high probability location must be restricted to static lateralized positions (e.g., on the horizontal midline, as in Wang et al., 2019). In contrast, our study was designed to utilize an inverted encoding model to investigate the mechanisms underlying spatial suppression. To avoid bias in training the spatial model toward specific spatial locations (see also the previous comment), we counterbalanced the high-probability location across participants, ensuring an equal distribution of high-probability locations within the sample. Given this counterbalanced design, it was not feasible to reliably assess these components within the scope of the current study. Yet, we agree with the reviewer that it would be of theoretical interest to examine Pd and N2pc evoked by the search display, particularly in this scenario where suppression has been triggered prior to search onset.

      (11) Figure 2 (behavioral results) is difficult to see (especially the light grey and white bars). A simple fix might be to outline all the bars in black.

      Thank you! We have incorporated your suggestion by outlining all the bars on page 10.

      Reviewer #3 (Recommendations For The Authors):

      (1) I'm wondering about the link between the memory task and the search task.  I think the interpretation of the data should include more discussion of the fact that much of the search literature doesn't involve simultaneously holding an unrelated location in memory.  How might that change the results?

      For example - what happens behaviorally on the subset of trials in which the location to be held in memory is near the high probability distractor location?  All the behavioral data is more or less compartmentalized, but I think some behavioral analysis of this and related questions might be quite useful.  I know there are comparisons of behavior in single vs. dual-task cases (for the memory task at least), but I think the analyses could go deeper.

      Thank you for your great suggestion. To investigate the potential interactions between the spatial memory task and the visual search task, we conducted additional analyses on the behavioral data. First, we examined whether memory recall was influenced by the spatial distance (dist0 to dist4) between the memory cue location and the high-probability distractor location. As shown in the figure below, memory recall is not systematically biased either toward or away from the high-probability distractor location (p = .562, ηp² = .011).

      We also assessed how the memory task might affect search performance. Specifically, we plotted reaction times as a function of the spatial overlap between the memory cue location and any of the search items, separating trials by distractor-present (match-target, match-distractor, match-neutral) and distractor-absent (match-target, match-neutral) conditions. Although visually the result pattern seems to suggest that search performance was facilitated when the memory cue spatially overlapped with the target and interfered with when it overlapped with the distractor, this pattern did not reach statistical significance (distractor-present: p = .249, ηp² = .002; distractor-absent: p = .335, ηp² = .002). We have now included these analyses in our supplemental material.

      Beyond additional data analyses, there are also theoretical questions to be asked.  For example, one could argue that in order to maintain a location near or at the high probability distractor location in working memory, the priority map would have to shift substantially. This doesn't necessarily mean that proactive suppression always occurs in search when there is a high probability location. Instead, one could argue that when you need to maintain a high probability location in memory but also know that this location might contain a distractor, the representation necessarily looks quite different than if there were no memory tasks.  Maybe there are reasons against this kind of interpretation but more discussion could be devoted to it in the manuscript. I guess another way to think of this question is - how much is the ping showing us about attentional priority for search vs. attentional priority for memory, or is it simply a combination of those things, and if so, how might that change if we could ping the attentional priority map without a simultaneous memory task?

      Thank you for this valuable suggestion. The aim of our study was to explore how the CTFs elicited by the memory cue were influenced by the search task. We employed a simultaneous memory task because directly measuring CTFs in relation to the search task was not feasible, as the HPL typically does not vary within individual participants. Consequently, CTFs locked to placeholder onsets could reflect arbitrary differences between (subgroups of) participants rather than true differences in the HPL. To address this, we combined the search task with a VWM task, leveraging the fact that location-specific CTFs can reliably be elicited by a memory cue and that the location of this cue relative to the HPL can be systematically varied within participants (Foster et al., 2016, 2017; van Moorselaar et al., 2018). This approach allowed us to examine the CTFs elicited by the memory cue and how these were modulated by their distance from the HPL.

      While it is theoretically possible that the observed changes resulted from alterations in how the memory cue was maintained in memory only, this explanation seems unlikely, for memory performance (recall) did not vary as a function of the cue's distance from the HPL, suggesting that the distance-related changes in the CTFs are reflections of both tasks. Moreover, distractor learning typically occurs without awareness (Gao & Theeuwes 2022; Wang & Theeuwes 2018). It is difficult to understand how such unconscious processes could lead to anticipations in the memory task and subsequently modulate the representation of the consciously remembered memory cue only. We therefore believe that if we had pinged the attentional priority map without a simultaneous memory task, the results would have been similar to those obtained in the present experiment, indicating stronger tuning at the HPL. Yet, this work still needs to be done.

      To address this comment, we have added a paragraph on p. 18:

      “However, two alternative explanations warrant consideration. First, one could argue that observed modulations in the revived CTFs do not provide insight into the mechanisms underlying distractor suppression but instead reflect changes in the memory representation itself, potentially triggered by the anticipation of the HPL in the search task. According to this view, the changes in the revived CTFs would be unrelated to how search performance (in particular distractor suppression) was achieved. While this is theoretically possible, we believe it to be unlikely. Memory performance (recall) did not vary as a function of the cue's distance from the HPL, whereas the revived CTFs did, indicating that these changes likely reflect contributions from both tasks. Additionally, distractor learning typically occurs without conscious awareness (Gao & Theeuwes 2022; Wang & Theeuwes 2018). It is difficult to conceive how such unconscious processes could produce anticipatory effects in the memory task and selectively modulate the representation of the consciously remembered memory cue. Second, the apparent lack of suppression and the presence of a pronounced tuning at the high-probability distractor location could actually reflect a proactive mechanism that manifests in a way that seems reactive due to the dual-task nature of our experiment.”

      (2) When the distractor appears at a particular location with a high probability it necessarily means that intertrial effects differ between high and low probability distractor locations.  Consecutive trials with a distractor at the same location are far more frequent in the high probability condition.  You may not have enough power to look at this, and I know this group has analyzed this behaviorally in the past, but I do wonder how much that influences the EEG data reported here.  Are CTFs also sensitive to distractors/targets from the most recent trial?  And does that contribute to the overall patterns observed here?

      Thank you for your thoughtful comment. Indeed, statistical distractor learning studies naturally involve a higher proportion of intertrial effects for high-probability distractors compared to low-probability ones. Previous research, including the present study, has demonstrated that while distractor location improves performance, as shown by faster response times (t(23) = 6.32, p < .001, d = 0.33) and increased accuracy (t(23) = 4.21, p < .001, d = 0.86), intertrial effects alone cannot fully account for the learned suppression effects induced by spatial distractor imbalances. This analysis is now reflected in the revised manuscript on page 9.

      However, as noted by the reviewer, this leaves uncertain to what extent the neural indices of statistical learning, in this case the modulation of channel tuning functions, capture the effects of interest beyond the contributions of intertrial priming. To address this issue, one possible approach is to rerun the CTF analysis after excluding trials with location repetitions. Since the distractor location is unknown to participants at the time the CTF is revived by the placeholder, we removed trials where the memory cue location repeated the distractor location from the preceding trial, rather than trials with distractor location repetitions between consecutive trials. Our analyses indicate that after trial removal (~ 9% of overall trials), the spatial gradient pattern in the CTF slopes remains similar. However, the cluster-based permutation analysis fails to reveal any significant findings, and a one-sample t-test on the slopes averaged within the 100 ms time window of interest yields a p-value of 0.106. While this could suggest that the current pattern is influenced by distractor-cue repetition, it is more likely that the trial removal resulted in an underpowered analysis. To investigate this, we randomly removed an equivalent number of trials (9%), which similarly resulted in non-significant findings, although the overall result pattern remained comparable (p = 0.066 for the one-sample t-test on the slopes averaged within the 100 ms time window of interest).

      Author response image 2.

      Also, in our previous pinging study we observed that, despite the trial imbalance, decoding was approximately equal between high probability trailing (i.e., location intertrial priming) and non-trailing trials, suggesting that the ping is able to retrieve the priority landscape that builds up across longer timescales.

      (3) Maybe there is too much noise in the data for this, but one could look at individual differences in the magnitude of the high probability distractor suppression and the magnitude of the alpha CTF slope.  If there were a correlation here it would bolster the argument about the relationship between priority to the distractor location and subsequent behavior reduction of interference from that distractor.  

      Thank you for this valuable suggestion. We investigated whether there was a correlation between the average gradient slope during the time window in which the placeholder revived the memory representation and the average distance slope in reaction times for the learned suppression effect. This correlation was not significant (r = .236, p = 0.267), which is perhaps expected given the potential noise levels, as noted by the reviewer. Furthermore, while the learned suppression effect is robust at the group level, its predictive value for individual-level performance has been shown to be limited (Ivanov et al., 2024; Hedge et al., 2018). Consequently, we chose not to include this analysis in the manuscript (see also our response to comment 2 by reviewer 2).
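      As a rough check on the correlation reported above, the sketch below converts a Pearson r into a t statistic and a two-sided p-value. It assumes a sample of n = 24 participants, which is inferred from the t(23) tests quoted elsewhere in this response and is not stated alongside the correlation itself. The helper names are hypothetical, and the p-value is obtained by simple numerical integration rather than by any particular statistics package.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value from Simpson integration of the density over [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central

def pearson_r_test(r, n):
    """t statistic and two-sided p-value for a Pearson correlation r with n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, two_sided_p(t, df)

# n = 24 is an assumption (inferred from the t(23) tests), not a reported value.
t_stat, p_value = pearson_r_test(0.236, 24)
print(f"t(22) = {t_stat:.3f}, p = {p_value:.3f}")
```

      Under that assumption the result comes out close to the reported p-value, which illustrates how small a correlation of this size is relative to the sampling noise at this sample size.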

      (4) The results sections are a bit dense in places, especially starting at the bottom of page 11.  For readers who are familiar with the general questions being asked but less so with the particular time-frequency analyses and CTF approaches being used (like myself), I think a bit more time could be spent setting up these analyses within the results section to make extra clear what's going on.

      Thank you for your feedback regarding the clarity of our Results section. We have revised this section to make it more understandable and easier to follow, especially for readers who may be less familiar with the specific time-frequency analyses and modeling approaches used in our study. Specifically, we have provided additional interpretations alongside the reported results from page 10 to page 13 to aid comprehension and ensure that the methodology and findings are accessible to a broader audience. Additionally, we have revised the figure notes to further enhance clarity and understanding.

      Other comments:

      Abstract: "a neutral placeholder display was presented to probe how hidden priority map is reconfigured..."  I think the word "the" is missing before "priority map"

      Thank you. We have added the word “the” before “hidden priority map”.

      p. 4, Müller's group also has a number of papers that demonstrate how learned distractor regularities impact search (From the ~2008-2012 range, probably others as well), it might be worth citing a few here.

      Thank you for your suggestion. In the revised manuscript, we have added citations to several key papers from Müller’s group on page 4 as well as other research groups.

      p.5 - Chang et al. (2023) seems highly relevant to the current study (and consistent with its results) - depending on word limits, it might make sense to expand the description of this in the introduction to make clear how the present study builds upon it

      Thank you! We have expanded the discussion of Chang et al. (2023) on page 5 to provide more detailed elaboration of their study and its relevance to our work.

      p. 7 - maybe not for the current study, but I do wonder whether the distortion of spatial memory by the presence of the search task occurs only when there is a relevant regularity in the search task. In other words, if the additional singleton task had completely unpredictable target and distractor locations, would there be memory distortions?  Possibly for the current dataset, the authors could explore whether the behavioral distortion is systematically towards or away from the high probability distractor location.

      Thank you for your insightful suggestion. Following your recommendation, we conducted an additional analysis to examine memory recall as a function of the distance between the memory cue location and the high-probability distractor location. Figure S1A illustrates the results, depicting memory recall deviation across various distances (dist0 to dist4) from the high-probability distractor location.

      Our statistical analysis indicates that memory recall is not systematically biased either towards or away from the high-probability distractor location (p = .562, η<sub>p</sub><sup>2</sup> = .011). This finding suggests that spatial memory recall remains relatively stable and is not heavily influenced by the presence of regularities in the distractor locations.

      p. 7 - in addition to stats it would be helpful to report descriptive statistics for the high probability vs. other distractor location comparisons

      Thank you! We have added descriptive statistics on page 8 and page 9.

      p. 19, "64%" repeated unnecessarily - also, shouldn't it be 65% if it's 5% at each of the other seven locations?

      Thank you. This is now corrected in the revised manuscript.

      p. 20 "This process continued until participants demonstrated a thorough understanding of the assigned tasks" Were there objective criteria to measure this?

      Thank you for pointing out this issue. To clarify, objective criteria were indeed used to assess participants’ readiness to proceed. Specifically:

      For the training phase practice trials, participants were required to achieve an average memory recall deviation of less than 13°.

      For the test phase practice trials, participants needed to demonstrate a minimum of 65% accuracy in the search task. In addition, participants were asked to verbally confirm their understanding of the task goals with the experimenter before proceeding.

      We have revised the manuscript to clearly indicate these criteria on p. 23.

      p. 21 "P-values were Greenhouse-Geiser corrected in case where the..." I think "case" should be "cases"

      Thank you. We have corrected this in the revised manuscript.

    1. eLife Assessment

      This study offers a valuable treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The convincing analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. The role of the many free parameters is discussed and studied in depth.

    2. Reviewer #1 (Public review):

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters-such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current-affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons.

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses:

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, make it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain are accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context:

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

    4. Reviewer #3 (Public review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work.

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.
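      The spiking rule summarized above (a neuron fires only if its spike lowers the loss) can be illustrated with a minimal sketch. This is not the authors' E-I model: it is a single-population, greedy version of the idea, with a leaky linear readout, at most one spike per time step, and a quadratic coding error plus a constant cost `mu` per spike. All names and parameter values are hypothetical. A spike of neuron i changes the readout by its decoding vector w_i, so it lowers the loss exactly when w_i · (x − x̂) > (‖w_i‖² + mu) / 2.

```python
def simulate(signal, weights, dt=0.001, tau=0.02, mu=0.0):
    """Greedy spike coding of a multi-dimensional signal.

    signal  : list of M-dimensional targets, one per time step
    weights : list of N decoding vectors, each M-dimensional
    Returns the readout at every step and the spike count of every neuron.
    """
    m = len(weights[0])
    x_hat = [0.0] * m
    # A spike helps only if the projected error exceeds this threshold.
    thresholds = [(sum(wk * wk for wk in w) + mu) / 2 for w in weights]
    counts = [0] * len(weights)
    estimates = []
    for x in signal:
        # Leaky decay of the readout between spikes.
        x_hat = [v * (1 - dt / tau) for v in x_hat]
        err = [a - b for a, b in zip(x, x_hat)]
        # Margin = projection of the coding error onto w_i, minus the threshold.
        margins = [sum(wk * ek for wk, ek in zip(w, err)) - th
                   for w, th in zip(weights, thresholds)]
        best = max(range(len(weights)), key=lambda i: margins[i])
        if margins[best] > 0:  # spike only if it reduces the loss
            x_hat = [v + wk for v, wk in zip(x_hat, weights[best])]
            counts[best] += 1
        estimates.append(x_hat)
    return estimates, counts

# Four neurons with decoding vectors along +/- each axis of a 2-D signal.
weights = [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]]
signal = [[1.0, -0.5]] * 1000
estimates, counts = simulate(signal, weights)
print(estimates[-1], counts)
```

      In this toy setting the readout tracks the constant target with an error on the order of the decoding weight, and raising `mu` trades coding accuracy for fewer spikes, which is the qualitative trade-off between encoding error and metabolic cost discussed in the reviews.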

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths:

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses:

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model.

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters-such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current-affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons. 

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity. Further improvements could come from explicitly acknowledging additional discrepancies with biological data, such as the widely reported weak stimulus tuning of inhibitory neurons in the primary sensory cortex of untrained animals.

      We thank the Reviewer for their insightful characterization of our paper and for further suggestions on how to improve it. We have now further improved the transparency about the model’s limitations, and we explicitly acknowledge the discrepancy with biological data about connection probability and about the selectivity of inhibitory neurons (pages 4 and 15).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths: 

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses: 

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, make it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain are accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context: 

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

      We thank the Reviewer for their thorough and accurate assessment of our paper.  

      Reviewer #3 (Public review): 

      Summary: 

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work. 

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs. 

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths: 

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses: 

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model. 

      Assessment and context: 

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

      We thank the Reviewer for a positive assessment and for pointing out the merits of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my previous concerns, and I agree that the manuscript has improved. However, I believe they could still do more to acknowledge two notable mismatches between the model and experimental data.

      (1) Stimulus selectivity of excitatory and inhibitory neurons 

      In the model, excitatory and inhibitory neurons exhibit similar stimulus selectivity, which appears inconsistent with most experimental findings. The authors argue that whether inhibitory neurons are less selective remains an open question, citing three studies in support. However, only one of these studies (Runyan) was conducted in primary sensory cortex and it is, to my knowledge, one of the few papers showing this (indeed, it's often cited as an exception). The other two studies (Kuan and Najafi) recorded from the parietal cortex of mice trained on decision making tasks, and therefore seem less relevant to the model.

      In contrast to the cited studies, the overwhelming majority of the work has found that inhibitory neurons in sensory cortex, in particular those expressing Parvalbumin, are less stimulus selective than excitatory cells. And this is indeed the prevailing view, as summarized by the review from Hu et al. (Science, 2014): "PV+ interneurons exhibit broader orientation tuning and weaker contrast specificity than pyramidal neurons." This view emerged from numerous classical studies, including Sohya et al. (J. Neurosci., 2007), Cardin (J. Neurosci., 2007), Nowak (Cereb. Cortex, 2008), Niell et al. ( J. Neurosci., 2008), Liu (J. Neurosci., 2009), Kerlin (Neuron, 2010), Ma et al. (J. Neurosci., 2010), Hofer et al. (Nature Neurosci. 2011), and Atallah et al. (Neuron 2012). Weak inhibitory tuning has been confirmed by recent studies, such as Sanghavi & Kar (biorxiv 2023), Znamenskiy et al. (Neuron 2024), and Hong et al. (Nature, 2024).

      The authors should acknowledge this consensus and cite the conflicting evidence. Failing to do so is cherry picking from the literature. Since training can increase the stimulus selectivity of PV+ neurons to that of Pyr levels, also in primary visual cortex (Khan et al. Neuron 2018), a favourable interpretation of the model is that it represents a highly optimized, if not overtrained, state.

      We have carefully considered the literature cited by the Reviewer. We agree with the interpretation that stimulus selectivity of inhibitory neurons in our model is higher than the stimulus selectivity of Parvalbumin-positive inhibitory neurons in the primary sensory cortex of naïve animals. We have edited the text in Discussion (page 14).

      (2) Connection probability 

      The manuscript claims that "rectification sets the overall connection probability to 0.5, consistent with experimental results (Pala & Petersen; Campagnola et al.)." However, the cited studies, and others, report significantly lower probabilities, except for Pyr-PV (E-I connections in the model). For example, Campagnola et al. measured PV-Pyr connectivity at 34% in L2/3 and 20% in L5.

      It's perfectly acceptable that the model cannot replicate every detail of biological circuits. But it's important to be cautious when claiming consistency with experimental data.

      Here as well, we agree with the Reviewer that the connection probability of 0.5 is consistent with reported connectivity of Pyr-PV neurons, but less so with reported connectivity of PV-Pyr neurons. We have now made our claim about the compatibility of the connection probability in our model with empirical observations more precise (page 4).
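
      To illustrate why rectification alone would give a connection probability near 0.5, the following is a minimal sketch and not the authors' code. It assumes that each synaptic weight is proportional to the dot product of two neurons' tuning vectors, that those vectors are drawn from a distribution symmetric about zero, and that negative weights are set to zero. The function name, network size, and number of features are hypothetical.

```python
import random


def connection_probability(n_neurons=200, n_features=3, seed=0):
    """Fraction of neuron pairs left connected after rectification.

    Assumption (not taken from the manuscript's code): the weight between
    two neurons is the dot product of their tuning vectors, and negative
    weights are rectified to zero, which removes the connection.
    """
    rng = random.Random(seed)
    # Hypothetical tuning vectors, symmetric about zero.
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
               for _ in range(n_neurons)]
    connected = 0
    total = 0
    for i in range(n_neurons):
        for j in range(i + 1, n_neurons):
            similarity = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            weight = max(similarity, 0.0)  # rectification
            if weight > 0.0:
                connected += 1
            total += 1
    return connected / total


p = connection_probability()
```

      Under these assumptions a dot product is as likely to be negative as positive, so roughly half of all pairs remain connected, regardless of cell type. This is why a single value of 0.5 cannot match cell-type-specific probabilities such as those the Reviewer cites.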

      Reviewer #2 (Recommendations for the authors): 

      I commend the authors for an extremely thorough and detailed rebuttal, and for all of the additional work put in to address the reviewer concerns. For the most part, I am satisfied with the current state of the manuscript. 

      We thank the Reviewer for recognizing our effort to address the first round of reviews to the best of our ability.

      Here are some small points still remaining that I think the authors should address: 

      (1) Pg. 8, "We verified the robustness of the model to small deviations from the optimal synaptic weights" - while the authors now cite Calaim et al. 2022 in the discussion, its relevance to several of the results justifies its inclusion in other places. Here is one place where the authors test something that was also studied in this previous paper.

      The Reviewer is correct that Calaim et al. (eLife 2022) addressed the robustness of synaptic weights, and we now cite this study when describing our results on jittering of synaptic connections (page 8).

      (2) Pg. 9, "In our optimal E-I network we indeed found that optimal coding efficiency is achieved in absence of within-neuron feedback or with weak adaptation in both cell types" Pg. 10, "the absence of within-neuron feedback or the presence of weak and short-lasting spike-triggered adaptation in both E and I neurons are optimally efficient solutions" The authors seem to state that both weak adaptation and no adaptation at all are optimal. In contrast to the rest of the results presented, this is very vague and does not give a particular level of adaptation as being optimal. The authors should make this more clear. 

      We agree that the text about the optimal level of adaptation was unclear. The optimal solution is no adaptation, while weak and short-lasting adaptation defines a slightly suboptimal, yet still efficient, network state, as now stated on page 10.

      (3) Pg. 13, "In summary our analysis suggests that optimal coding efficiency is achieved with four times more E neurons than I neurons and with mean I-I synaptic efficacy about 3 times stronger..." --- claims such as these are still too strong, in my opinion. It is rather the case that the particular ratio of E to I neurons and connections strengths can be made consistent with an optimally efficient regime.

      We agree here as well. We have revised the text (page 13) to better explain our results.

      (4) Pg. 14, "firing rates in the 1CT model were highly sensitive to variations in the metabolic constant" (Fig. 8I, as compared to Fig. 6C). This difference between the 1CT and E-I networks is striking, and I would suspect it is due to some idiosyncrasies in the difference between the two models (e.g., the relative amount of delay that it takes for lateral inhibition to take effect, or the fact that E-E connections have not been removed in this model). The authors should ideally back up this result with some justified explanation. 

      We agree with the Reviewer that the delay for lateral inhibition in the E-I model is twice that of the 1CT model and that the E-I model gains stability from the lack of E-E connectivity. Furthermore, the tuning is stronger in I compared to E neurons in the E-I model, which contributes to making the E-I network inhibition-dominated (Fig. 1H). In contrast, the average excitation and inhibition in the 1CT model are of exactly the same magnitude. The property of being inhibition-dominated makes the E-I model more stable. We report these observations in the revised text (pages 14-15).

      Reviewer #3 (Recommendations for the authors): 

      Overall my points were very well responded to and I removed most of my weaknesses.

      I appreciate the authors implementing my suggested analysis change for Figure 8, and I find the result very clear. I would further suggest they add a bit of text for the reader as to why this is done. For a new reader without much knowledge of these networks at first it seems the inhibitory population is very good at representation in fig 8G: so why is it not further considered in fig 8H?

      We thank the Reviewer for providing further suggestions. We have now clarified in the text why only the excitatory population of the E-I model is considered in the comparison between the E-I model and the one-cell-type model (page 14).

      Thanks for sharing the code. From a quick browse through it looks very manageable to implement for follow up work, although some more guidance for how to navigate the quite complicated codebase and how to reproduce specific paper results would be helpful.

      We have also updated the code repository, where we have included more complete instructions on how to reproduce the results of each figure. We renamed the folders with the computer code so that they point to a specific figure in the paper. The repository has been completed with the output of the numerical simulations we ran, which allows immediate replotting of all figures. We have deposited the repository at Zenodo to have the final version of the code associated with the DOI https://doi.org/10.5281/zenodo.14628524. This is mentioned in the section Code availability (page 17).

    1. eLife Assessment

      This short manuscript uses mutation counts in phylogenies of millions of SARS-CoV-2 genomes to show that mutation rates systematically differ between regions that are paired or unpaired in the predicted RNA secondary structure of the viral genome. Such an effect of pairing state is not unexpected, but its systematic demonstration using millions of viral genomes is valuable and convincing.

    2. Reviewer #1 (Public review):

      Summary:

      This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in SARS-CoV-2 (Bloom and Neher, 2023).

      General comments:

      A preference for unpaired bases as target for APOBEC-induced mutations has been demonstrated previously in functional studies so the finding is not entirely surprising. This of course assumes that A3A or other APOBEC is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

      I'm not sure why the authors did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis did not contribute any insights into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.

      Comments on revisions:

      The revisions have addressed my main comments in my review.

    3. Reviewer #2 (Public review):

      Hensel investigated the implications of SARS-CoV-2 RNA secondary structure in synonymous and nonsynonymous mutation frequency. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) and a population-averaged model of RNA base-pairing from Lan et al (from DMS mutational profiling with sequencing, DMS-MaPseq).

      The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common C→T), but not all: G→A and A→G substitutions seem unaffected by base-pairing.

      The author then addressed nonsynonymous C→T substitutions at basepaired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

      Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

      The conclusions of this work are generally well supported by the data presented. Particularly, the author acknowledges most limitations of the analyses and addresses them. Even though no new sequencing results were generated, the author used available data generated from the analysis of roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the current available models.

      There are a number of limitations of this work that should be highlighted, specifically in regard to the secondary structure data used in this paper. The Lan et al. dataset was generated using a multiplicity of infection (MOI) of 0.05, 24 hours post-infection (h.p.i.). At such a low MOI and late timepoint, viral replication is not synchronous and sequencing artifacts might be generated by cell debris and viral RNA degradation, therefore impacting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimation of their inherent mutational fitness.

      Additionally, like all such RNA probing experiments within cells, it remains difficult to deconvolve DMS/SHAPE low reactivity with RNA accessibility (e.g. from protein binding).

      This work presents clear methods and an easy-to-access bioinformatic pipeline, which can be applied to other RNA viruses. Of note, it can be readily implemented in existing datasets. Finally, this study raises novel mechanistic questions on how mutational fitness is not correlated to secondary structure in the same way for every substitution.

      Overall, this work highlights the importance of studying mutational fitness beyond an immune evasion perspective. On the other hand, it also adds to the viral intrinsic constraints to immune evasion.

      Comments on revisions:

      Following revision by the author, our concerns have been addressed. The additional analysis strengthens the conclusions & the revisions to the text have improved the manuscript for a general audience.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 Public Review:

      Summary

      This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in  SARS-CoV-2 (Bloom and Neher, 2023).

      General comments

      A preference for unpaired bases as a target for APOBEC-induced mutations has been demonstrated previously in functional studies so the finding is not entirely surprising. This of course assumes that A3A or other APOBEC is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

      I'm not sure why the authors did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis did not contribute any insights into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.

      I have added additional discussions of mechanisms focusing on the question of whether basepairing bias is  primarily driven by secondary structure dependence of underlying mutation rates or by conservation of  secondary structure (Discussion lines 178–192) and I added a brief analysis of the 5′ and 3′ contexts of the  relationship between being basepaired in a secondary structure model and apparent mutational fitness  (Figures S1 and S2, Results lines 85–97). I found that the 5′ context of unpaired, but not paired basepairs  influences apparent mutational fitness (preference for 5′ U), and that the  is also . Additionally, there is a 3′  preference for G, indicating some CpG suppression. This contrasts to some degree with another analysis  based on counting lineage frequencies that may have lacked power to detect relatively small effects  (Simmonds  mBio  2024).

      Reviewer 1 Author recommendations:

      There are at least 5 publications describing the mapping/prediction of SARS-CoV-2 RNA secondary structure from 2022-2023 and their predictions are not entirely consistent. Why did the authors only refer to the Lan et al. paper?

      I have added comparisons when the Lan et al secondary structure model is replaced by one of two others  derived from SHAPE data (Results lines 110–122). Unsurprisingly, similar secondary structure models give  similar results and performance is modestly higher for the models from Lan et al. This is consistent  with  their observations that DMS reactivities performed better as classifiers of SL5 and ORF1 secondary  structure (the reason I compared to this secondary structure model and reactivity data set rather than  others), but I did not go into detail on this in the revision since there are many differences in methods  beyond class of reactivity probe. For example, somewhat stronger correlation for the Vero than the Huh7  dataset in Lan  et al  could arise from combining data  from two replicates, from cell type, or from differences  in data analysis methods. It’s also a small difference and cannot be confidently distinguished from noise.

      I conducted a preliminary comparison of the performance of DMS and SHAPE data for predicting mutations  where DMS data is available, but I opted against including this analysis in the manuscript for the same  reasons. Instead, I included in results and discussion comments on how, in general, reactivity data contains  information that is predictive of substitution rates that is not captured by binary secondary structure models.  I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact  of a substitution on fitness (Discussion lines 195–201).

      Specific substitutions are referred to as C->T and C29085T for example, but as the genome of SARS-CoV-2 is RNA, and T should be a U.

      I agree and I have changed all “T” to “U” in the paper and analysis scripts. The choice of “T” was motivated  by what seemed to appear most frequently in papers on SARS-CoV-2 mutational spectra, but “U” is nearly  universal in papers on secondary structure and mutation mechanisms, so I agree it makes more sense in  this paper.

      The C29085T substitution is somewhat non-canonical as it is a single base bulge in a longer duplex section of dsRNA, very unlike the favoured sites for mutation in the Nakata et al paper.

      I have added a discussion of Nakata et al (NAR 2023) (Introduction lines 29–32). I did not go into this depth in the revision, but the analysis of ~2M patient sequences in Nakata et al also noted a high rate of UUC→UUU substitution, so the UUUC context of C29095 (shared by 3 of the 10 positions highlighted in Nakata et al that had high mutation frequencies with exogenous APOBEC3A expression) could be interesting to investigate further.

      High C29095U substitution frequency is indeed somewhat at odds with the results in that work, which found UC→UU substitutions to be elevated in longer single-stranded regions than the context of C29095U in SARS-CoV-2 secondary structure models (a single unpaired base opposing three unpaired bases in an asymmetric internal loop).

      I'm not sure why DMS reactivity is considered a separate variable from pairing likelihood as one informs the other.

      The intent here, which was not clear, was to show that a binary basepairing model that uses DMS reactivities as constraints does not capture all of the information available. I have clarified this, as described above, when discussing the information in different reactivity datasets.

      The C29095U substitution is also relevant to the consideration of DMS reactivity in addition to the resulting secondary structure model. These are not considered as separate predictors and the reason for showing both is mentioned in the paper: “DMS reactivity was more strongly correlated with estimated mutational fitness than basepairing when analysis was limited to positions with detectable DMS reactivity.” I have clarified this in the revised manuscript, and it is also relevant to the discussion of a potential model integrating all available datasets.

      Reviewer 2 Public Review:

      Hensel investigated the implications of SARS-CoV-2 RNA secondary structure in synonymous and nonsynonymous mutation frequency. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) and a population-averaged model of RNA basepairing from Lan et al (from DMS mutational profiling with sequencing, DMS-MaPseq).

      The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common C→T), but not all: G→A and A→G substitutions seem unaffected by base-pairing.

      The author then addressed nonsynonymous C→T substitutions at base-paired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

      Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

      The conclusions of this work are generally well supported by the data presented. Particularly, the author acknowledges most limitations of the analyses, and addresses them. Even though no new sequencing results were generated, the author used available data generated from the analysis of roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the current available models.

      There are a number of limitations of this work that should be highlighted, specifically in regard to the secondary structure data used in this paper. The Lan et al. dataset was generated using a multiplicity of infection (MOI) of 0.05, 24 hours post-infection (h.p.i.). At such a low MOI and late timepoint, viral replication is not synchronous and sequencing artifacts might be generated by cell debris and viral RNA degradation, therefore impacting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimation of their inherent mutational fitness.

      I would go further than this to say that the proposed adjustment will usually result in an incorrect estimate. My intent is to propose an improved, but still likely incorrect, estimate by utilizing in vitro data to refine baseline mutation rates in order to obtain improved, but only coarsely adjusted, estimates of mutational fitness. I added a note in the discussion that in vitro reactivities (and, consequently, secondary structure models) may not reflect secondary structures in vivo (Discussion lines 204–205). I did not go into detail regarding the specific technical considerations mentioned here because they are outside the scope of my expertise.

      I am not sure that top-ranked non-synonymous C→U positions have particularly high DMS values after  coarse adjustment for basepairing (labeled amino acid mutations in Figure 2). Of the six common mutations  used as examples, three have minimum values in the dataset considered (which is processed  normalized/filtered data rather than raw data) and three do not have very high DMS reactivity.

      However, there is clearly information in base reactivity that is not captured by a binary basepairing model,  which is indicated by residual positive correlation between DMS reactivity and mutational fitness after  adjustment. I now include a figure demonstrating this for synonymous C→U substitutions as Figure S3, and  I have tried to clarify the language throughout the manuscript to make it clear that a more accurate  adjustment is possible.

      Additionally, like all such RNA probing experiments within cells, it remains difficult to deconvolve DMS/SHAPE low reactivity with RNA accessibility (e.g. from protein binding).

      I agree, and in revising this manuscript it was interesting to see that Nakata et al (discussed above) identified relatively large single-stranded regions with enhanced UC→UU substitution frequencies with exogenous APOBEC3A expression, while C29095U, for example, is a single unpaired base with high DMS reactivity and high empirical C→U substitution frequency (discussed briefly in the introduction of the revised manuscript). Future analyses could consider heterogeneity in secondary structure as well as secondary structures with low heterogeneity where strained conformations could have higher reactivity.

      This work presents clear methods and an easy-to-access bioinformatic pipeline, which can be applied to other RNA viruses. Of note, it can be readily implemented in existing datasets. Finally, this study raises novel mechanistic questions on how mutational fitness is not correlated to secondary structure in the same way for every substitution.

      Overall, this work highlights the importance of studying mutational fitness beyond an immune evasion perspective. On the other hand, it also adds to the viral intrinsic constraints to immune evasion.

      Reviewer 2 Author recommendations:

      Even though the experiment was not performed in this manuscript, it would be helpful for the readers if it was briefly explained how secondary structure is inferred from DMS reactivity, as this technique is not broadly used.

      It is not objective to refer to the Lan et al. model of RNA structure as "high quality" given the limitations of their experimental approach (low MOI, asynchronous infection, DMS-only, no long-range interactions) and the lack of external validation of the structure of the genome they propose.

      I removed “high-quality” from the abstract. Since a result of the paper is that secondary structure correlates with synonymous substitution rates, this is an observation that can be used to retrospectively compare the quality of secondary structure models in this respect. I updated the manuscript to include such a comparison, and did not find a large difference between secondary structure models (Results lines 110–122). I added a discussion of how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      I have also added a brief discussion of constraints on how much we can confidently infer from these  experiments given limitations of the experimental approach. I note that DMS and SHAPE data provide  information that can be combined to make a stronger model, and that predictions can be rapidly tested  given observations by Gout (Symonds?) et al that  in  vitro  substitution rates correlate with those observed  during the pandemic (Discussion lines 195–201).

      Mutational fitness from Bloom & Neher was derived throughout the pandemic, much of which came from a period with the most active surveillance (Delta / Omicron waves). Consequently, these viruses differ from the WA1 strain used by Lan et al. far more than the 3 nt differences between lineage A and B that the author refers to. The following sentence should therefore be revised to avoid misleading the reader:

      "Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A, which differs from the more common Lineage B at 3 positions and could have different secondary structure."

      Revised:

      “Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A,  which differs from the more common Lineage B at 3 positions and could have different secondary  structure. Furthermore, mutational fitness is estimated from the phylogenetic tree of published  sequences (the public UShER tree (Turakhia et al., 2021) additionally curated to filter likely artifacts  such as branches with numerous reversions) that are typically far more divergent and subsequently  will have somewhat different secondary structures. Since the dataset used for mutational fitness  aggregates data across viral clades, my analysis will not capture secondary structure variation  between clades or indels and masked sites that were not considered in that analysis (Bloom and  Neher, 2023).”

      To determine the extent to which the results depend on the single RNA structure model, it would be informative to "turn the crank again" on the analysis with one of the other RNA structure datasets for SARS-CoV-2 (though most other datasets suffer from similar problems of asynchronicity of infection).

      I have added comparisons when the Lan et al secondary structure model is replaced by one of two others derived from SHAPE data as described above. Also, I conducted preliminary comparisons of underlying DMS and SHAPE reactivity data as described above, but I opted not to include these in the revised manuscript given that the methods differ beyond the chemical probe used. I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      In Figure 1 it would be helpful to add the values of the unpaired/basepaired ratios in the plot for clarity.

      Furthermore, a similar analysis using the substitution frequency, which strengthens the conclusions, is mentioned in the text, however, it is not shown. It could be shown as part of Figure 1, or as a supplementary figure.

      This was a good suggestion since numbers around 1 are not perceived as being very significant. I added  the ratio of median unpaired:paired rates to Figure 1, updated the corresponding manuscript text and the  figure caption, and note that the numbers are somewhat changed from the first version of my manuscript  because of updating to use the most up-to-date mutational fitness estimates.

      It is not clear how the two constants were calculated to obtain the "adjusted mutational fitness". It could be shown as part of Figure 2, or as a supplementary figure.

      I added dashed lines and arrows to Figure 2 showing median paired/unpaired mutational fitnesses and the  adjustment made to normalize to the overall median. I also added Figure S3 showing this for synonymous  substitutions, where it is more clear given the lower fraction of mutations with substantial fitness impacts.
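
      The coarse adjustment described above can be sketched as follows. This is an illustrative reading rather than the analysis script from the manuscript: it assumes that the two constants are the offsets that move the median of the paired group and the median of the unpaired group onto the overall median. The function name and the toy values are hypothetical.

```python
from statistics import median


def adjust_for_pairing(fitness, paired):
    """Shift each pairing class so that its median equals the overall median.

    fitness: estimated mutational fitness per position.
    paired:  True if the position is basepaired in the structure model.
    Assumption: one additive constant per pairing state.
    """
    overall = median(fitness)
    median_paired = median(f for f, p in zip(fitness, paired) if p)
    median_unpaired = median(f for f, p in zip(fitness, paired) if not p)
    offsets = {True: overall - median_paired,
               False: overall - median_unpaired}
    return [f + offsets[p] for f, p in zip(fitness, paired)]


# Toy values for illustration only, not data from the manuscript.
fitness = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
paired = [True, True, True, False, False, False]
adjusted = adjust_for_pairing(fitness, paired)
```

      After the shift, both groups share the same median, so any remaining difference between two positions reflects something other than the average effect of pairing state. As the response notes elsewhere, reactivity data carry information that such a binary correction does not capture.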

      Minor comments

      Statements like "the current fast-growing lineage JN.1.7" never age well... please revise to state the period of time to which this refers.

      Revised:

      “…lineage JN.1.7, which had over 20% global prevalence in Spring 2024…”

      Also, I checked the list of mutations and the examples given remain in the top 15 ranked basepaired,  non-synonymous C→U mutations (BA.2-defining C26060U is added to the list, but I did not update to  include this). It replaces C9246U, which was not mentioned in the first version of the manuscript.

      Similarly, please provide context for the reader in the phrase: "This was one mutation that characterized the B.1.177 lineage" (e.g. add its early reference as "EU1" and that it predominated in Europe in autumn 2020, prior to the emergence of the Alpha variant).

      Revised to add detail:

      This was one of the mutations that characterized the B.1.177 lineage. This lineage, also known as EU1, characterized a majority of sequences in Spain in summer 2020 and eventually in several other countries in Europe prior to the emergence of the Alpha variant. However, it was unclear whether this lineage had higher fitness than other lineages or if A222V specifically conferred a fitness advantage.

      "massive sequencing of SARS-CoV-2" - the meaning of the word "massive" is unclear. Revise.

      Revised  “…millions of patient SARS-CoV-2 sequences published during the pandemic…”

    1. eLife Assessment

      This phenomenological study reported that cold exposure induced mRNA expression of genes related to lipid metabolism in the paraventricular nucleus of the hypothalamus (PVH). While the paper does not address cell-type specificity or the functional role of lipids in PVH, the findings might still serve as a useful basis for others to explore their relevance to brain responses to cold. In the revised manuscript, the authors made adequate additions, such as new immunostaining and immunoblotting of ATGL and HSL in the PVH, and pharmacological inhibition of lipid peroxidation and lipolysis. The authors also increased the sample size of some experiments and revised the text to limit their data interpretation. Thus, the reviewers considered that these studies are solid in conclusively describing how the PVH is reprogrammed at the level of gene expression by cold exposure.

    2. Reviewer #1 (Public review):

      Summary:

      This study focuses on metabolic changes in the paraventricular hypothalamic (PVH) region of the brain during acute periods of cold exposure. The authors point out that in comparison to the extensive literature on the effects of cold exposure in peripheral tissues, we know relatively little about its effects on the brain. They specifically focus on the hypothalamus, and identify the PVH as having changes in Atgl and Hsl gene expression during cold exposure. They then go on to show accumulation of lipid droplets, increased Fos expression, and increased lipid peroxidation during cold exposure. Further, they show that neuronal activation is required for the formation of lipid droplets and lipid peroxidation.

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of a BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      Comments on revised version:

      The authors have attempted to address concerns brought to their attention in the initial review. They have performed one or two additional experiments to address concerns (e.g. adding fiber photometry of PVH neurons and trying to manipulate lipid peroxidation) though many of the concerns from the original review stand. The authors have also revised the text to limit the extent of their claims and to improve clarity, which is appreciated.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We were pleased that many of the critical comments of the reviewers have allowed us to improve our manuscript. In addition to revising the originally submitted figures, we performed new experiments (e.g. new Fig.2, Fig.3, Fig.4, and Fig.6) and revised the manuscript substantially following the reviewers’ comments and suggestions to our initial submission. A point-by-point response to the reviewers’ critiques is summarized below, and new supportive data are provided in this revised manuscript. Per the Reviewers’ comments and revisions, we revised the title to be “Cold induces brain region-selective cell activity-dependent lipid metabolism”.

      Reviewer #1:

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of a BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      We thank the Reviewer so much for the positive comments on this interesting study on metabolism in the brain.

      Weaknesses:

      One weakness was that many of the experiments were done in a manner that could not distinguish between the contributions of neurons and glial cells, limiting the extent of conclusions that could be made. While this is not easily doable for all experiments, it can be done for some. For example, the Fos experiments in Figure 3 would be more conclusive if done with the labeling of neuronal nuclei with NeuN, as glial cells can also express Fos. To similarly show more conclusively that neurons are being activated during cold exposure, the calcium imaging experiments in Figure S3 can be done with cold exposure.

      We agree with the Reviewer’s comments. We revised the original Figure 3 (new Figure 6) and Figure S3 (new Figure S4). Our data show that cold increased Fos-positive cells in the PVH (Figure 6) and increased neuronal Ca2+ signals (new Figure S4). Because it is difficult to exclude the involvement of astrocytes in the cold-induced lipid metabolism, we revised the title and the text, replacing “neuronal” activity with “cell” activity, and we concluded that cold-induced lipid metabolism depends on “cell activity” rather than “neuronal activity”. Studying cell type-specific contributions to the cold-induced effects on lipid metabolism, to which we assume both neurons and glial cells contribute, will require considerable effort beyond the scope of this study.

      Additionally, many experiments are only done with the minimal three animals required for statistics and could be more robust with additional animals included.

      We thank this reviewer for the comments. We increased the sample sizes accordingly in this revised manuscript.

      Another weakness is that the authors do not address whether manipulating lipid droplet accumulation or lipid peroxidation has any effect on PVH function (e.g. does it change neuronal activity in the region?).

      We thank this reviewer for bringing up this interesting point. The focus of this study was to examine how cold modulates lipid metabolism in the brain. How brain lipid metabolism (e.g. manipulating LD accumulation or lipid peroxidation) modulates neuronal activity is another interesting project, which will require considerable effort beyond the scope of this study. Manipulating LDs or peroxidation would affect multiple cellular signaling pathways, and physiological experimental conditions need to be developed. However, to address this reviewer’s question, we performed preliminary studies in which we treated brain slices with the lipid peroxidation inhibitor a-TP and recorded PVH neurons, but we did not observe differences in firing rates between a-TP-treated brain slices and controls (data not shown).

      Reviewer #2:

      Strengths:

      A set of relatively novel and interesting observations. Creative use of several in vivo sensors and techniques.

      We thank the Reviewer so much for the positive comments on our studies in both concept and techniques. 

      Weaknesses:  

      (1) The physiological relevance of lipolysis and thermogenesis genes in the PVH. The authors need to provide quantitative and substantial characterizations of lipid metabolism in the brain beyond a panel of qPCRs, especially considering these genes are likely expressed at very low levels. mRNA and protein level quantification of genes in Fig 1, in direct comparison to BAT/iWAT, should be provided. Besides bulk mRNA/protein, IHC/ISH-based characterization should be added to confirm the cellular expression of these genes.

      We agree with the Reviewer’s comments and thank this reviewer for the constructive suggestions. To address them, we performed additional experiments to verify cold-induced expression of lipolytic genes and proteins. For example, we stained ATGL and HSL in both neurons and astrocytes in the PVH. Matching the increased gene expression, cold increased protein expression of ATGL (new Figure 2) and HSL (new Figure 3) in both neurons and astrocytes. We also performed western blots of p-HSL and HSL and observed that cold increased the expression level of p-HSL (new Figure 4). These new results support our conclusions and further demonstrate that cold increases lipid metabolism in the PVH.

      (2) The fiber photometry work they cited (Chen 2022, Andersen 2023, Sun 2018) used well-established, genetically encoded neuropeptide sensors (e.g., GRABs). The authors need to first quantitatively demonstrate that adapting BD-C11 and EnzChek for in vivo brain FP could effectively and accurately report peroxidation and lipolysis. For example, the sensitivity, dynamic range, and off-time should all be calibrated with mass spectrometry measurements before any conclusions can be made based on plots in Figures 4, 5, and 6. This is particularly important because the main hypothesis heavily relies on this unvalidated technique.

      We thank this reviewer for the comments. Fiber photometry has been well demonstrated to detect fluorescent-labelled biomolecules in our laboratory and other labs, as indicated in the publications cited above. In this study, we combined photometry with commercially developed and well-validated fluorescent-labelled biomarkers of lipid metabolism to monitor lipid metabolic dynamics in vivo. We verified this approach in both brain (this study) and peripheral adipose tissues (another project). In particular, our data in this study show that the lipid peroxidation inhibitor a-TP blocked the cold-induced lipid peroxidation signals (Fig. 7A-C) and the pan-lipase inhibitor DEUP blocked the cold-induced lipolytic signals (Fig. 8A-C). These results demonstrate that the signals detected by photometry indeed reflect lipid peroxidation and lipolysis, respectively, in the brain. We agree with the reviewer’s suggestion on mass spectrometry measurements, but it is not feasible for us to perform mass spectrometry in the brain in vivo at this moment.

      (3) Generally, the histology data need significant improvement. It was not convincing, for example, in Figure 3, how the Fos+ neurons can be quantified based on the poor IF images where most red signals were not in the neurons. 

      We thank this reviewer for this comment. We performed additional experiments to increase the sample size and presented higher-quality images.

      (4) The hypothesis regarding the direct role of brain temperature in cold-induced lipid metabolism is puzzling. From the introduction and discussion, the authors seem to suggest that there are direct brain temperature changes in response to cold, which could be quite striking. However, this was not supported by any data or experiments. The authors should consolidate their ideas and update a coherent hypothesis based on the actual data presented in the manuscript.

      We thank this reviewer for bringing up this comment and constructive suggestions. To make this study more concise on the cold-induced lipid metabolism, we removed the statements related to the brain temperature.

      Reviewer #1 (Recommendations For The Authors):

      An additional minor weakness is that the authors are redundant in their discussion, sometimes repeating sections from the introduction (e.g. this line in the discussion "Evidence shows that the brain's energy expenditure efficiency largely depends on the temperature (Yu et al., 2012), and temperature gradients between different brain regions exist (Anderson and Moser, 1995; Delgado and Hanai, 1966; Hayward and Baker, 1968; McElligott and Melzak, 1967; Moser and Mathiesen, 1996; Thornton, 2003)"). 

      We thank the Reviewer for these comments. We revised the text following the suggestions accordingly and removed the statements and references related to brain temperatures.

    1. eLife Assessment

      This important study describes a first-in-human trial of autologous p63+ stem cells in patients with idiopathic pulmonary fibrosis, a lethal condition for which effective treatments are lacking. The authors provide convincing evidence that p63+ progenitor cell therapy can be safely delivered in patients with ILD, warranting movement to a Phase 2 trial. However, given that this is a Phase 1 study with a small sample size, conclusions regarding efficacy should not yet be made.

    2. Reviewer #1 (Public review):

      Summary:

      IPF is a disease lacking regressive therapies which has a poor prognosis, and so new therapies are needed. This ambitious phase 1 study builds on the authors' 2024 experience in Sci Tran Med with positive results with autologous transplantation of P63 progenitor cells in patients with COPD. The current study suggests that P63+ progenitor cell therapy is safe in patients with ILD. The authors attribute this to the acquisition of cells from a healthy upper lobe site, removed from the lung fibrosis. There are currently no cell-based therapies for ILD and in this regard the study is novel with important potential for clinical impact if validated in Phase 2 and 3 clinical trials.

      Strengths:

      This study addresses the need for an effective therapy for interstitial lung disease. It offers good evidence that the cells used for therapy are safe. In so doing it addresses a concern that some P63+ progenitor cells may be proinflammatory and harmful, as has been raised in the literature (articles which suggested some P63+ cells can promote honeycombing fibrosis; refs 26 & 35). The authors attribute the safety they observed (without proof) to the high HOPX expression of administered cells (a marker found in normal Type 1 AECs). The totality of the RNASeq suggests the cloned cells are not fibrogenic. They also offer exploratory data suggesting a relationship between clone roundness and PFT parameters (and a negative association between patient age and clone roundness).

      Weaknesses:

      The authors can conclude they can isolate, clone, expand, and administer P63+ progenitor cells safely; but with the small sample size and lack of a placebo group, no efficacy should be implied.

      Comments on revisions:

      The paper is meritorious, as I noted initially.

      However, the authors did not directly address several of my concerns, i.e., their responses to the initial review were polite but did not translate into much change in the manuscript.

      (1) Do these progenitor cells exert their beneficial effects by a paracrine mechanism vs transforming into lung AECs? Based on work in the field of bronchopulmonary dysplasia I suspect the benefits are mediated by a paracrine mechanism and arguably media from these cells should be tested as an alternative to administering the cells themselves. In any case, for the revision a Discussion of the possibility of differentiation vs paracrine mechanisms, citing relevant literature, would be expected. I suggest that you add such a paragraph to a limitation section.

      (2) Please address the potential implications of the fact that 5 patients had essentially normal DLCO/VA values. Saying that the "criterion for entry was DLCO" does not take away from the fact that DLCO/VA is a valid measure of lung diffusion capacity. In the absence of placebo, an enrollment of mildly diseased patients would favor positive results (including stability in study endpoint parameters even without treatment). Thus, I suggest again that the limitations section should be more forthright in this regard.

      (3) The authors acknowledge the lack of a placebo group but in a study of mild IPF, I worry that without a placebo group the only robust findings are those related to technique of transplantation and the safety of cell therapy. The paper still reads as if there is a clinical benefit...I would advise you further soften this (while understanding the desire to emphasize a hopeful observation). The price for not having a placebo group must be avoidance of claims of efficacy. The improvements in DLCO and CT in several cases speak for the need for the planned phase 2 trial, which if positive will be the time to claim efficacy signals.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      IPF is a disease lacking regressive therapies which has a poor prognosis, and so new therapies are needed. This ambitious phase 1 study builds on the authors' 2024 experience in Sci Tran Med with positive results with autologous transplantation of P63 progenitor cells in patients with COPD. The current study suggests that P63+ progenitor cell therapy is safe in patients with ILD. The authors attribute this to the acquisition of cells from a healthy upper lobe site, removed from the lung fibrosis. There are currently no cell-based therapies for ILD and in this regard the study is novel with important potential for clinical impact if validated in Phase 2 and 3 clinical trials. 

      Strengths: 

      This study addresses the need for an effective therapy for interstitial lung disease. It offers good evidence that the cells used for therapy are safe. In so doing it addresses a concern that some P63+ progenitor cells may be proinflammatory and harmful, as has been raised in the literature (articles which suggested some P63+ cells can promote honeycombing fibrosis; references 26 & 35). The authors attribute the safety they observed (without proof) to the high HOPX expression of administered cells (a marker found in normal Type 1 AECs). The totality of the RNASeq suggests the cloned cells are not fibrogenic. They also offer exploratory data suggesting a relationship between clone roundness and PFT parameters (and a negative association between patient age and clone roundness).

      We thank the reviewer for the important comments.

      Weaknesses: 

      The authors can conclude they can isolate, clone, expand, and administer P63+ progenitor cells safely; but with the small sample size and lack of a placebo group, no efficacy should be implied.

      We thank the reviewer for the suggestion and agree that we should be more cautious in discussing the efficacy observed in the current study.

      Specific points: 

      (1) The authors acknowledge most study weaknesses including the lack of a placebo group and the concurrent COVID-19 in half the subjects (the high-dose subjects). They indicate a phase 2 trial is underway to address these issues. 

      N/A

      (2) The authors suggest an efficacy signal on pages 18 (improvement in 2 subjects' CT scans) and 21 (improvement in DLCO) but with such a small phase 1 study and such small increases in DLCO (+5.4%) the authors should refrain from this temptation (understandable as it is). 

      We believe that exploring potential efficacy signals is also one aim of this study. All these efficacy endpoint analyses had been planned prior to the start of the clinical trial (as registered in ClinicalTrials.gov), and the data need to be analyzed in any case.

      (3) Likewise most CT scans were unchanged and those that improved were in the mid-dose group (albeit DLCO improved in the 2 patients whose CT scans improved). 

      Yes, it is.

      (4) The authors note an impressive 58 m increase in 6MWTD in the high-dose group but again there is no placebo group, and the low-dose group has no net change in 6MWTD at 24 weeks.

      Yes.

      (5) I also raise the question of the enrollment criteria in which 5 patients had essentially normal DLCO/VA values. In addition there is no discussion as to whether the transplanted stem cells are retained or exert benefit by a paracrine mechanism (which is the norm for cell-based therapies).

      Thank you for your detailed feedback. The enrollment criteria are based on DLCO rather than DLCO/VA. We will further discuss the possible benefit of a paracrine mechanism in the revised manuscript.

      Recommendations for the authors: 

      (1) Four of the enrolled subjects had normal DLCO/VA (% of predicted) (>90% of predicted). This raises questions about the severity of their illness see: Table 1: Subjects 103, 105, 112, and 204 have DLCO/VA % predicted >90% of predicted and would appear not to qualify for the study. While technically enrollment criteria for DLCO are satisfied, DLCO/VA is an equally valid measure of ILD severity, and these 4 cases seem very mild. 

      Thank you for your detailed feedback. Yes, the current inclusion criteria are based on DLCO but not DLCO/VA. We believe improvements in DLCO and DLCO/VA are both meaningful. In future trials, we will consider DLCO/VA as an inclusion criterion as well.

      (2) The authors state "Resolution of honeycomb lesion was also observed in patients of higher dose groups". This appears inaccurate as only 2 subjects in the study showed CT improvement and they were not in the highest dose group. This statement is an overreach for a Phase 1 study and should be removed from the abstract and more balance inserted in the text. The phase 2 study they are doing will answer these questions. 

      Thank you. We changed our statement about efficacy in the abstract.

      a) Under exclusion criteria: More detail is required as to what defines "subjects who cannot tolerate cell therapy". 

      Patients who could not tolerate previous cell therapy, for example mesenchymal stem cell transplantation, were not included in the current trial.

      b) Figure S6 is important and should be in the main manuscript. This Figure shows that many (6) subjects had COVID at some trial measurement time points. This is an unfortunate confounder for efficacy signals (but efficacy is not the point of this study). Second, Figure 6 (in my view) shows little efficacy signal, which is a reminder to the authors that efficacy should not be implied in a study that was not powered to detect efficacy. 

      We agree that the efficacy should be discussed very carefully.

      (3) Figure S3: It appears at some doses there is a significant rise in monocytes (1M cells) and neutrophils (3 M cells).

      Thank you for your reasonable concerns regarding the safety of the treatment. The monocyte counts of the patients in Figure S3, even after an increase, remain within the reference range, and therefore we consider this elevation not to be clinically meaningful. One patient exhibited a significant increase in neutrophils at 24 weeks, which was attributed to a grade II adverse event, acute bronchitis, which was unrelated to cell therapy. The symptoms resolved within three days following treatment with appropriate medication.

      (4) Figure 3: I wonder about the statistical significance of the 6MWD. Was this done by repeat measure ANOVA? The analysis suggests a p=0.08 but all error bars between low and high dose overlap and the biggest difference is at 24 weeks, and that appears to be labelled as not significant.

      Thank you for the reminder. The 6MWD result with a p-value of 0.008 was derived from comparing 6MWD at the 24-week time point with baseline within the higher-dose group. Therefore, a paired t-test was used for this analysis. In the revised version, we label these results more clearly.

      Reviewer #2 (Public review):

      Summary: 

      This manuscript describes a first-in-human clinical trial of autologous stem cells to address IPF. The significance of this study is underscored by the limited efficacy of standard-of-care anti-fibrotic therapies and increasing knowledge of the role of p63+ stem cells in lung regeneration in ARDS. While models of acute lung injury and p63+ stem cells have benefited from widespread and dynamic DAD and immune cell remodeling of damaged tissue, a key question in chronic lung disease is whether such cells could contribute to the remodeling of lung tissue that may be devoid of acute and dynamic injury. A second question is whether normal regions of the lung in an otherwise diseased organ can be identified as a source of "normal" p63+ stem cells, and how to assess these stem cells given recently identified p63+ stem cell variants emerging in chronic lung diseases including IPF. Lastly, questions of feasibility, safety, and efficacy need to be explored to set the foundation for autologous transplants to meet the huge need in chronic lung disease. The authors have addressed each of these questions to different extents in this initial study, which has yielded important if incomplete information for many of them.

      Strengths: 

      As with a previous study from this group regarding autologous stem cell transplants for COPD (Ref. 24), they have shown that the stem cells they propagate do not form colonies in soft agar or cancers in these patients. While a full assessment of adverse events was confounded by a wave of COVID-19 infections in the study participants, aside from brief fevers it appears these transplants are tolerated by these patients.

      We thank the reviewer for the important comments.

      Weaknesses: 

      The source of stem cells for these autologous transplants is generally bronchoscopic biopsies/brushings from 5th-generation bronchi. Although stem cells have been cloned and characterized from nasal, tracheal, and distal airway biopsies, the systematic cloning and analysis of p63+ stem cells across the bronchial generations is less clear. For instance, p63+ stem cells from the nasal and tracheal mucosa appear committed to upper airway epithelia marked by 90% ciliated cells and 10% goblet cells (Kumar et al., 2011. Ref. 14). In contrast, p63+ stem cells from distal lung differentiate to epithelia replete with Club, AT2, and AT1 markers. The spectrum of p63+ stem cells in the normal bronchi of any generation is less studied. In the present study, cells are obtained by bronchoscopy from 3-5 generation bronchi and expanded by in vitro propagation. Single-cell RNA-seq identifies three clusters they refer to as C1, C2, and C3, with the major C1 cluster said to have characteristics of airway basal cells and C2 possibly the same cells in states of proliferation. Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates? This could be readily determined by 3-D differentiation in so-called air-liquid interface cultures pioneered by cystic fibrosis investigators and should be done as it would directly address the validity of the sourcing protocol for autologous cells for these transplants. This would more clearly link the present study with a previous study from the same investigators (Shi et al., 2019, Ref. 9) whereby distal airway stem cells mitigated fibrosis in the murine bleomycin model.

      The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      We fully agree that the sub-populations of the progenitor cells should be further analyzed, and we will attempt this in the revised manuscript. The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      The authors should also make a more concerted effort to compare Clusters 1, 2, and 3 with the variant stem cell identified in IPF (Wang et al., 2023, Ref. 27). While some of the markers are consistent with this variant stem cell population, others are not. A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.  

      We thank the reviewer for the good suggestion and will make a more detailed comparison in the revised manuscript.

      Other than these issues the authors should be commended for these first-in-human trials for this important condition.

      Thank you so much for the kind compliment.

      Recommendations for the authors: 

      Described in the review text but the authors need to be clear about how they propagated autologous stem cells in vitro.

      (1) Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates?

      The differentiation potential of the P63+/KRT5+ basal progenitor cells has been analyzed in multiple previous studies, which are mentioned in the revised introduction. Basically, the human P63+ progenitor cells can differentiate into airway epithelial cells in the airway area, while giving rise to immature but functional AT1 cells in the alveolar area.

      (2) The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      (3) A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.

      We thank the reviewer for the kind suggestion and have included the comparative analysis in revised Figure S2.

    1. eLife Assessment

      The authors implement a valuable multi-tissue approach to dissect the physiologic consequences of JNK inhibition in parallel with dietary perturbation via sucrose. The conclusions of disrupted liver, muscle and adipose metabolism being central to these effects are solid, as they are supported by a combination of experimental dissection and network modeling approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors have investigated the effects of JNK inhibition on sucrose-induced metabolic dysfunction in rats. They used multi-tissue network analysis to study the effects of the JNK inhibitor JNK-IN-5A on metabolic dysfunction associated with excessive sucrose consumption. Their results show that JNK inhibition reduces triglyceride accumulation and inflammation in the liver and adipose tissues while promoting metabolic adaptations in skeletal muscle. The study provides new insights into how JNK inhibition can potentially treat metabolic dysfunction-associated fatty liver disease (MAFLD) by modulating inter-tissue communication and metabolic processes.

      Strengths:

      The study has several notable strengths:

      Comprehensive Multi-Tissue Analysis: The research provides a thorough multi-tissue evaluation, examining the effects of JNK inhibition across key metabolically active tissues, including the liver, visceral white adipose tissue, skeletal muscle, and brain. This comprehensive approach offers valuable insights into the systemic effects of JNK inhibition and its potential in treating MAFLD.

      Robust Use of Systems Biology: The study employs advanced systems biology techniques, including transcriptomic analysis and genome-scale metabolic modeling, to uncover the molecular mechanisms underlying JNK inhibition. This integrative approach strengthens the evidence supporting the role of JNK inhibitors in modulating metabolic pathways linked to MAFLD.

      Potential Therapeutic Insights: By demonstrating the effects of JNK inhibition on both hepatic and extrahepatic tissues, the study offers promising therapeutic insights into how JNK inhibitors could be used to mitigate metabolic dysfunction associated with excessive sucrose consumption, a key contributor to MAFLD.

      Behavioral and Metabolic Correlation: The inclusion of behavioral tests alongside metabolic assessments provides a more holistic view of the treatment's effects, allowing for a better understanding of the broader physiological implications of JNK inhibition.

      Weaknesses:

      The authors have adequately addressed all my concerns, and the revisions have significantly improved the manuscript's clarity and impact.

    3. Reviewer #2 (Public review):

      Excessive sucrose is a possible initial factor for the development of metabolic dysfunction-associated fatty liver disease (MAFLD). To investigate the possibility that intervention with a JNK inhibitor could lead to the treatment of metabolic dysfunction caused by excessive sucrose intake, the authors performed multi-organ transcriptomics analysis (liver, visceral fat (vWAT), skeletal muscle, and brain) in a rat model of MAFLD induced by excessive sucrose intake (with JNK inhibitor treatment).

      The major strengths and weaknesses of this study are as follows.

      Strengths:

      ・It has been previously reported that inhibition of JNK signalling can contribute to the prevention of hepatic steatosis (HS) and related metabolic syndrome in other models, but the role of JNK signalling in the metabolic disruption caused by excessive intake of sucrose, a possible initial factor for the development of MAFLD, has not been well understood, and the authors have addressed this point.
      ・This study is also important because pharmacological therapy for MAFLD has not yet been established.
      ・By obtaining transcriptomic data in multiple organs and comprehensively analyzing the data using gene co-expression network (GCN) analysis and genome-scale metabolic models (GEM), the authors showed multi-organ interaction not only in the pathology of MAFLD caused by excessive sucrose intake but also in the treatment effects of JNK-IN-5A.
      ・Since JNK signalling has diverse physiological functions in many organs, the authors effectively assessed possible side effects with a view to the clinical application of JNK-IN-5A.

      Weaknesses:

      ・The metabolic process activities were evaluated using RNA-seq results in Figure 7, but direct data such as metabolite measurements are lacking.

      ・There is a lack of consistency in the data between JNK-IN-5A_D1 and _D2, and there is no sufficient data-based explanation for why the effects observed in D1 were inconsistent in the D2 samples.

      ・Although it is valuable that the authors were able to suggest the possibility of JNK inhibitor as a therapeutic strategy for MAFLD, the evaluation of the therapeutic effect was limited to evaluation of plasma TG, LDH, and gene expression changes. As there was no evaluation of liver tissue images, it is unclear what changes were brought about in the liver by the excessive sucrose intake and the treatment with JNK-IN-5A.

      As mentioned in the Weaknesses section, the biological data are insufficient, lacking metabolite measurements and a histological evaluation of the liver. Overall, however, the authors provide the valuable insight that the JNK inhibitor has a cross-organ therapeutic effect in their MAFLD model induced by sucrose overconsumption. Their claim is supported by convincing data from a comprehensive analysis of the transcriptomic data obtained from multiple organs using GCN (gene co-expression network) analysis and GEM (genome-scale metabolic modelling).

      Their comprehensive transcriptomic analysis in multiple organs, including the brain, has demonstrated that the effects of drugs are more widespread than just on specific tissues thought to be the main target, indicating the importance of focusing on tissue interactions when we assess the effects of drugs. Also, the data set in this study will be useful for comparative evaluation with transcriptomics data for other MAFLD models.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, authors have investigated the effects of JNK inhibition on sucrose-induced metabolic dysfunction in rats. They used multi-tissue network analysis to study the effects of the JNK inhibitor JNK-IN-5A on metabolic dysfunction associated with excessive sucrose consumption. Their results show that JNK inhibition reduces triglyceride accumulation and inflammation in the liver and adipose tissues while promoting metabolic adaptations in skeletal muscle. The study provides new insights into how JNK inhibition can potentially treat metabolic dysfunction-associated fatty liver disease (MAFLD) by modulating inter-tissue communication and metabolic processes.

      Strengths:

      The study has several notable strengths:

      Comprehensive Multi-Tissue Analysis: The research provides a thorough multi-tissue evaluation, examining the effects of JNK inhibition across key metabolically active tissues, including the liver, visceral white adipose tissue, skeletal muscle, and brain. This comprehensive approach offers valuable insights into the systemic effects of JNK inhibition and its potential in treating MAFLD.

      Robust Use of Systems Biology: The study employs advanced systems biology techniques, including transcriptomic analysis and genome-scale metabolic modeling, to uncover the molecular mechanisms underlying JNK inhibition. This integrative approach strengthens the evidence supporting the role of JNK inhibitors in modulating metabolic pathways linked to MAFLD.

      Potential Therapeutic Insights: By demonstrating the effects of JNK inhibition on both hepatic and extrahepatic tissues, the study offers promising therapeutic insights into how JNK inhibitors could be used to mitigate metabolic dysfunction associated with excessive sucrose consumption.

      Behavioral and Metabolic Correlation: The inclusion of behavioral tests alongside metabolic assessments provides a more holistic view of the treatment's effects, allowing for a better understanding of the broader physiological implications of JNK inhibition.

      Weaknesses:

      While the study provides a comprehensive evaluation of JNK inhibitors in mitigating MAFLD conditions, addressing the following points will enhance the manuscript's quality:

      The authors should explicitly mention and provide a detailed list of metabolites affected by sucrose and JNK inhibition treatment that have been previously associated with MAFLD conditions. This will better contextualize the findings within the broader field of metabolic disease research.

      We fully agree with this constructive suggestion, which improves our understanding of the metabolic effect of JNK inhibition under sucrose overconsumption. While technical limitations made it challenging to directly analyze metabolites in the current study, we employed genome-scale metabolic modeling, a robust approach for studying metabolism, to predict the metabolic pathways potentially impacted by the interventions (Fig. 7 and Data S8). Additionally, as part of this revision, we conducted an extensive literature review to identify metabolites previously reported to be affected by sucrose consumption in MAFLD rodent models and MASLD patients. A detailed summary of these metabolites is now presented in the attached Table 1, and several of these metabolites have been incorporated into the revised results section (Lines 308-314) to support some of the predicted metabolic activities.

      “Some of the predicted metabolic changes align with previous findings in rodents subjected to sucrose overconsumption. For example, Öztürk et al. reported altered tryptophan metabolism, including decreased serum levels of kynurenic acid and kynurenine, in rats consuming 10% sucrose in drinking water. Similarly, increased triglyceride-bound oleate, palmitate, and stearate were observed in the livers of rats fed a 10% sucrose solution, indicating JNK-IN-5A treatment may regulate lipid metabolism by modulating these metabolic activities.”

      It is important to note, however, that data on metabolites specifically affected by JNK inhibition in MASLD contexts remains lacking in the literature. The predicted metabolites and associated metabolic pathways in the current study could provide a starting point for such exploration in future studies. We have emphasized this in the revised manuscript and highlighted the need for further studies to explore these mechanisms in greater detail.

      Author response table 1.

      Metabolites associated with sucrose overconsumption in MASLD.

      The limitations of the study should be clearly stated, particularly the lack of evidence on the effects of chronic JNK inhibitor treatment and potential off-target effects. Addressing these concerns will offer a more balanced perspective on the therapeutic potential of JNK inhibition.

      Thank you for this constructive comment. We have acknowledged the limitations of the current study in the Discussion section (Lines 397-406) of the revised manuscript:

      “Nevertheless, several limitations warrant consideration. First, while we observed transcriptional adaptations in skeletal muscle tissue following treatment, the exact molecular mechanisms underlying these changes and their roles in skeletal muscle function and systemic metabolic homeostasis remain unclear. Further investigation is warranted to elucidate the muscle-specific effects of JNK inhibition. Second, our study did not investigate the dose-dependent or potential off-target effects of JNK-IN-5A, particularly its activity on other members of the kinase family and associated signaling pathways. Lastly, the long-term effects of JNK-IN-5A administration remain unexplored. Understanding its prolonged impact across different stages of MAFLD, including advanced MASH, is crucial for assessing the full therapeutic potential of JNK inhibition in the treatment of MAFLD.”

      The potential risks of using JNK inhibitors in non-MAFLD conditions should be highlighted, with a clear distinction made between the preventive and curative effects of these therapies in mitigating MAFLD conditions. This will ensure the therapeutic implications are properly framed.

      Thank you for this insightful suggestion. The potential risks of using JNK inhibitors in non-MAFLD conditions have been considered and are now highlighted in Lines 369-390 of the revised discussion:

      “Although overactivated JNK activity presents an attractive opportunity to combat MAFLD, inhibition of JNK presents substantial challenges and potential risks due to its broad and multifaceted roles in many cellular processes. One key challenge is the dual role of JNK signaling (Lamb et al., 2003). For instance, long-term JNK inhibition may disrupt liver regeneration, as JNK plays a critical role in liver repair by regulating hepatocyte proliferation and survival following injury or stress (Papa and Bubici, 2018). In HCC, it has been reported that JNK acts as both a tumor promoter, driving inflammation, fibrosis, and metabolic dysregulation, and a tumor suppressor, facilitating apoptosis and cell cycle arrest in damaged hepatocytes. Its inhibition, therefore, carries the risk of inadvertently promoting tumor progression under certain conditions (Seki et al., 2012). Furthermore, the differential roles of JNK isoforms (JNK1, JNK2, JNK3) and a lack of specificity of JNK inhibitors present another layer of complexity. Given these challenges, while our study demonstrated the potential of JNK-IN-5A in mitigating early metabolic dysfunction in the liver and adipose tissues, JNK targeting strategies should be carefully tailored to the disease stage under investigation. For curative approaches targeting advanced MAFLD, such as MASH, future studies are warranted to address considerations related to dosing, tissue specificity, and the long-term effects.”

      The statistical analysis section could be strengthened by providing a justification for the chosen statistical tests and discussing the study's power. Additionally, a more detailed breakdown of the behavioral test results and their implications would be beneficial for the overall conclusions of the study.

      We would like to thank you for this constructive suggestion. In this study, differences among more than two groups were tested using ANOVA or the Kruskal-Wallis test, depending on the outcome of normality testing (Shapiro–Wilk test) of the data (continuous variables from different measurements). Pairwise comparisons were performed using Tukey’s post hoc test following ANOVA or Dunn’s multiple comparisons post hoc test following the Kruskal-Wallis test, as appropriate.

      The study used 11 animals per group, a group size widely used in preclinical animal research [13]. To evaluate the power of this study design to detect group differences, we conducted a power analysis using G*Power 3.1 software [14], with ANOVA used as an example. The power analysis revealed the following:

      - For a small effect size (partial η² = 0.01), the power was 7.5% at p < 0.05.

      - For a medium effect size (partial η² = 0.06), the power was 23.7% at p < 0.05.

      - For a large effect size (partial η² = 0.14), the power was 55.4% at p < 0.05.

      Bonapersona et al. reported that the median statistical power in animal studies is often between 15% and 22% [15]; the achieved power of the current study design is therefore within the range observed in most exploratory animal research. However, we acknowledge that the power for detecting smaller effects is limited, which is a common challenge in animal research because ethical considerations constrain increases in sample size.
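
      The power figures above can be approximated without G*Power. The following is a minimal sketch, not the authors' actual G*Power configuration: it assumes a one-way fixed-effects ANOVA with four groups of 11 animals (the number of groups is an assumption, since the response states only the group size), converts partial eta squared to Cohen's f² = η²/(1 − η²), and uses the noncentrality parameter λ = f² · N. The function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, df1, df2, lam, terms=100):
    """CDF of the noncentral F distribution as a Poisson mixture of beta CDFs."""
    x = df1 * f / (df1 * f + df2)
    weight = math.exp(-lam / 2.0)  # Poisson(j = 0) weight
    total = 0.0
    for j in range(terms):
        total += weight * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        weight *= (lam / 2.0) / (j + 1)
    return total


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution (bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if ncf_cdf(mid, df1, df2, 0.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def anova_power(eta_sq, n_per_group, k_groups, alpha=0.05):
    """Power of the omnibus one-way ANOVA F test for a given partial eta squared."""
    n_total = n_per_group * k_groups
    lam = eta_sq / (1.0 - eta_sq) * n_total  # lambda = f^2 * N
    df1, df2 = k_groups - 1, n_total - k_groups
    return 1.0 - ncf_cdf(f_critical(alpha, df1, df2), df1, df2, lam)


# Assumed design: 4 groups x 11 animals (group count is hypothetical).
for label, eta_sq in [("small", 0.01), ("medium", 0.06), ("large", 0.14)]:
    print(f"{label}: power = {anova_power(eta_sq, 11, 4):.3f}")
```

      Changing `k_groups` or `n_per_group` shows how strongly the achievable power depends on the design; with an effect size of zero, the function returns the significance level itself, which is a convenient sanity check.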

      As suggested, we’ve revised the ‘Statistical Analysis’ and ‘Result’ sections to improve clarity:

      “Statistical Analysis:

      Data were shown as mean ± standard deviation (SD), unless stated otherwise. The assumption of normality for continuous variables from behavior tests, biometric measurements, and plasma biochemistry was determined using the Shapiro–Wilk test. Differences among multiple groups were tested by ANOVA or, for data that were not normally distributed, the non-parametric Kruskal-Wallis test. Pairwise comparisons were performed using Tukey’s post hoc test following the ANOVA or Dunn’s multiple comparisons post hoc test following the Kruskal-Wallis test, as appropriate. The Jaccard index was used to evaluate the similarity and diversity of two gene sets, and a hypergeometric test was used to test the significance of their overlap. All results were considered statistically significant at p < 0.05, unless stated otherwise.”
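
      The gene-set comparison described in the quoted passage can be illustrated with a short sketch. It is not the authors' code: the gene identifiers and the background size are hypothetical, and the one-sided upper-tail form of the hypergeometric test (probability of an overlap at least as large as observed) is an assumed convention, since the response does not specify the tail or the gene universe.

```python
from math import comb


def jaccard_index(set_a, set_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hypergeom_overlap_pvalue(overlap, size_a, size_b, universe):
    """P(X >= overlap) when size_b genes are drawn from a universe that
    contains size_a genes of interest (upper tail of the hypergeometric)."""
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, size_b)


# Hypothetical gene sets drawn from a hypothetical background of 10 genes.
module_a = {"g1", "g2", "g3", "g4"}
module_b = {"g3", "g4", "g5"}
background_size = 10

shared = len(module_a & module_b)
print("Jaccard index:", jaccard_index(module_a, module_b))
print("Overlap p-value:",
      hypergeom_overlap_pvalue(shared, len(module_a), len(module_b),
                               background_size))
```

      The choice of background matters in practice: using all annotated genes rather than only the genes expressed in the tissue makes the same overlap look more significant.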

      Behavior tests (Lines 150-157):

      “We found no significant differences among groups in retention latencies, a measure of learning and memory abilities in the passive avoidance test (Data S3). Additionally, the locomotor activity test was used to analyze behaviors such as locomotion, anxiety, and depression in rats. No significant differences were observed among groups in stereotypical movements, ambulatory activity, rearing, resting percentage, and distance travelled (Data S4). Similarly, the elevated plus maze test (Walf and Frye, 2007), an assay for assessing anxiety-like behavior in rodents, showed that rats in all groups had comparable open-arm entries and durations (Data S5). Collectively, the behavior tests indicate that the JNK-IN-5A-treated rats exhibit no evidence of anxiety or behavior disorders.”

      Reviewer #2 (Public review):

      Summary:

      Excessive sucrose is a possible initial factor for the development of metabolic dysfunction-associated fatty liver disease (MAFLD). To investigate the possibility that intervention with JNK inhibitor could lead to the treatment of metabolic dysfunction caused by excessive sucrose intake, the authors performed multi-organ transcriptomics analysis (liver, visceral fat (vWAT), skeletal muscle, and brain) in a rat model of MAFLD induced by sucrose overconsumption (+ a selective JNK2 and JNK3 inhibitor (JNK-IN-5A) treatment). Their data suggested that changes in gene expression in the vWAT as well as in the liver contribute to the pathogenesis of their MAFLD model and revealed that the JNK inhibitor has a cross-organ therapeutic effect on it.

      Strengths:

      (1) It has been previously reported that inhibition of JNK signaling can contribute to the prevention of hepatic steatosis (HS) and related metabolic syndrome in other models, but the role of JNK signaling in the metabolic disruption caused by excessive intake of sucrose, a possible initial factor for the development of MAFLD, has not been well understood, and the authors have addressed this point.

      (2) This study is also important because pharmacological therapy for MAFLD has not yet been established.

      (3) By obtaining transcriptomic data in multiple organs and comprehensively analyzing the data using gene co-expression network (GCN) analysis and genome-scale metabolic models (GEM), the authors showed multi-organ interactions not only in the pathology of MAFLD caused by excessive sucrose intake but also in the treatment effects of JNK-IN-5A.

      (4) Since JNK signaling has diverse physiological functions in many organs, the authors effectively assessed possible side effects with a view to the clinical application of JNK-IN-5A.

      Weaknesses:

      (1) The metabolic process activities were evaluated using RNA-seq results in Figure 7, but direct data such as metabolite measurements are lacking.

      Thank you for these valuable insights. We fully agree that direct metabolite measurements would provide a deeper understanding of the metabolic impact of sucrose overconsumption and JNK-IN-5A administration. Unfortunately, due to technical limitations, we were unable to directly measure metabolites in this study. To address this, we supported our genome-scale metabolic modeling predictions with an extensive literature review, which is summarized in attached Table 1. This table highlights key metabolites and associated metabolic pathways that have been previously associated with sucrose overconsumption in MAFLD contexts. We incorporated some of these metabolites into the revised results section (Lines 308–314) to demonstrate the consistency between our predicted metabolic changes and experimental findings from the literature. For instance, studies have reported altered tryptophan metabolism, including decreased serum kynurenic acid and kynurenine levels, as well as increased triglyceride-bound oleate, palmitate, and stearate in sucrose-fed rodents. These findings align with our predictions of altered metabolic activities in fatty acid oxidation, fatty acid synthesis, and tryptophan metabolism.

      (2) There is a lack of consistency in the data between JNK-IN-5A_D1 and _D2, and there is no sufficient data-based explanation for why the effects observed in D1 were inconsistent in the D2 samples.

      Thank you for raising this important point regarding the differences between the two dosages. As this was not the primary focus of the current study, we do not have sufficient data to fully explain these observations. We speculate that the differences may arise from pharmacokinetic effects associated with the dosing of this small-molecule inhibitor, including potential saturation of transport mechanisms, altered tissue distribution, or off-target effects.

      (3) Although it is valuable that the authors were able to suggest the possibility of JNK inhibitor as a therapeutic strategy for MAFLD, the evaluation of the therapeutic effect was limited to the evaluation of plasma TG, LDH, and gene expression changes. As there was no evaluation of liver tissue images, it is unclear what changes were brought about in the liver by the excessive sucrose intake and the treatment with JNK-IN-5A.

      We acknowledge that the lack of histological evaluations limits our ability to form a complete picture of the interventions' effects. However, as you noted, our transcriptional and systems-wide investigation across multiple tissues provides novel and significant insights into the molecular and systemic impacts of JNK-IN-5A treatment.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be useful to explain why the authors conducted their research using female rats but not male rats.

      Thank you for raising this insightful point. Our choice of female rats for the current study was based on several considerations. 1) Previous research has demonstrated that female rats exhibit metabolic dysfunction (e.g., hypertriglyceridemia, liver steatosis, insulin resistance) in response to dietary factors, such as high-sucrose feeding [16-19]. These metabolic characteristics made them an appropriate model for assessing the in vivo effects of JNK inhibition under high-sucrose conditions. 2) Female rats have also been reported to show resilience to high-sucrose-induced metabolic dysfunction due to the protective effects of estrogen [8], so we aimed to determine whether JNK inhibition could provide therapeutic benefits in this context. This allows us to evaluate the effect of JNK inhibition even in metabolically advantaged groups. 3) Our results from the tolerance test (Fig. 2a) indicated that female rats displayed greater variation in response to JNK-IN-5A administration. This variation allowed us to evaluate how JNK inhibition influences metabolic outcomes in a sex that is more responsive to the intervention. Nonetheless, we emphasize the importance of future studies involving male rats to better understand sex-specific responses to JNK inhibition and to provide more comprehensive guidance for the development of JNK-targeting therapies in MAFLD treatment.

      (2) Figure 2C shows that JNK-IN-5A administration reduces the mRNA levels of Mapk8 and Mapk9 in the liver and the SkM. It would be useful to provide the authors' insight into the data. 

      In the liver, the data in Fig. 2c in the original submission and the attached Fig. 1 show that sucrose feeding induces opposite alterations in the mRNA expression of Mapk8 (Jnk1, increased, log2FC (Sucrose vs. Control) = 0.02) and Mapk9 (Jnk2, decreased, log2FC (Sucrose vs. Control) = -0.43), though these changes do not reach statistical significance. JNK-IN-5A administration reverses these effects, significantly decreasing Mapk8 expression (log2FC (Sucrose+JNK_D1 vs. Sucrose) = -0.37) while increasing Mapk9 expression (log2FC (Sucrose+JNK_D1 vs. Sucrose) = 0.42). This suggests potential differential yet compensatory roles of these two isoforms in regulating JNK activity during these interventions in the liver, in line with the findings from Jnk1- and/or Jnk2-specific knockout studies [20, 21]. Additionally, emerging evidence indicates that Jnk1 plays a major role in diet-induced liver fibrosis and metabolic dysfunction [22-25]. Therefore, the reduced Mapk8 expression following JNK-IN-5A administration may contribute to the observed improvements in liver metabolism.
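
      For readers less used to log2 fold changes, the reported values can be converted to linear fold changes with a few lines of arithmetic. This is an illustrative sketch only; the dictionary keys are labels chosen here for readability, and the values are the four log2FC figures quoted in the paragraph above.

```python
def fold_change(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


# log2FC values quoted in the response for liver tissue.
reported = {
    "Mapk8, Sucrose vs Control": 0.02,
    "Mapk9, Sucrose vs Control": -0.43,
    "Mapk8, Sucrose+JNK_D1 vs Sucrose": -0.37,
    "Mapk9, Sucrose+JNK_D1 vs Sucrose": 0.42,
}

for label, value in reported.items():
    fc = fold_change(value)
    percent = (fc - 1.0) * 100.0
    print(f"{label}: log2FC = {value:+.2f}, fold change = {fc:.2f} ({percent:+.0f}%)")
```

      On this scale a log2FC of about ±0.4 corresponds to a change of roughly a quarter to a third in expression, which helps when judging the biological size of the effects discussed here.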

      Author response image 1.

      The Spearman correlation between expression levels of Mapk8

      In skeletal muscle, the primary site for insulin-stimulated glucose uptake, insulin signaling is crucial for maintaining metabolic homeostasis [26]. Numerous studies have demonstrated that JNK activation promotes insulin resistance and that targeting JNK might be a promising therapeutic strategy for the treatment of metabolic diseases associated with insulin resistance, such as MAFLD [24]. In our study, while sucrose overconsumption did not significantly alter the mRNA levels of JNK isoforms in this tissue, administration of JNK-IN-5A at 30 mg/kg/day significantly reduced the expression of both Jnk1 and Jnk2 as well as genes involved in insulin signaling (Fig. 5). This suggests a potential interplay between JNK inhibition and insulin signaling pathways in the skeletal muscle, where inhibition of JNK activity may improve insulin sensitivity by modulating these pathways. However, it is also crucial to investigate the long-term effects of JNK-IN-5A administration and its broader impact on many other physiological processes regulated by the JNK pathway. These aspects will be a focus of our future studies.

      (3) The notations a and b in Figure S5 are missing.  

      Thank you for this constructive comment. We have corrected this in the revised figure S5.

      (4) Data S13 described in the figure legend for Figure 7 (lines 630 and 632) seems a mistake and should be Data S8.

      (5) The notations a, b, and c in Figure 7 are incorrect. The figure legend for Figure 7a doesn't seem to match the figure contents.

      We appreciate your attention to detail regarding Fig. 7. We have corrected the reference and the figure legend in the revised Fig. 7.

      References

      (1) Fujii, A., et al., Sucrose Solution Ingestion Exacerbates Dinitrofluorobenzene-Induced Allergic Contact Dermatitis in Rats. Nutrients, 2024. 16(12).

      (2) Sun, S., et al., High sucrose diet-induced dysbiosis of gut microbiota promotes fatty liver and hyperlipidemia in rats. J Nutr Biochem, 2021. 93: p. 108621.

      (3) Qi, S., et al., Inositol and taurine ameliorate abnormal liver lipid metabolism induced by high sucrose intake. Food Bioscience, 2024. 60: p. 104368.

      (4) Ramos-Romero, S., et al., The Buckwheat Iminosugar d-Fagomine Attenuates Sucrose-Induced Steatosis and Hypertension in Rats. Mol Nutr Food Res, 2020. 64(1): p. e1900564.

      (5) Ortiz, S.R. and M.S. Field, Sucrose Intake Elevates Erythritol in Plasma and Urine in Male Mice. J Nutr, 2023. 153(7): p. 1889-1902.

      (6) Beckmann, M., et al., Changes in the human plasma and urinary metabolome associated with acute dietary exposure to sucrose and the identification of potential biomarkers of sucrose intake. Mol Nutr Food Res, 2016. 60(2): p. 444-57.

      (7) He, X., et al., High Fat Diet and High Sucrose Intake Divergently Induce Dysregulation of Glucose Homeostasis through Distinct Gut Microbiota-Derived Bile Acid Metabolism in Mice. J Agric Food Chem, 2024. 72(1): p. 230-244.

      (8) Stephenson, E.J., et al., Chronic intake of high dietary sucrose induces sexually dimorphic metabolic adaptations in mouse liver and adipose tissue. Nat Commun, 2022. 13(1): p. 6062.

      (9) Mock, K., et al., High-fructose corn syrup-55 consumption alters hepatic lipid metabolism and promotes triglyceride accumulation. J Nutr Biochem, 2017. 39: p. 32-39.

      (10) Eryavuz Onmaz, D. and B. Ozturk, Altered Kynurenine Pathway Metabolism in Rats Fed Added Sugars. Genel Tıp Dergisi, 2022. 32(5): p. 525-529.

      (11) Gariani, K., et al., Eliciting the mitochondrial unfolded protein response by nicotinamide adenine dinucleotide repletion reverses fatty liver disease in mice. Hepatology, 2016. 63(4): p. 1190-204.

      (12) Togo, J., et al., Impact of dietary sucrose on adiposity and glucose homeostasis in C57BL/6J mice depends on mode of ingestion: liquid or solid. Mol Metab, 2019. 27: p. 22-32.

      (13) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (14) Faul, F., et al., G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods, 2007. 39(2): p. 175-91.

      (15) Bonapersona, V., et al., Increasing the statistical power of animal experiments with historical control data. Nat Neurosci, 2021. 24(4): p. 470-477.

      (16) Kendig, M.D., et al., Metabolic Effects of Access to Sucrose Drink in Female Rats and Transmission of Some Effects to Their Offspring. PLoS One, 2015. 10(7): p. e0131107.

      (17) Harris, R.B.S., Source of dietary sucrose influences development of leptin resistance in male and female rats. Am J Physiol Regul Integr Comp Physiol, 2018. 314(4): p. R598-R610.

      (18) Velasco, M., et al., Sexual dimorphism in insulin resistance in a metabolic syndrome rat model. Endocr Connect, 2020. 9(9): p. 890-902.

      (19) Maniam, J., C.P. Antoniadis, and M.J. Morris, The effect of early-life stress and chronic high-sucrose diet on metabolic outcomes in female rats. Stress, 2015. 18(5): p. 524-37.

      (20) Singh, R., et al., Differential effects of JNK1 and JNK2 inhibition on murine steatohepatitis and insulin resistance. Hepatology, 2009. 49(1): p. 87-96.

      (21) Sabapathy, K., et al., Distinct roles for JNK1 and JNK2 in regulating JNK activity and c-Jun-dependent cell proliferation. Mol Cell, 2004. 15(5): p. 713-25.

      (22) Zhao, G., et al., Jnk1 in murine hepatic stellate cells is a crucial mediator of liver fibrogenesis. Gut, 2014. 63(7): p. 1159-72.

      (23) Czaja, M.J., JNK regulation of hepatic manifestations of the metabolic syndrome. Trends Endocrinol Metab, 2010. 21(12): p. 707-13.

      (24) Solinas, G. and B. Becattini, JNK at the crossroad of obesity, insulin resistance, and cell stress response. Mol Metab, 2017. 6(2): p. 174-184.

      (25) Schattenberg, J.M., et al., JNK1 but not JNK2 promotes the development of steatohepatitis in mice. Hepatology, 2006. 43(1): p. 163-72.

      (26) Sylow, L., et al., The many actions of insulin in skeletal muscle, the paramount tissue determining glycemia. Cell Metab, 2021. 33(4): p. 758-780.

    1. eLife Assessment

      In their important manuscript, Costa et al. establish an in vitro model for dorsal root ganglion (DRG) axonal asymmetry, revealing that central and peripheral axon branches have distinct patterns of microtubule populations that are linked to their differential regenerative capacities. The authors employ creative tissue culture methods to demonstrate how these branches develop uniquely in vitro, offering a potential explanation for long-observed regeneration disparities. The convincing evidence provides a contribution to our understanding of the neuronal cytoskeleton and axonal regeneration.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a new in vitro model for DRG neurons that recapitulates several key differences between the peripheral and central branches of DRG axons in vivo. These differences include morphology (with one branch being thinner than the other) and regenerative capacity (with the peripheral branch displaying higher regenerative capacity). The authors analyze the abundance of various microtubule-associated proteins (MAPs) in each branch, as well as the microtubule dynamics in each branch, and find significant differences between branches. Importantly, they found that a well-known conditioning paradigm (prior lesion of the peripheral branch improves the regenerative capacity of the central branch) is not only reproduced in this system, but also leads to loss of the asymmetry of MAPs between branches. Zooming in on one MAP that shows differential abundance between the axons, they find that the severing enzyme Spastin is required for the asymmetry in microtubule dynamics and in regenerative capacity following a conditioning lesion.

      Strengths:

      The establishment of an experimental system that recapitulates DRG axon asymmetry in vitro is an important step that is likely to be useful for other studies. In addition, identifying key molecular signatures that differ between central and peripheral branches, and determining how they are lost following a conditioning lesion, adds to our understanding of why peripheral axons have a better regenerative capacity. Last, the authors' use of an in vivo model system to support some of their in vitro findings is a strength of this work.

      Weaknesses:

      One weakness of the manuscript is that, to a large degree, one of its main conclusions (MAP asymmetry underlies differences in regenerative capacity) relies mainly on a correlation, without firmly establishing a causal link. However, this weakness is relatively minor because (1) it is partially addressed with the Spastin KO, (2) there isn't a trivial way to show a causal relationship in this case, and (3) it is addressed in the Discussion section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to develop a tissue culture method in which to study the different regenerative abilities of the central and peripheral branches of sensory axons. Neurons developed a small and a large branch, which have different regenerative abilities, different transport rates and different microtubule properties. The study provides convincing evidence that the two axonal branches differ in a way that corresponds to the in vivo situation. The different regenerative abilities of the two branches are an important observation, because until now it has not been clear whether this difference is intrinsic to the neuron and axons or due to differences in the environment surrounding the axons. The authors have then looked for molecular explanations of the differences between the branches. They find different transport rates and different microtubule dynamics. The different microtubule dynamics are explained by differing levels of spastin, an enzyme that severs microtubules, encouraging dynamics.

      Strengths:

      The differences between the two branches are clearly shown, together with differences in transport, microtubule dynamics and regeneration. The in vitro model is novel and could be widely used. The methods used are robust and generally accepted.

      Weaknesses:

      The revised version of the paper has addressed the weaknesses that were identified.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Costa and colleagues investigate how asymmetry in dorsal root ganglion (DRG) neurons is established. The authors developed an in vitro system that mimics the pseudo-unipolar morphology and asymmetry of DRG neurons during the regeneration of the peripheral and central branch axons. They suggest that central-like DRG axons exhibit a higher density of growing microtubules. By reducing the polymerization of microtubules in these central-like axons, they were able to eliminate the asymmetry in DRG neurons.

      Strengths:

      The authors point out a distinct microtubule-associated protein signature that differentiates between DRG neurons' central and peripheral axonal branches. Experimental results demonstrate that genetic deletion of spastin eliminated the differences in microtubule dynamics and axon regeneration between the central and peripheral branches.

      Weaknesses:

      While some of the data are compelling, experimental evidence does not fully support the main claims.

      In its current form, the study is primarily descriptive and lacks convincing mechanistic insights. It misses important controls and further validation using 3D in vitro models.

      The significance of studying microtubule polymerization to DRG asymmetry in vitro is questionable, especially considering the model's validity. Classifying the central and peripheral-like branches in cultured DRG neurons will require further in-depth characterization. Additional validation using adult DRG neuron cultures not aged in vitro will be required in future studies.

      The comparison of asymmetry associated with a regenerative response between in vitro and in vivo paradigms has significant limitations due to the nature of the in vitro culture system. When cultured in isolation, DRG neurons fail to form functional connections with appropriate postsynaptic target neurons (the central branch) or to differentiate the peripheral domains associated with the innervation of target organs. Rather than growing neurons on a flat, hard surface like glass, more physiologically relevant substrates and/or culturing conditions should be considered. This approach could help eliminate potential artifacts caused by plating adult DRG neurons on a flat surface. Additionally, the authors should consider replicating their findings in a 3D culture model or using dorsal root ganglia explants, where both centrally and peripherally projecting axons are present.

      Panels 5H-J require additional processing with astrocyte markers to accurately define the lesion borders. Furthermore, including a lower magnification would facilitate a direct comparison of the lesion site. The use of cholera toxin subunit B (CTB) to trace dorsal column sensory axons is prone to misinterpretation, as the tracer accumulates at the axon's tip. This limitation makes it extremely challenging to distinguish between regenerating and degenerating axons.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review)

      Weaknesses: 

      The main weakness of the manuscript is that to a large degree, one of its main conclusions (MAP symmetry underlies differences in regenerative capacity) relies mainly on a correlation, without firmly establishing a causal link. However, this weakness is relatively minor because (1) it is partially addressed with the Spastin KO and (2) there isn't a trivial way to show a causal relationship in this case.

      We thank Reviewer #1 for their positive assessment of our manuscript. To further strengthen the claim that MAP asymmetry underlies differences in regenerative capacity, we could investigate the effect of depleting other MAPs that lose asymmetry after conditioning lesion (CRMP5 and katanin). One would expect that similarly to spastin, this would disrupt the physiological asymmetry of DRG axons and impair axon regeneration. We further discussed this issue in the revised version of the manuscript (page 17, line 381).

      Reviewer #2 (Public review)

      Weaknesses:

      In order for the method to be used it needs to be better described. For instance what proportion of neurons develop just two axonal branches, one of which is different? How selective are the researchers in finding appropriate neurons?

      We thank Reviewer #2 for their positive assessment of our manuscript. As suggested, we included further methodological details on the in vitro system in the revised version of the manuscript. We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. All the pseudo-unipolar neurons analysed had distinct axonal branches in terms of diameter and microtubule dynamics. For imaging purposes, we selected pseudo-unipolar neurons with axons unobstructed by other cells or neurites within a distance of at least 20–30 μm from the bifurcation point, to ensure optimal imaging. In the case of laser axotomy experiments, this distance was increased to 100–200 μm to ensure clear analysis of regeneration. These selection criteria are now detailed in the Methods (page 19, line 417, and page 21, line 474).

      Reviewer #3 (Public review):

      (1) Weaknesses:

      While some of the data are compelling, experimental evidence only partially supports the main claims. In its current form, the study is primarily descriptive and lacks convincing mechanistic insights. It misses important controls and further validation using 3D in vitro models.

      We recognize the importance of further exploring the contribution of other MAPs to microtubule asymmetry and regenerative capacity of DRG axons. In future work, we plan to investigate this issue using knockout mice for katanin and CRMP5. Regarding the mechanisms underlying the differential localization of proteins in DRG axons, we performed in-situ hybridization to evaluate the availability of axonal mRNA but no differences were found between central and peripheral DRG axons (Figure 4 – figure supplement 2). To address whether differences in protein transport exist, we attempted to transduce DRG neurons with GFP-tagged spastin both in vitro and in vivo. However, these experiments were inconclusive as very low levels of spastin-GFP were detected. We are actively optimizing these approaches and will address this challenge in future studies. These points were further discussed in the revised manuscript (page 15, line 330 and page 17, line 381).

      (2) Given the heterogeneity of dorsal root ganglion (DRG) neurons, it is unclear whether the in vitro model described in this study can be applied to all major classes of DRG neurons. 

      We acknowledge the diversity of DRG neurons and agree that assessing the presence of different DRG subtypes in our culture system will enrich its future use. Despite this heterogeneity, we focused on DRG neuron features that are common to all subtypes, i.e., pseudo-unipolarization and higher regenerative capacity of peripheral branches. This point was addressed on page 14, line 309 of the revised manuscript.

      (3) Also unclear is the inconsistency with embryonic DRG cultures with embryonic (E)16 from rats and E13 from mice (spastin knockout and wild-type controls). 

      Given our previous experience in establishing DRG neuron cultures from E16 Wistar rats and E13 C57BL/6 mice, these developmental stages are equivalent, yielding cultures of DRG neurons with similar percentages of different morphologies. Of note, in our colonies, gestation length is ~19 days in C57BL/6 mice (background of the spastin knockout line) and ~22 days in Wistar Han rats. This was further clarified in the Methods (page 18, line 404).

      (4) Furthermore, the authors stated (line 393) that only a small subset of cultured DRG neurons exhibited a pseudo-unipolar morphology. The authors should include the percentage of the neurons that exhibit a pseudo-unipolar morphology.

      We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. In line 393, we referred specifically to an experimental setup in which DRG neurons were transduced and 30 transduced neurons were randomly selected for longitudinal imaging. Of these, the number of viable pseudo-unipolar DRG neurons was limited by both the random nature of viral transduction and light-induced toxicity during continuous imaging over seven consecutive days at hourly intervals. This was clarified in the revised manuscript (page 20, line 438).

      (5) The significance of studying microtubule polymerization to DRG asymmetry in vitro is questionable, especially considering the model's validity. The authors might consider eliminating the in vitro data and instead focus on characterizing DRG asymmetry in vivo both before and after a conditioning lesion. If the authors choose to retain the in vitro data, classifying the central and peripheral-like branches in cultured DRG neurons will require further in-depth characterization. Additional validation should be performed in adult DRG neuron cultures not aged in vitro.

      The in vitro system presented here reliably reproduces several key features of DRG neurons observed in vivo, including asymmetry in axon diameter, regenerative capacity, axonal transport, and microtubule dynamics. Of note, most studies in the field have been done using multipolar DRG neurons that do not recapitulate in vivo morphology and asymmetries. Thus, the current in vitro model serves as a versatile tool for advancing our understanding of DRG biology and associated diseases. This system is particularly suited to studying axon regeneration asymmetries, and enables the investigation of mechanisms occurring at the stem axon bifurcation, such as asymmetric protein transport and microtubule dynamics, which are challenging to examine in vivo due to the length of the stem axon and the difficulty of locating the DRG T-junction. It will be important to optimize similar cultures using adult DRG neurons. However, this comes with challenges, such as lower cell viability. This is the case with multiple other neuron types, for which the vast majority of cultures are obtained from embryonic tissue. These concerns were addressed in the revised version of the manuscript (page 13, line 296, and page 14, line 302).

      (6) The comparison of asymmetry associated with a regenerative response between in vitro and in vivo paradigms has significant limitations due to the nature of the in vitro culture system. When cultured in isolation, DRG neurons fail to form functional connections with appropriate postsynaptic target neurons (the central branch) or to differentiate the peripheral domains associated with the innervation of target organs. Rather than growing neurons on a flat, hard surface like glass, more physiologically relevant substrates and/or culturing conditions should be considered. This approach could help eliminate potential artifacts caused by plating adult DRG neurons on a flat surface. Additionally, the authors should consider replicating their findings in a 3D culture model or using dorsal root ganglia explants, where both centrally and peripherally projecting axons are present.

      We agree that a more sophisticated system, such as a compartmentalized culture, holds great potential for future research. In this respect, we are currently engaged in developing such models. A compartmentalized system would enable the separation of three compartments: central nervous system neurons, DRG neurons, and peripheral targets. While previous efforts to create compartmentalized DRG cultures have been reported (e.g., PMID: 11275274 and PMID: 37578145), these systems have not demonstrated the development of pseudo-unipolar morphology. Incorporating non-neuronal DRG cells into the DRG neuron compartment may successfully support the development of a pseudo-unipolar morphology.

      We also recognize the importance of dimensionality in fostering pseudo-unipolar morphology. Of note, our model provides a 3D-like environment, as DRG glial cells are continuously replicating over the 21 days in culture. In relation to DRG explants, we attempted their use but encountered limitations with confocal microscopy as the axial resolution was insufficient to resolve processes at the DRG T-junction or within individual branches. The above issues are now discussed in the revised manuscript (page 14, line 312).

      (7) Panels 5H-J require additional processing with astrocyte markers to accurately define the lesion borders. Furthermore, including a lower magnification would facilitate a direct comparison of the lesion site. 

      In our study, we relied on the alignment of nuclei to delineate the lesion site because, in our accumulated experience, this provides an accurate definition of the lesion border. Outside the lesion, the nuclei are well-aligned, while at the lesion site, they become randomly distributed. Additionally, CTB staining further supports the identification of the rostral border of the lesion, as most injured central DRG axons stop their growth at the injury site. This was further detailed in the Methods of the revised manuscript (page 32, line 730).

      (8) The use of cholera toxin subunit B (CTB) to trace dorsal column sensory axons is prone to misinterpretation, as the tracer accumulates at the axon's tip. This limitation makes it extremely challenging to distinguish between regenerating and degenerating axons.

      While alternative methods to trace or label regenerating axons exist, CTB is a well-established and widely used tracer for central sensory projections, as shown in different studies (PMID: 22681683, PMID: 26831088 and PMID: 33349630). Regarding the concern of possible CTB labeling in degenerating axons, we believe this is unlikely to be the case in our system, as in spinal cord injury controls, CTB-positive axons are nearly absent. Also, as regeneration was investigated six weeks after injury, axon degeneration has most likely already occurred, as previously shown (PMID: 15821747 and PMID: 25937174).

      Recommendations for the authors: 

      Reviewer #1:

      (1) Figure 1 can be improved by adding a quantification of the fraction of neurons at each stage as a function of time.

      We have updated Figure 1 to include the quantification of the percentages of different DRG neuron morphologies at DIV21 (Figure 1B), which corresponds to the stage at which all in vitro experiments were conducted.

      (2) Figure 3A: why are retrograde transport events not shown?

      Retrograde transport events are not displayed as results did not reach statistical significance.

      (3) Figure 3 and 4: Combine the quantifications of with/without lesion, such that not only the differences between branches are apparent, but also the differences induced in each branch by the lesion.

      As requested, only combined quantifications of microtubule dynamics for naive and conditioning lesion are provided in the revised version of Figure 3 (Figures 3H and 3K), to highlight both branch-specific differences and lesion-induced changes. However, for Figure 4, as the western blots for naive and conditioning lesion were performed on separate gels, it is unfeasible to combine their quantification.

      (4) Figure 5: does spastin KO lead to a difference in the "MAP signature" of each branch? Also, if in addition to MAPs there are other known molecules (and an antibody is available) that show differential localization to peripheral/central branches, it would be nice to check if this asymmetry is also lost in spastin KO.

      Evaluating the MAP signature in DRG axons from spastin KO mice will be important to explore in future experiments. Despite some scattered reports in the literature, our study is the first to identify a distinct protein signature of central and peripheral DRG axons. This is especially relevant in the case of Tau, as irrespective of the experimental conditions, its levels are always increased in the peripheral DRG axon.

      Reviewer #2:

      (1) Please provide a more complete description of the culture method. Do all neurons develop two asymmetric branches or just a few, and how are they selected? Does the timing of the events in vitro correspond with what is happening to the neurons in embryos?

      We have included the percentages of the various DRG neuron morphologies at DIV21 in the revised manuscript (Figure 1B and on page 5, line 107). Additionally, a more detailed description of the culture method is now provided in the Methods, including the criteria used to select pseudo-unipolar neurons (page 19, line 417, and page 21, line 474). 

      Regarding the timing of events, upon DRG dissociation, neurons reinitiate polarization, taking 21 days to reach approximately 40% pseudo-unipolar morphology. A similar percentage is reached at E16.5 during rat development in vivo (PMID: 8729965).

      (2) Are the neurons and their branches resting on the glia? Is there any relation to the presence of glia and the type of growth that is seen?

      Yes, neurons and their branches rest on glia. This is required for DRG pseudo-unipolarization. In future studies, we plan to further investigate the neuron-extrinsic mechanisms leading to pseudo-unipolarization, and to identify the specific glial cell type(s) needed throughout this process. This is now discussed in the revised manuscript (page 14, line 306).

      (3) Is it possible to trace microtubules so as to see whether the microtubules of the two branches mix, or whether they remain separate all the way to the cell bodies?

      We used DRG neurons transduced with EB3-GFP, to examine microtubule polymerization at the T-junction through live imaging. This revealed a high continuum of polymerization from the stem axon to the central-like axon (Figure 4 – figure supplement 2D-G). To further determine whether microtubules from both branches mix or remain separate, alternative techniques such as FIB-SEM could be performed. This point is now further discussed in the revised manuscript (page 16, line 352).

      (4) Using the term MAPs would lead readers to expect to see an analysis of different levels of MAP1, MAP2, etc. It would be interesting to see this if the authors have done it, but it is not necessary for the paper.

      We assessed the expression of MAP2 via western blot in DRG peripheral and central axons and no differences were found. This is now referred to in the Discussion (page 15, line 327).

      (5) The regeneration experiments on the spastin knockouts are complicated by the lesion being in CNS tissue, which introduces various issues. Is there a difference in regeneration after dorsal root crush?

      We have not yet examined whether regeneration differs after dorsal root crush in the spastin knockout model. However, this presents an interesting question, as Schwann cells in the dorsal root may support regeneration of central DRG axons.

      Reviewer #3:

      The authors stated that the normality of the datasets was tested using the Shapiro-Wilk or D'Agostino-Pearson omnibus normality test. Given the low sample size (n=4) for some of the experiments presented (e.g., Figure 3B), it is not clear how normality was assessed which justifies the use of parametric tests.

      We followed GraphPad’s recommendations for selecting the appropriate normality test (https://www.graphpad.com/support/faqid/959/). The D'Agostino-Pearson omnibus K2 test, recommended for its versatility, was used when sample size was 8 or more. For smaller sample sizes (n < 8), we used the Shapiro-Wilk test, which is also widely used in biological research and can be employed with datasets of at least 3 values. These tests guided our decision-making regarding the use of parametric or non-parametric statistical tests.
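
      The selection rule described in this response can be sketched in a few lines of Python. This is a minimal sketch, assuming only the thresholds stated above (Shapiro-Wilk for 3 to 7 values, D'Agostino-Pearson for 8 or more); the function name `choose_normality_test` is hypothetical, and the sketch only picks the test, it does not run it or reproduce the authors' analysis.

```python
def choose_normality_test(n: int) -> str:
    """Pick a normality test from the sample size n.

    Thresholds follow the rule stated in the response:
    - fewer than 3 values: no test is applied (error)
    - 3 to 7 values: Shapiro-Wilk
    - 8 or more values: D'Agostino-Pearson omnibus K2
    """
    if n < 3:
        raise ValueError("at least 3 values are needed for a normality test")
    if n < 8:
        return "shapiro-wilk"
    return "dagostino-pearson"


if __name__ == "__main__":
    # n = 4 is the sample size the reviewer asked about (e.g., Figure 3B).
    for n in (4, 8, 15):
        print(n, choose_normality_test(n))
```

      Under this rule, a dataset with n = 4 is routed to the Shapiro-Wilk test. The reviewer's underlying concern, that any normality test has limited power at such small sample sizes, is not resolved by the choice of test.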

    1. eLife Assessment

      This important study investigates the role of ATG6 in regulating NPR1, a key protein in the plant immune response. The authors present compelling evidence that ATG6 not only interacts with NPR1 in both the cytoplasm and nucleus but also enhances its stability and nuclear accumulation, leading to increased resistance to Pst DC3000/avrRps4 infection in Arabidopsis thaliana. The work incorporates a variety of approaches from molecular biology, confocal imaging, and biochemistry, which together strengthen the conclusions.

    2. Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      The experiments are carefully designed and the data are convincing. The authors did a good job of elucidating the relationship between ATG6 and NPR1.

      Comments on latest version:

      The authors have sufficiently addressed all concerns raised, which further enhanced data presentation. No additional concerns were raised.

    3. Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on latest version:

      The initial concerns about statistical oversights and the use of an unclear nuclear marker have been addressed. The implementation of nls-mCherry for nuclear co-localization and the additional statistical analyses were done well. However, the functional importance of the cytoplasmic accumulation of the ATG6 protein should ideally be explored in more detail in future studies.

      Updated sections:

      • Figure 1e: Added statistical analysis and updated with a nuclear marker.
      • Line Revisions: Terminology corrections for "infection" instead of "invasion".
      • NLS Analysis: Extended alignment and inclusion of conserved domains with predicted NLS (cut-off score: 2.6).

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on latest version:

      The term "invasion" has to be replaced with infection, as it doesn't have much meaning to this particular study. I already explained this point in the first review, but authors did not address it throughout the manuscript.

      Thank you for your constructive feedback. We have taken your suggestion into account and replaced "invasion" with "infection" in the revised manuscript (Lines 44,45,99,100,298,341,387,415,461,463,464,1002).

      In fig. 1e there's no statistical analysis. How can one show measurements from multiple samples without statistical analysis? All the data points have to be shown in the graph and statistics performed. In the atg6-npr1 and snrk-npr1 pairs no nuclear marker is included. How can one know where the nucleus is, particularly in such poor-quality, low-resolution images? The nuclear marker has to be included in this analysis and shown. This is an important aspect of the study, as nuclear localization of ATG6 is proposed to be essential for its new function.

      Thank you for bringing this to our attention. We conducted the BiFC experiments again using nls-mCherry transgenic tobacco, which yielded clearer images. The results clearly demonstrate that ATG6 interacts with NPR1 in both the cytoplasm and nucleus. The YFP signal in the nucleus co-localizes with nls-mCherry (a nuclear localization marker). SnRK2.8 was employed as a positive control for NPR1 interaction. The relative fluorescence intensity of YFP was analyzed using ImageJ software; n = 15 independent images were analyzed to quantify YFP fluorescence. All data points are displayed in the image, and we also conducted a Student's t-test analysis. We have incorporated these results into the revised manuscript (Fig 1d and e).

      Co-localization provided in the fig. S2 cannot complement this analysis, particularly since no cytoplasmic fraction is present for NPR1-GFP in fig. S2.

      Thank you for your observation. We repeated the experiment and confirmed that NPR1 and ATG6 co-localize in both the nucleus and cytoplasm. The image in Figure S2 has been updated accordingly.

      In the alignment in fig 2c, it is not explained what are the species the atg6 is taken from. The predicted NLS has to be shown in the context of either the entire protein sequence alignment or at least individual domain alignment with the indication of conserved residues (consensus). They have to include more species in the analysis, instead of including 3 proteins from a single species. Also, the predicted NLS in atg6 doesn't really have the classical type architecture, which might be an indication that it is a weak NLS, consistent with the fact that the protein has significant cytoplasmic accumulation. They also need to provide the NLS prediction cut-off score, as this parameter is a measure of NLS strength.

      Line 150: the NLS sequence "FLKEKKKKK" is a wrong sequence.

      Thank you for your suggestion. In both plants and animals, proteins are transported to the nucleus via specific nuclear localization signals (NLSs), which are typically characterized by short stretches of basic amino acids (Dingwall and Laskey, 1991, Raikhel, 1992, Nigg, 1997). Following your recommendation, we re-predicted potential NLS sequences in the ATG6 protein using NLSExplorer (http://www.csbio.sjtu.edu.cn/bioinf/NLSExplorer). Although we did not identify a classical monopartite NLS, we discovered a bipartite NLS similar to the consensus bipartite sequence (KRX<sub>(10-12)</sub>K(KR)(KR)) (Kosugi et al., 2009) in the carboxy-terminal region (475-517 aa) of ATG6, with a cut-off score of 2.6. These findings are consistent with the substantial accumulation of ATG6 in the cytoplasm and its minimal accumulation in the nucleus. Additionally, our comparison of ATG6 C-terminal sequences across several species, including Microthlaspi erraticum, Capsella rubella, Brassica carinata, Camelina sativa, Theobroma cacao, Brassica rapa, Eutrema salsugineum, Raphanus sativus, Hirschfeldia incana and Brassica napus, indicates that this bipartite NLS is relatively conserved. We have incorporated these results into the revised manuscript (lines 450-160).
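
      The consensus bipartite motif quoted above, KRX(10-12)K(KR)(KR), can be written as a regular expression to scan a protein sequence. This is a minimal sketch, assuming a plain pattern match only; it is not how NLSExplorer scores candidates, the function name `find_bipartite_nls` is hypothetical, and the test sequence below is synthetic, not the ATG6 sequence.

```python
import re

# K, R, then 10-12 residues of any kind, then K followed by two residues
# that are each K or R. The lookahead lets overlapping candidates be found.
BIPARTITE_NLS = re.compile(r"(?=(KR[A-Z]{10,12}K[KR][KR]))")


def find_bipartite_nls(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each consensus-like hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in BIPARTITE_NLS.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence: "KR", a 10-residue linker of alanines, then "KKR".
    synthetic = "AA" + "KR" + "A" * 10 + "KKR" + "AA"
    for start, motif in find_bipartite_nls(synthetic):
        print(start, motif)
```

      A pattern match of this kind flags only sequences that fit the consensus exactly; a motif described as "similar to" the consensus, as in the response, may not match it and would need a scoring-based predictor.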

      In fig. 3d no explanation for the error bars is included, and what type of statistical analysis is performed is not explained.

      Thank you for bringing this to our attention. In Figure 3d, a Student's t-test was conducted to analyze the data. The mean and standard deviation were calculated from three biological replicates, and the relevant description has been included in the figure notes.

      Reference

      Dingwall, C. and Laskey, R.A. (1991) Nuclear targeting sequences--a consensus? Trends Biochem Sci, 16, 478-481.

      Kosugi, S., Hasebe, M., Matsumura, N., Takashima, H., Miyamoto-Sato, E., Tomita, M. and Yanagawa, H. (2009) Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J Biol Chem, 284, 478-485.

      Nigg, E.A. (1997) Nucleocytoplasmic transport: signals, mechanisms and regulation. Nature, 386, 779-787.

      Raikhel, N. (1992) Nuclear targeting in plants. Plant Physiol, 100, 1627-1632.

    1. eLife Assessment

      This important study investigates the signaling pathways regulating retinal regeneration. Convincing evidence shows that the sphingosine-1-phosphate (S1P) signaling pathway is inhibited following retinal injury. Small-molecule activators and inhibitors support a model in which S1P signaling must be inhibited to generate Müller glial progenitor cells-a key step in retinal regeneration. The presented results support the major conclusions. However, whether the drug treatments directly or indirectly affect the Müller cells remains unclear.

    2. Reviewer #1 (Public review):

      Summary:

      This study shows that pro-inflammatory S1P signaling regulates the responses of Müller glial cells to damage. The authors describe the expression of S1P signaling components. Using agonists and antagonists of the pathway, they also investigate their effect on the de-differentiation and proliferation of Müller glial cells in the damaged retina of postnatal chicks. They show that S1PR1 is highly expressed in resting MG and non-neurogenic MGPCs. This receptor suppresses the proliferation and neuronal activity promotes MGPC cell cycle re-entry and enhanced the number of regenerated amacrine-like cells after retinal damage. The formation of MGPCs in damaged retinas is impaired in the absence of microglial cells. This study further shows that ablation of microglial cells from the retina increases the expression of S1P-related genes in MG, whereas inhibition of S1PR1 and SPHK1 partially rescues the formation of MGPCs in damaged retinas depleted of microglia. The studies also show that expression of S1P-related genes is conserved in fish and human retinas.

      Strengths:

      This is a well-conducted study, with convincing images and statistically relevant data.

      Weaknesses:

      In a previous study, the authors have shown that S1P is upstream of NF-κB signaling (Palazzo et al. 2020; 2022, 2023). Although S1P and NF-κB signaling have overlapping effects, the authors here provide evidence for S1P specific effects, adding some new information to the field.

    3. Reviewer #2 (Public review):

      Summary:

      Sphingosine-1-phosphate (S1P) metabolic and signaling genes are highly expressed in retinal Müller glia (MG) cells. This study tested how S1P signaling regulates glial phenotype, dedifferentiation and reprogramming into proliferating MG-derived progenitor cells (MGPCs), and neuronal differentiation of the progeny of MGPCs in the chick retina in vivo. The major techniques used are scRNA-seq and immunohistochemistry to determine the gene expression and proliferation of MG cells that co-label with signaling antibodies or mRNA FISH after treating eyes in vivo with various S1P signaling antagonists, agonists, and signal modulators. The major conclusions drawn are supported by the results presented. However, the methodology used to modulate the S1P pathway with various chemical drugs raises questions about the outcomes and whether they reflect the real effects of S1P receptor modulation or S1P synthesis inhibition.

      Strengths:

      - Use of elaborated single-cell RNAseq expression data.
      - Use of FISH for S1P receptors and kinase as a good quality antibody is not available.
      - Use of EdU assay in combination with IHC
      - Comparison with human and Zebrafish Sc-RNA data

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors, antagonists, agonists, and signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs reach the target cells. No alternative evaluation is provided of whether the drugs are, in fact, effective. The drug solubility in the vehicle and in the vitreous is not provided, and it is unclear how the authors decided that a single dose of each drug would have the optimal expected effect on the S1P pathway.

      In the revision, the authors provided justification for the use of single doses of the modulators and how they could pass the retinal barrier and affect the MG gene expression and receptor functioning.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      However, given that S1P is upstream of NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020, 2022, 2023).

      We find distinct differences between the impacts of S1P- and NFκB-signaling on glial activation, neuronal differentiation of the progeny of MGPCs and neuronal survival in damaged retinas. In the current study we demonstrate that 2 consecutive daily intravitreal injections of S1P selectively activated mTor (pS6) and Jak/Stat3 (pStat3), but not MAPK (pERK1/2) signaling in Müller glia. Further, inhibition of S1P synthesis (SPHK1 inhibitor) decreased ATF3, mTor (pS6) and pSmad1/5/9 levels in activated Müller glia in damaged retinas. Inhibition of NFκB-signaling in damaged chick retinas did not impact the above-mentioned cell signaling pathways (Palazzo et al., 2020). Thus, S1P-signaling impacts cell signaling pathways in MG that are distinct from NFκB, but we cannot exclude the possibility of cross-talk between NFκB and these pathways. Further, inhibition of NFκB-signaling potently decreases numbers of dying cells and increases numbers of surviving ganglion cells (Palazzo et al., 2020). Consistent with these findings, a TNF orthologue, which presumably activates NFκB-signaling, exacerbates cell death in damaged retinas (Palazzo et al., 2020). By contrast, 5 different drugs targeting S1P-signaling had no effect on numbers of dying cells, and only one S1PR1 inhibitor modestly decreased numbers of dying cells (current study). Although two different inhibitors of NFκB-signaling suppressed the proliferation of microglia in damaged retinas (Palazzo et al., 2020), none of the S1P-targeting drugs had an effect upon the proliferation of microglia (current study). In addition, inhibition of NFκB does not influence the neurogenic potential of MGPCs in damaged chick retinas (Palazzo et al., 2020), whereas inhibition of S1P receptors (S1PR1 and S1PR3) and inhibition of S1P synthesis (SPHK1) significantly increased the differentiation of amacrine-like neurons in damaged retinas (current study).

      Collectively, in comparison to the effects of pro-inflammatory cytokines and NFκB-signaling, our current findings indicate that S1P-signaling through S1PR1 and S1PR3 in Müller glia has distinct effects upon cell signaling pathways, neuronal regeneration and cell survival in damaged retinas. We will revise text in the Discussion (pages 33-34) to better highlight these important distinctions between NFκB- and S1P-signaling.

      Reviewer #2 (Public review):

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors, antagonists, agonists, and signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs reach the target cells. There is no alternative evaluation of whether the drugs are, in fact, effective. The drug solubility in the vehicle and in the vitreous is not provided, and it is unclear how the authors decided that a single dose of each drug would have the optimal expected effect on the S1P pathway.

      Müller glia are the predominant retinal cell type that expresses S1P receptors. Consistent with these patterns of expression, we report Müller glia-specific effects of different agonists and antagonists that increase or decrease S1P-signaling. Since we compare cell-level changes within contralateral eyes wherein one retina is exposed to vehicle and the other is exposed to vehicle plus drug, it seems highly probable that the drugs are eliciting effects upon the Müller glia. It is possible that the responses we observed resulted from drugs acting on extra-retinal tissues, which might secondarily release factors that elicit cellular responses in Müller glia. However, this seems very unlikely given the distinct patterns of expression for different S1P receptors in Müller glia, and the outcomes of inhibiting Sphk1 or S1P lyase on retinal levels of S1P.

      For example, we provide evidence that S1PR1 and S1PR3 expression is predominant in Müller glia in the chick retina using single cell-RNA sequencing and fluorescence in situ hybridization (FISH). Thus, we expect S1PR1/3-targeting small molecule inhibitors to act directly on Müller glia, which is consistent with our read-outs of cell signaling with injections of S1P in undamaged retinas. We show that SPHK1 and SGPL1, which encode the enzymes that synthesize or degrade S1P, are expressed by different retinal cell types, including the Müller glia. The efficacy of the drugs that target SPHK1 and SGPL1 was assessed by measuring levels of S1P in the retina. By using liquid chromatography and tandem mass spectrometry (LC-MS/MS), we provide data that inhibition of S1P synthesis (inhibition of SPHK1) significantly decreased levels of S1P in normal retinas, whereas inhibition of S1P degradation (inhibition of SGPL1) increased levels of S1P in damaged retinas (Fig. 5). These data suggest that the SPHK1 inhibitor and the SGPL1 inhibitor specifically act at the intended target to influence retinal levels of S1P. Further, inhibition of SPHK1 (to decrease levels of S1P) results in decreased levels of ATF3, pS6 (mTor) and pSMAD1/5/9 in Müller glia, consistent with the notion that reduced levels of S1P in the retina impact signaling in Müller glia. Finally, we find similar cellular responses to chemically different agonists or antagonists, and we find opposite cellular responses to agonists and antagonists, which are expected to be complementary if the drugs are specifically acting at the intended targets in the retina. We will revise the Discussion to better address caveats and concerns regarding the actions and specificity of different drugs within the retina following intravitreal delivery.

      We will provide the drug solubility specifications and estimates of the initial maximum dose per eye for each drug. For chick eyes between P7 and P14, these estimates will assume a volume of about 100 µl of liquid vitreous, 800 µl of gel vitreous and an average eye weight of 0.9 grams. We will revise Table 1 (pharmacological compounds) with ranges of reported in vivo ED50 values (mg/kg) for drugs and we will list the calculated initial maximum dose (mg/kg equivalent) per eye. Doses were chosen based on estimates of the initial maximum ocular dose that were within the range of reported ED50 values. However, as is the case for any in vivo model system, it is difficult to predict rates of drug diffusion out of the vitreous, how quickly the drugs are cleared from the entire eye, how much of the compound enters the retina, and how quickly the drug is cleared from the retina. Accordingly, we assessed drug specificity and sites of activation by relying upon readouts of cell signaling pathways that are parsed with patterns of expression of different S1P receptors and measurements of retinal levels of S1P following exposure to drugs that target enzymes that synthesize or degrade S1P, as described above.
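      The dose estimate described above can be sketched as a short calculation. This is only one plausible reading of "initial maximum dose (mg/kg equivalent) per eye", assuming it means the injected mass divided by the eye weight; the function names and the injected masses used in the example are hypothetical and are not taken from the manuscript or its Table 1. The eye weight (0.9 g) and vitreous volumes (100 µl liquid, 800 µl gel) are the values stated in the response.

```python
# Hypothetical sketch: names and injected masses are illustrative assumptions,
# not values from the manuscript.

def initial_max_dose_mg_per_kg(injected_mass_ug: float,
                               eye_weight_g: float = 0.9) -> float:
    """Injected mass expressed as a mg/kg equivalent of eye weight."""
    mass_mg = injected_mass_ug / 1000.0      # micrograms -> milligrams
    eye_weight_kg = eye_weight_g / 1000.0    # grams -> kilograms
    return mass_mg / eye_weight_kg


def initial_vitreous_conc_ug_per_ml(injected_mass_ug: float,
                                    liquid_vitreous_ul: float = 100.0,
                                    gel_vitreous_ul: float = 800.0) -> float:
    """Upper-bound concentration if the whole dose mixed evenly in the vitreous."""
    total_volume_ml = (liquid_vitreous_ul + gel_vitreous_ul) / 1000.0
    return injected_mass_ug / total_volume_ml


def within_reported_ed50_range(dose_mg_per_kg: float,
                               ed50_low: float,
                               ed50_high: float) -> bool:
    """True if the estimated dose falls inside a reported ED50 range (mg/kg)."""
    return ed50_low <= dose_mg_per_kg <= ed50_high


if __name__ == "__main__":
    dose = initial_max_dose_mg_per_kg(injected_mass_ug=0.9)
    print(f"initial maximum dose: {dose:.2f} mg/kg equivalent")
    print(f"initial vitreous concentration: "
          f"{initial_vitreous_conc_ug_per_ml(9.0):.1f} ug/ml")
```

      Such an estimate is an upper bound at the moment of injection; as the authors note, diffusion out of the vitreous and clearance from the eye and retina are not captured.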

      Reviewer #1 (Recommendations for the authors):

      I am wondering if Muller glia can be considered as fully differentiated at early postnatal stages as those used in this study. Is this mechanism operative in adult retinas? Could the authors perform studies in older animals, just to have the proof of principle that the proposed mechanism is retained.

      Chickens are considered to be adult at about 4 months of age, when the females start laying eggs. Unfortunately, housing, maintenance, handling and experimentation on large adult chickens have proven to be challenging. Nevertheless, there is evidence that Müller glia reprogramming remains robust in chick retinas from P1 through P30, but the zones of proliferation shift away from central retina and become increasingly confined to the retinal periphery (Fischer, 2005). MG “maturation” appears to occur in a central-to-peripheral gradient, much like the process of embryonic retinal differentiation, but a zone of regeneration-competent MG remains in the periphery during adolescent development (Fischer, 2005).

      We have defined central vs peripheral retina in the Methods.

      To partially address this question, we have generated a new supplemental Figure 6 showing (i) SPHK1 fluorescent in situ labeling of central and peripheral regions at P10, and (ii) analysis of EdU+Sox2+ MGPCs in central versus peripheral regions treated with NMDA +/- S1PR1 inhibitor or NMDA +/- SPHK1 inhibitor. We find that patterns of S1PR1 transcription in the central region are similar to the peripheral region (not shown), and S1PR1 inhibition modestly increased numbers of MGPCs in central regions. Unlike the peripheral regions of retina, SPHK1 FISH signal in the central region remains low at 48 hours post-injury (supplemental Fig. 6). Additionally, we found that the SPHK1 inhibitor had no effect on numbers of proliferating MGPCs in the central regions of retina, whereas SPHK1 inhibitors stimulated proliferation of MGPCs in the periphery (Fig. 4). It is likely that mature MG in central retinal regions are not responsive to SPHK1 inhibition due to low levels of SPHK1 expression.

      We have previously shown that Notch-related genes show unique patterns of expression in the central and peripheral retinas, and expression levels significantly change at P0, P7, and P21 (Ghai et al., 2010). We found that Notch inhibition reduced cell death and numbers of MGPCs in central regions but not peripheral regions. Recent scRNA-seq analysis of murine macula and peripheral retinal regions has revealed interesting differences in NFKBIA/Z and NFIA expression, possibly indicating a difference in the early inflammatory transcriptional response to retinal damage (Zhang et al., 2024, bioRxiv). We believe that spatial sequencing of peripheral “immature” and central “mature” chick Müller glia will be a useful tool in the future to reveal key differences in signaling pathway-related gene expression that confer a competence for regeneration in the periphery.

      We have added text to the Results (pages 20-21) and Discussion (page 32) to address the S1P-signaling in central (mature MG) vs peripheral (immature MG) regions of the retina.

      Minor points.

      The abstract is difficult to follow and consists of a list of what activates or represses the formation of MGPC. Please rewrite the abstract to integrate information and provide a clearer message. Also, please include the species of study in the abstract and mention it again at the beginning of the results, at least.

      We have rewritten the abstract to simplify and clarify our main points (p 2).

      Lines 65-69. The sentence is unclear, perhaps there are words either missing or in excess and there is a need to check the spelling.

      We have simplified this sentence to improve clarity and referenced our recently published review to support.

      Lines 112-113. Please explain why " retinas were treated with saline, NMDA, or 2 or 3 doses insulin+FGF2 and the combination of NMDA and insulin+FGF2". There is a reference but readers will appreciate understanding right away why.

      We have added a sentence to clarify the purpose of comparing gene expression patterns in MG and MGPCs in NMDA-damaged retinas versus retinas treated with insulin+FGF2.

      Lines 223-257. This list of experiments is difficult to follow and perhaps should be summarized better. Somehow lines 257-261 say it all.

      We have revised this section to clarify differences in outcomes between S1PR1/3 activators and inhibitors. We also stated the enzymatic functions of SPHK1 and SGPL1 to improve clarity.

      Lines 392-441. Comparative expression analysis should be summarized as the message is somehow simple but the description is rather lengthy.

      We have revised our comparative expression analyses to be more concise.

      Reviewer #2 (Recommendations for the authors):

      (1) Only a single dose of the drugs (inhibitor/ antagonists/agonists signal modulators) is used for each drug, as shown in Table 1. How do they know this is an effective dose?

      We estimated the appropriate dose based on the initial maximum dose, which we based on the reported ED50 values for each drug. We have revised Table 1 to include this information.

      (2) Most of the drugs appeared to be hydrophobic, but except for sphingosine and S1P, all are described to be injected with sterile saline. They must provide solubility characteristics of these drugs in solvents. For example, FTY720 is not water-soluble, which raises the question of all of their drugs' solubility, bioavailability to the cells of interest, and their effectivity in signal transduction in the retinal cells.

      Some S1P-targeting compounds were delivered in 20% DMSO in saline to support the solubility of the different lipophilic small molecule agonists/antagonists. We have added information to the Methods (p 6) and to Table 1 (p 5) to describe the use of DMSO to solubilize these drugs. We have also revised Table 1 with ranges of reported ED50 values (mg/kg) for all drugs and listed the calculated initial maximum dose (mg/kg) per eye.

      (3) Drugs were delivered to the vitreous chamber, but there was no information on how they would cross the inner limiting membrane to affect or modulate S1P metabolism in retinal MG or to bind the S1P receptors on MG or other retinal cell types.

      All selected compounds are small-molecule drugs, many of which are structural analogues of sphingosine or S1P. These drugs would be classified as BDDCS Class II drugs, meaning they have low solubility but high cell permeability. Thus, it is highly probable that they diffuse across the ILM to act on S1P receptors on MG, but it is also likely that their bioavailability is more limited, requiring a higher dose, repeated doses, and the use of solubilizing agents. We have clarified our use of DMSO to solubilize these drugs (p 6) according to vendor recommendations (p 5). This information has been added to the Methods.

      (4) Gene expression is a very dynamic process; without providing more evidence that the expression changes are the direct effect of the drug treatment, the conclusions made based on the gene expression profiles are not strong. Additional points:

      We do not assert that changes in scRNA-seq expression profiles are the direct result of S1P-targeting drugs. We report significant changes in cellular expression profiles following NMDA-induced retinal damage or ablation of microglia. We feel that new experiments to assess the gene expression profiles of retinal cells that are directly downstream of the different S1P-targeting drugs are better suited for future studies.

      (5) Please add in the introduction that there is only one sphingosine kinase in chicken, as no SPHK2 is known to be present.

      We have added additional information regarding the expression of SPHK1 and SPHK2 genes in the chick genome (p 4).

      (6) Fig 1d and in many other UMAP clusters, the low expressing genes are barely visible (Ex. 1d, S1PR2, and S1PR3); please extract them in separate UMAP clusters and provide them in supplements.

      We have revised supplemental Figure 1 to include separate panels for each of the S1P-related genes.

      (7) The figure references for SPHK1 (Fig. 2e), SGPL1 (Fig. 2e), ASAH1 (Fig. 2f), CERS6 (Fig. 2f), and CERS5 (Fig. 2f) in lines 124-132 should point to Figure 1, not Figure 2.

      We have corrected these figure references (p 14).

      (8) The description of the expression of zebrafish genes does not match the figures. For example, 'Similarly, sphk1 was detected in very few cells in the retina (Fig. 10j). By comparison, sphk2 was detected in a few bipolar cells and rod photoreceptors (Fig. 10j). Similar to patterns of expression seen in chick and human retinas, sgpl1 was detected in microglia and a few cells scattered among the different clusters of inner retinal neurons and rod photoreceptors (Fig. 10j)'; in the figure, these genes are expressed not in very few or a few scattered cells but rather in many cells.

      We have revised these statements to improve clarity and more accurately describe the data in Figure 10 (p 28).

    1. eLife Assessment

      This study presents a valuable finding that synthetically lethal kinase genes FYN and KDM4 may play a role in drug resistance to kinase inhibitors in TNBC. The evidence supporting the claims of the authors is solid, although the exploration of the upstream mechanisms regulating KDM4A or the downstream pathways through which FYN upregulation confers drug resistance would have strengthened the study. The work will be of interest to medical biologists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      The authors employed a combinatorial CRISPR-Cas9 knockout screen to uncover synthetically lethal kinase genes that could play a role in drug resistance to kinase inhibitors in triple-negative breast cancer. The study successfully reveals FYN as a mediator of resistance to depletion and inhibition of various tyrosine kinases, notably EGFR, IGF-1R, and ABL, in triple-negative breast cancer cells and xenografts. Mechanistically, they demonstrate that KDM4 contributes to the upregulation of FYN and thereby is an important mediator of the drug resistance. All together, these findings suggest FYN and KDM4A as potential targets for combination therapy with kinase inhibitors in triple-negative breast cancer. Moreover, the study may also have important implications for other cancer types and other inhibitors, as the authors suggest that FYN could be a general feature of drug-tolerant persister cells.

      Strengths:

      (1) The authors used a large combination matrix of druggable tyrosine kinase gene knockouts, enabling studying of co-dependence of kinase genes. This approach mitigates off-target effects typically associated with kinase inhibitors, enhancing the precision of the findings.

      (2) The authors demonstrate the importance of FYN in drug resistance in multiple ways. They demonstrate synergistic interactions using both knockouts and inhibitors, while also revealing its transcriptional upregulation upon treatment, strengthening the conclusion that FYN plays a role in the resistance.

      (3) The study extends its impact by demonstrating the potent in vivo efficacy of certain combination treatments, underscoring the clinical relevance of the identified strategies.

      Weaknesses:

      (1) The combination of FYN knockout with other gene knockouts exhibits only very modest synergy. The high standard deviation observed for FYN knockout in Figure S2A weakens the robustness of these findings. As combination treatments involving inhibitors did demonstrate stronger synergistic effects, the data still support the role of FYN in regulating sensitivity to the described drugs.

      (2) While the study identifies KDM4A as a key contributor to FYN upregulation, it does not fully explore the upstream mechanisms regulating KDM4A or the downstream pathways through which FYN upregulation confers drug resistance. These unaddressed questions limit the mechanistic understanding that can be obtained from this study.

      (3) FYN has been implicated in drug resistance in previous studies, and other mechanisms for its upregulation and downstream effects have already been described. While this study adds value to the existing literature in the context of breast cancer, it does not present entirely novel findings regarding FYN's role in drug resistance.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors employed a combinatorial CRISPR-Cas9 knockout screen to uncover synthetically lethal kinase genes that could play a role in drug resistance to kinase inhibitors in triple-negative breast cancer. The study successfully reveals FYN as a mediator of resistance to depletion and inhibition of various tyrosine kinases, notably EGFR, IGF-1R, and ABL, in triple-negative breast cancer cells and xenografts. Mechanistically, they demonstrate that KDM4 contributes to the upregulation of FYN and thereby is an important mediator of drug resistance. All together, these findings suggest FYN and KDM4A as potential targets for combination therapy with kinase inhibitors in triple-negative breast cancer. Moreover, the study may also have important implications for other cancer types and other inhibitors, as the authors suggest that FYN could be a general feature of drug-tolerant persister cells.

      Strengths:

      (1) The authors used a large combination matrix of druggable tyrosine kinase gene knockouts, enabling studying of co-dependence of kinase genes. This approach mitigates off-target effects typically associated with kinase inhibitors, enhancing the precision of the findings.

      (2) The authors demonstrate the importance of FYN in drug resistance in multiple ways. They demonstrate synergistic interactions using both knockouts and inhibitors, while also revealing its transcriptional upregulation upon treatment, strengthening the conclusion that FYN plays a role in the resistance.

      (3) The study extends its impact by demonstrating the potent in vivo efficacy of certain combination treatments, underscoring the clinical relevance of the identified strategies.

      Weaknesses:

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results.

      We thank the reviewer for pointing this out. We have added as much detail as possible to the methods and figure legends to maximize reproducibility and accuracy in interpreting our results, as described in our responses to the recommendations for authors.

      (2) The authors make use of a large quantity of public data (Fig. 2D/E, Fig. 3F/L/M, Fig 4C, Fig 5B/H/I), whereas it would have strengthened the paper to perform these experiments themselves. While some of this data would be hard to generate (e.g. patient data) other data could have been generated by the authors. The disadvantage of the use of public data is that it merely comprises associations, but does not have causal/functional results (e.g. FYN inhibition in the different cancer models with various drugs). Moreover, by cherry-picking the data from public sources, the context of these sources is not clear to the reader, and thus harder to interpret correctly. For example, it is not directly clear whether the upregulation of FYN in these models is a very selective event or whether it is part of a very large epigenetic re-programming, where other genes may be more critical. While some of the used data are from well-known curated databases, others are from individual papers that the reader should assess critically in order to interpret the data. Sometimes the public data was redundant, as the authors did do the experiments themselves (e.g. lung cancer drug-tolerant persisters), in this case, the public data could also be left out.

      More importantly, the original sources are not properly cited. While the GEO accession numbers are shown in a supplementary table, the articles corresponding to this data should be cited in the main text, and preferably also in the figure legend, to clarify that this data is from public sources, which is now not always the case (e.g. line 224-226). If these original papers do already mention the upregulation of FYN, and the findings from the authors are thus not original, these findings should be discussed in the Discussion section instead of shown in the Results.

      We appreciate the reviewer’s concern. As the reviewer pointed out, our analysis of FYN expression levels in multiple studies of drug-tolerant cells may merely reflect association and not causal relationships. We have at least shown that FYN inhibition may reduce drug tolerance in TNBC cells and in EGFR inhibitor-treated lung cancer cells (figures 2H, 5E). The causal role of FYN in the emergence of drug tolerance in other cancers treated with different drugs (such as irinotecan-treated colon adenocarcinoma and gemcitabine-treated pancreatic adenocarcinoma) may be beyond the scope of this study. We added a brief discussion addressing this concern in lines 273-275.

      We also added proper citations of the public data used in this study in the main text and figure legends in lines 267-269. The GEO accession numbers are listed in supplementary table S2. Importantly, none of the referenced studies identified FYN as a key factor in generating drug-tolerant cells.

      (3) The claim in the abstract (and discussion) that the study "highlights FYN as broadly applicable mediator of therapy resistance and persistence", is not sufficiently supported by the results. The current study only shows functional evidence for this for an EGFR, IGF1R, and Abl inhibitor in TNBC cells. Further, it demonstrates (to a limited extent) the role of FYN in gefitinib and osimertinib resistance (also EGFR inhibitors) in lung cancer cells. Thus, the causal evidence provided is only limited to a select subset of tyrosine kinase inhibitors in two cancer types. While the authors show associations between FYN and drug resistance in other cancer types and after other treatments, these associations are not solid evidence for a causal connection as mentioned in this statement. Epigenetic reprogramming causing drug resistance can be accompanied by altered gene expression of many genes, and the upregulation of FYN may be a consequence, but not a cause of the drug resistance. Therefore, the authors should be more cautious in making such statements about the broad applicability of FYN as a mediator of therapy resistance.

      We fully agree with the reviewer’s concern that FYN upregulation may simply be an association, and may not be the cause of drug tolerance and resistance. Therefore, to accurately convey our findings, we edited lines 34-36 of the abstract to “FYN expression is associated with therapy resistance and persistence by demonstrating its upregulation in various experimental models of drug-tolerant persisters and residual disease following targeted therapy, chemotherapy, and radiotherapy” and lines 288-290 of the discussion to “Upregulation of FYN is a general feature of drug tolerant cancer cells, suggesting the association of FYN expression with drug resistance and tumor recurrence after treatment.” We hope this satisfies the reviewer.

      (4) The rationale for picking and validating FYN as the main candidate gene over other genes such as FGFR2, FRK2, and TEK is not clear.

      a. While gene pairs containing FGFR2 knockouts seemed to be equally effective as FYN gene pairs in the primary screening, these could not be validated in the validation experiment. It is unclear whether multiple individual or a pool of gRNAs were used for this validation, or whether only 1 gRNA sequence was picked per gene for this validation. If only 1 gRNA per gene was used, this likely would have resulted in variable knockout efficiencies. Moreover, the T7 endonuclease assay may not have been the best method to check knockout efficiency, as it only implies endonuclease activity around a gene (but not to the extent of indels that can cause frameshifts, such as by TIDE analysis, or extent of reduction in protein levels by western blot).

      b. Moreover, FRK2 and TEK, also demonstrated many synergistic gene pairs in the primary screen. However, many of these gene pairs were not included in the validation screening. The selection criteria of candidate gene pairs for validation screening is not clear. Still, TEK-ABL2 was also validated as a strong hit in the validation screen. The authors should better explain the choice of FYN over other hits, and/or mention that TEK and FRK2 may also be important targets for combination treatment that can be further elucidated.

      We thank the reviewer for helping to improve our manuscript. We had concerns about the generalizability of FGFR2, FRK and TEK in TNBC, as their expression is very low in MDA-MB-231 and they are not enriched in TNBC compared to cancer cell lines of other subtypes. We added a brief comment on this concern in the results and discussion sections (lines 150-154, figure S3). We acknowledge that the validations in figure 2B result from only one guide RNA; however, together with the validations using pharmacological inhibition of FYN (figure 2F-I), we hope the reader and reviewer can be convinced of our key findings on synthetic lethality between FYN and other tyrosine kinases.

      (5) On several occasions, the right controls (individual treatments, performed in parallel) are not included in the figures. The authors should include the responses to each of the single treatments, and/or better explain the normalization that might explain why the controls are not shown.

      a. Figure 2G: The effect of PP2 treatment, without combined treatment, is not shown.

      b. Figure 2H/3G: The effect of the knockouts on growth alone, compared to sgGFP, is not demonstrated. It is unclear whether the viability of knockouts is normalized to sgGFP, or to each untreated knockout.

      c. Figure 2L: The effect of SB203580 as a single treatment is not shown.

      We thank the reviewer for pointing this out. The data shown in all figures listed in these concerns were normalized to the changes in viability caused by the pharmacological or genetic perturbations that synergized with the TKIs (NVP-ADW742, gefitinib, etc.) used in the figures in the original manuscript. As the reviewer suggested, we added the effects of SB203580 and PP2 treatment on cell viability in supplementary figures S4A and S4K. SB203580 had no significant effect on cell viability, while PP2 treatment caused a significant decrease in cell viability, which is expected as PP2 can inhibit the activity of multiple Src family kinases. Regardless of the effects of SB203580 and PP2 on cell viability as single agents, it is evident that combined treatment with TKIs synergistically decreased cell viability in cancer cell lines. The change in viability caused by FYN or histone lysine demethylase knockout is also provided in newly added figures S4D and S6C. Notably, genetic ablation of FYN or histone lysine demethylases had modest, if any, influence on cell viability.

      (6) The study examines the effects at a single, relatively late time point after treatment with inhibitors, without confirming the sequential impact on KDM4A and FYN. The proposed sequence of transcriptional upregulation of KDM4A followed by epigenetic modifications leading to FYN upregulation would be more compellingly supported by demonstrating a consecutive, rather than simultaneous, occurrence of these events. Furthermore, the protein level assessment at 48 hours (for RNA levels not clearly described), raises concerns about potential confounding factors. At this late time point, reduced cell viability due to the combination treatment could contribute to observed effects such as altered FYN expression and P38 MAPK phosphorylation, making it challenging to attribute these changes solely to the specific and selective reduction of FYN expression by KDM4A.

      We thank the reviewer for pointing this out. We performed a time course experiment for NVP-ADW742 treatment of MDA-MB-231 cells in our newly added figure 3E. Surprisingly, treatment with NVP-ADW742 increased the KDM4A protein level within two hours. FYN protein accumulation followed KDM4A accumulation after 24 hours. This observation, together with our chromatin immunoprecipitation data in figure 3O, provides evidence that FYN accumulation is a consequence of KDM4A accumulation and H3K9me3 demethylation upon TKI treatment. We discuss these data in the results and discussion sections in lines 214-216.

      (7) The cut-off for considering interactions "synergistic" is quite low. The manual of the "SynergyFinder" tool itself recommends treating values above 10 as synergistic and values between -10 and 10 as additive (https://synergyfinder.fimm.fi/synergy/synfin_docs/). Here, values between 5 and 10 are also considered synergistic. Caution should be taken when discussing those results. Showing the actual dose response (including responses to each single treatment) may be required to enable the reader to critically assess the synergy, along with its standard deviation.

      We thank the reviewer for the careful comments. We reanalyzed our data with the SynergyFinder Plus tool (Zheng, Genomics, Proteomics, and Bioinformatics 2022), which implements mathematical models distinct from those of SynergyFinder 3 for a more faithful implementation of the Bliss and Loewe models and, more critically, calculates the statistical significance of the synergy. We provide updated synergy plots with statistics in figures 2F, 3J, and S4B. All drug combinations show statistically significant synergy (p<0.01). We also added the raw data used to calculate synergy in figures 2F, 3J, and S4B as supplementary dataset S2.
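
      For readers weighing the cut-off discussed in point (7), the Bliss independence model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are fractional inhibitions between 0 and 1, the "antagonistic" label for scores below -10 is added here for completeness, and the code is not the SynergyFinder or SynergyFinder Plus implementation, which also estimates statistical significance.

```python
def bliss_expected(inhib_a, inhib_b):
    """Expected fractional inhibition of a combination if the two drugs
    act independently (Bliss independence). Inputs are in [0, 1]."""
    return inhib_a + inhib_b - inhib_a * inhib_b


def bliss_excess(inhib_a, inhib_b, inhib_ab):
    """Observed minus expected inhibition, in percentage points.
    Positive values mean more inhibition than independence predicts."""
    return 100.0 * (inhib_ab - bliss_expected(inhib_a, inhib_b))


def classify(score, threshold=10.0):
    """Label a score with the +/-10 cut-off the reviewer cites from the
    SynergyFinder manual (hypothetical helper, not part of that tool)."""
    if score > threshold:
        return "synergistic"
    if score < -threshold:
        return "antagonistic"
    return "additive"


# Example: drug A alone inhibits 20%, drug B alone 30%, the pair 60%.
score = bliss_excess(0.2, 0.3, 0.6)
label = classify(score)
```

      With the stricter cut-off of 10, a score between 5 and 10 falls in the additive band, which is the reviewer's concern.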

      (8) As the effect size on Western blots is quite limited and sometimes accompanied by differences in loading control, these data should be further supported by quantifications of signal intensities of at least 3 biological replicates (e.g. especially Figure 3A/5A). The figure legends should also state how many independent experiments the blots are representative of.

      We added quantifications for figures 3A and 5A to better depict our results. The figure legends were edited to indicate that each blot is representative of three independent experiments.

      (9) While the article provides mechanistic insights into the likely upregulation of FYN by KDM4A, this constitutes only a fragment of the broader mechanism underlying drug resistance associated with FYN. The study falls short in investigating the causes of KDM4A upregulation and fails to explore the downstream effects (except for p38 MAPK phosphorylation, which may not be complete) of FYN upregulation that could potentially drive sustained cell proliferation and survival. These omissions limit the comprehensive understanding of the complete molecular pathway, and the discussion section does not address potential implications or pathways beyond the identified KDM4A-FYN axis. A more thorough exploration of these aspects would enhance the study's contribution to the field.

      We appreciate the reviewer’s careful comment. We agree that our delineation of the mechanisms underlying TKI resistance in TNBC involving KDM4 and FYN is far from complete. Increases in the expression of histone demethylases have been observed in cancers treated with different drugs. The mechanisms governing the increase in histone demethylase expression are not known and are beyond the scope of this paper. We added this point to the discussion section in lines 299-304.

      (10) FYN has been implied in drug resistance previously, and other mechanisms of its upregulation, as well as downstream consequences, have been described previously. These were not evaluated in this paper, and are also not discussed in the discussion section. Moreover, the authors did not investigate whether any of the many other mechanisms of drug resistance to EGFR, IGF1R, and Abl inhibitors that have been described, could be related to FYN as well. A more comprehensive examination of existing literature and consideration of alternative or parallel mechanisms in the discussion would enhance the paper's contribution to understanding FYN's involvement in drug resistance.

      FYN has been implicated in TKI resistance in CML cell lines (Irwin, Oncotarget, 2015). In that study, FYN was similarly transcriptionally upregulated in imatinib-resistant CML, and this upregulation depended on the EGR1 transcription factor. To address this concern, we generated EGR1 KO MDA-MB-231 cells and tested whether these cells retain the ability to accumulate FYN. Consistent with the previous study, imatinib treatment increased the EGR1 protein level. However, EGR1 knockout did not influence FYN accumulation in MDA-MB-231 cells (Figure S5B), so EGR1-mediated accumulation of FYN may be a context-specific phenomenon in CML. We discuss this result in the results section in lines 187-190. We also acknowledge that SRC family kinases are generally involved in drug resistance in many cancers. We discuss the recent findings regarding SRC family kinases in drug resistance in the results section in lines 145-147 and the discussion section in lines 315-317.

      Reviewer #2 (Public Review):

      Summary:

      Kim et al. conducted a study in which they selected 76 tyrosine kinases and performed CRISPR/Cas9 combinatorial screening to target 3003 genes in Triple-negative breast cancer (TNBC) cells. Their investigation revealed a significant correlation between the FYN gene and the proliferation and death of breast cancer cells. The authors demonstrated that depleting FYN and using FYN inhibitors, in combination with TKIs, synergistically suppressed the growth of breast cancer tumor cells. They observed that TKIs upregulate the levels of FYN and the histone demethylase family, particularly KDM4, promoting FYN expression. The authors further showed that KDM4 weakens the H3K9me3 mark in the FYN enhancer region, and the inhibitor QC6352 effectively inhibits this process, leading to a synergistic induction of apoptosis in breast cancer cells along with TKIs. Additionally, the authors discovered that FYN is upregulated in various drug-resistant cancer cells, and inhibitors targeting FYN, such as PP2, sensitize drug-resistant cells to EGFR inhibitors.

      Strengths:

      This study provides new insights into the roles and mechanisms of FYN and KDM4 in tumor cell resistance.

      Weaknesses:

      It is important to note that previous studies have also implicated FYN as a potential key factor in drug resistance of tumor cells, including breast cancer cells. While the current study is comprehensive and provides a rich dataset, certain experiments could be refined, and the logical structure could be more rigorous. For instance, the rationale behind selecting FYN, KDM4, and KDM4A as the focus of the study could be more thoroughly justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results. A critical revision of these aspects is needed, for example:

      a. Catalogue numbers of certain products critical to reproduce the study (e.g. antibodies) and/or at what company they have been purchased (e.g. used compounds)

      b. On several occasions the used concentrations of drugs or exposure time are not mentioned (e.g. Figure 2H, G (PP2), I, J, K, L, etc.)

      c. Figure legend of figure panels E-I in Figure 5 seems to be completely incorrect and not consistent with the figure axis etc.

      d. RT-qPCR methodology is not described in Methods.

      e. Western blot methods are very limited: these should be described in more detail or cite an article that does.

      f. Organoid culture: Information about the source of tumour cells (e.g. pre-treatment biopsy, material after surgery), isolation of tumor cells (e.g. methodology, characterization of material) and culture conditions (e.g. culture time before the experiment) is lacking.

      g. Information about how gefitinib/osimertinib-resistant PC9 and HCC827 cells are generated (as well as culture conditions and where they are from) is missing.

      We thank the reviewer for pointing these out. We have done our best to add experimental details for reproducibility in methods section and figure legends in lines 343-348, 408-426, 431-432, 439-453, 648-650, 671-672 and 691-693.

      (2) Figure 1B/C/D: it would be more meaningful if the most important hits (at least in one of these panels) were highlighted (e.g. line with gene-pair named), or visualized separately, so that the reader does not have to read the supplementary table to know what the most important hits were.

      We thank the reviewer for the careful comment. We added labels for key synergistic gene pairs in figure 1D, as the reviewer suggested.

      (3) qPCR data shown in Figure S4 is from 1 independent experiment. As these experiments (especially qPCR) can be rather variable and the effect size is not very large, I would highly recommend repeating these experiments, or excluding them, as conclusions from them are not solid.

      We judged that performing qPCR with the many drugs that did not cause substantial synergistic cell death with NVP-ADW742 in figure S5C (figure S4A in the previous version of the manuscript) would not provide much additional insight. Also, as we were more interested in finding direct regulators of FYN expression, we focused on drugs that inhibit epigenetic regulators that activate transcription. We therefore performed FYN qPCR with drug combinations involving GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor). As shown in our newly added figure S5D, GSK-J4 inhibited FYN expression, whereas pinometostat failed to do so. We also confirmed that knockout of KDM5 or KDM6 reproducibly failed to decrease FYN expression upon TKI treatment (figures S5E and S5G). The new results are discussed in lines 193-198. We hope these additions satisfy the reviewer.

      (4) For validation of synergistic knockouts, it would be helpful for the interpretation to also show the viability/growth of each knockout (or treatment), instead of mostly normalized scores. For example, the reader now has no insight into whether FYN knockout itself already affects cell viability, or not. If it (or EGFR/IGF1R/ABL knockout) would already substantially affect cell viability, a further reduction in cell viability may not be as relevant as when it would not affect cell viability at all.

      We thank the reviewer for pointing this out. We replaced our figure in figure 2A to indicate raw changes in cell viability in each single and double knockout cells in figure S2A. We hope this satisfies the reviewer.

      (5) The curve fitting as in Figure 2G is somewhat misleading. While the curve seems to be forced to go from 1-0, the +PP2 dose-response curve does actually not seem to start at 1, but rather at 0.8, likely resulting from the effect of PP2 as a single treatment, thus, effects may be interpreted as more synergistic than that they truly are.

      The results shown in figure 2G are normalized to cells treated or not treated with PP2, to better reflect the effects of NVP-ADW742, gefitinib, and imatinib in the presence of PP2. The viability value starting at 0.8 is therefore not due to the effect of PP2 as a single agent (because the data are normalized to PP2-treated cells), but arises because a very small dose of, in particular, NVP-ADW742 caused a modest decrease in viability. To depict our findings more accurately, we added a data point to figure 2G at a TKI dose of 0 µM with viability 1. We also added details on the normalization of viability to the figure legends.
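
      The normalization described in this response can be illustrated with a short Python sketch. It assumes, hypothetically, that each dose-response series is divided by the reading from the matched 0 µM TKI condition (vehicle alone for the control series, PP2 alone for the +PP2 series); the function name and the numbers are illustrative and are not taken from the manuscript.

```python
def normalize_to_baseline(raw_signal, baseline_signal):
    """Divide each raw viability reading by the matched 0 µM TKI baseline,
    so the single-agent effect of the co-treatment is factored out."""
    if baseline_signal <= 0:
        raise ValueError("baseline signal must be positive")
    return [value / baseline_signal for value in raw_signal]


# Hypothetical raw readings for the +PP2 series; the first entry is PP2 alone.
pp2_series = [80.0, 64.0, 40.0]
normalized = normalize_to_baseline(pp2_series, baseline_signal=pp2_series[0])
```

      Under this scheme the 0 µM point equals 1 by construction, so any value below 1 at a low TKI dose reflects the TKI itself and not the co-treatment.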

      (6) The readability of the paper could be enhanced by higher-quality images (now the text is quite pixelated).

      We had technical difficulties in converting file types. We have replaced figures for better resolution for all main and supplementary figures.

      (7) The discussion now contains one paragraph about the selectivity of kinase inhibitors, and that repurposing of inhibitors with more relaxed specificity or multi-kinase inhibitors can be beneficial. This does not seem to fall within the scope of the study, as there was no comparison between selective and non-selective inhibitors. It was also not clearly mentioned that the non-selective inhibitors worked better than the gene knockouts, or that for example, KDM3 and KDM4 knockout together worked better than only KDM4 knockout. It is recommended to either remove this paragraph, or rephrase it so that it better fits the actual results

      We agree with the reviewer. We chose to remove this paragraph in lines 308-313.

      (8) The entire paper does not discuss any known functions of FYN. Its function could be very briefly introduced in the results section when highlighting it as an important hit. More importantly, its known role in cancer and especially drug resistance should be discussed in the discussion (see also Public review).

      We thank the reviewer for pointing this out. We added brief description of the role of FYN in cancer malignancy and drug resistance in lines 145-147. Particularly, FYN accumulation by EGR1 transcription factor had been described in the context of imatinib resistant chronic myeloid leukemia (Irwin, Oncotarget, 2015). To address this, we tested whether EGR1 knockout decreases FYN level in MDA-MB-231 (Figure S5A). Notably EGR1 knockout failed to decrease FYN protein level. This result was discussed in lines 187-190.

      (9) Textual changes including:

      a. Line 29 (and others) "Massively parallel combinatorial CRISPR screens": I would rather choose a more descriptive term, such as "combinatorial tyrosine kinase knockout CRISPR screen", which already clarifies the screen used knockouts of (druggable) tyrosine kinases only. Using both "Parallel" and "combinatorial" is somewhat redundant, and "massively" is subjective, in my opinion.

      Manuscript edited as suggested (lines 29, 63, 86, 283). The term “massively parallel” has been removed, as removing it does not significantly change our scientific findings.

      b. Line 67 (and others): "to identify ... for elimination of TNBC": while this may be its potential implication, this study has identified genes in (mostly) TNBC cell lines and cell line xenografts. Please rephrase to something more within the scope of this research.

      Manuscript edited as suggested (lines 68-69) as “we utilize CombiGEM-CRISPR technology to identify tyrosine kinase inhibitor combinations with synergistic effect in TNBC cell line and xenograft models for potential combinatorial therapy against TNBC.” We hope it satisfies the reviewer.

      c. Line 31 (and others): Please check the capitals of words describing inhibitors, and make them consistent (e.g. Imatinib written with capital I, other inhibitors without capitals).

      We thank the reviewer for catching this error. We changed all “imatinib” and “osimertinib” to lowercase.

      d. Line 71: "... combining PP2, saracatinib (FYN inhibitor), ..." Here it is not clear that PP2 is a FYN inhibitor, and, as saracatinib is a well-known Src inhibitor, it is not correct to just say "FYN inhibitor". Better to rephrase to something such as: "combining PP2 (Lck/Fyn inhibitor), saracatinib (Src/FYN inhibitor)".

      As the reviewer noted, most Src family kinase inhibitors are not selective for a specific member of the Src family. Therefore, we changed line 73 to “PP2, saracatinib (Src family kinase / FYN inhibitor).”

      e. Line 81: "The resulting library enabled massively parallel screens of pairwise knockouts, .." To clarify this is for the selected kinases only: "The resulting library enabled screens of pairwise knockouts of the 76 tyrosine kinase genes, .."

      Manuscript edited as suggested by the reviewer in line 86.

      f. Line 88 (and others): "after infection" consider rephrasing to "after transduction" as this is more commonly used when using lentiviral vectors only.

      We thank the reviewer for this. Every instance of “infection” that designates lentiviral transduction was changed to “transduction”.

      g. Line 97-99: While being described as "good" correlation, a correlation of the same sgRNA pair, yet in a different order, of r=0.5 does not seem to be very good, neither does a correlation of r=0.74 for biological replicates. Please consider describing in a less subjective way.

      We removed the subjective terms and changed the manuscript as follows: “sgRNA pair (e.g., sgRNA-A + sgRNA-B and sgRNA-B + sgRNA-A) were positively correlated (r = 0.50) and were combined when calculating Z (Fig. S1D). The Z scores for three biological replicates were also correlated with r = 0.74 between replicates #2 and #3 (Fig. S1E).” in lines 97-101.
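
      The two reported correlations (r = 0.50 between sgRNA orderings and r = 0.74 between replicates) are Pearson coefficients, and the computation can be sketched in Python as follows. The response says only that the two orderings "were combined"; averaging them, as done here, is an assumption, and the function names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def combine_orderings(z_ab, z_ba):
    """Average the Z scores of the A+B and B+A orderings of each sgRNA pair
    (averaging is an assumption made for this sketch)."""
    return [(a + b) / 2.0 for a, b in zip(z_ab, z_ba)]


# Hypothetical Z scores for four sgRNA pairs in each ordering.
z_ab = [-1.2, 0.3, 0.8, -0.5]
z_ba = [-0.9, 0.1, 1.1, 0.2]
r = pearson_r(z_ab, z_ba)
combined = combine_orderings(z_ab, z_ba)
```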

      h. Lines 92-96 and lines 102-115: The results section here contains quite a lot of technical information. While some information may be directly needed to understand the described results (such as a very short and simple explanation of how to interpret gene interaction score), other information may be more appropriate for the Methods section, to enhance the readability of the paper. Consider simplifying here and giving a more detailed overview in the Methods section. Also, the text is not entirely clear. You seem to give two separate explanations of how the GI scores were calculated (Starting in lines 106 and 111): please rephrase and clearly indicate the connections between those two explanations (in the Methods section).

      We thank the reviewer for valuable suggestion. We moved significant portions of the technical descriptions in methods section. We also clarified the text regarding the procedures for calculating GI scores in lines 385-387.

      i. Line 142: "These findings suggest that gene A could represent an attractive drug target.." "Gene A" should be "FYN"?

      We thank the reviewer for catching this. Indeed, it is “FYN” and we changed it in line 154.

      j. Line 149: Introduce Saracatinib, and make the reader aware that it actually mostly targets Src, and FYN with lower affinity.

      We newly added text in lines 73 and 164 to indicate that saracatinib is an inhibitor against Src family kinases.

      k. Line 469: "by the two sgRNA." "by the two sgRNAs".

      Corrected

      l. Throughout text/figures/figure legends, please check for consistency in the naming of cell lines, compounds, referring to figures etc. (E.g. MDA-MB-231/MDA MB 231/MDAMB-231 ; Fig. 1/Figure 1).

      Corrected. Thank you for catching this error.

      m. In Methods, frequently ug or uL are used instead of µg or µL

      Corrected.

      n. Legend Figure 5: Clarify what A, G, I, D, and P mean.

      Corrected in line 685-686 to: “A: NVP-ADW742, G: gefitinib, I: imatinib, D: doxorubicin, P: Paclitaxel.”

      o. Line 303: What is meant by: "The six variable nucleotides were added in reverse primer for multiplexing". Could you clarify this in the text?

      We apologize for the confusion. The six nucleotides are an index sequence for the multiplexed NGS run. The text in lines 373-374 was edited to: “The six nucleotides described as “NNNNNN” in reverse primer above represents unique index to identify biological replicates in multiplexed NGS run.”
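
      How a six-nucleotide index separates biological replicates in a pooled run can be sketched in Python. The position of the index within a read, the index sequences, and the function name below are all hypothetical; the response states only that the index is six nucleotides long and sits in the reverse primer.

```python
from collections import defaultdict


def demultiplex(reads, index_to_replicate, index_len=6):
    """Group reads by a fixed-length index assumed to sit at the start of
    each read; reads with an unknown index go to an 'unassigned' bin."""
    bins = defaultdict(list)
    for read in reads:
        index = read[:index_len]
        replicate = index_to_replicate.get(index)
        if replicate is None:
            bins["unassigned"].append(read)
        else:
            # Keep only the sequence downstream of the index.
            bins[replicate].append(read[index_len:])
    return dict(bins)


# Hypothetical index sequences, one per biological replicate.
indexes = {"ACGTAC": "replicate_1", "TGCATG": "replicate_2"}
reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "NNNNNNAAAAAA"]
binned = demultiplex(reads, indexes)
```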

      Reviewer #2 (Recommendations For The Authors):

      To enhance the robustness of the conclusions drawn from this study, certain concerns merit attention.

      Concerns:

      (1) Line 130 indicates that eight synergistic target gene combinations were validated. It would be helpful to clarify the criteria used to select these gene pairs and provide the rationale for studying these specific combinations of genes.

      In fact, we had selected the gene pairs that we had the sgRNAs against available when we performed the experiments, so we did not have very good reason to explain our selections. Instead we added a brief discussion in lines 304-306 that further validations are required for the gene pairs not experimentally tested.

      (2) According to Figure 2C, FYN was identified as crucial among the 30 gene pairs, and its upregulation in TNBC prompted further investigation. It would be informative to discuss the expression levels of TEK, FRK, and FGFR2 in TNBC and explain why these nodes were not studied. Is there existing evidence demonstrating the superiority of FYN over these other genes?

      A similar concern was raised by reviewer #1. The expression levels of TEK, FRK, and FGFR2 were relatively low in MDA-MB-231 and in TNBCs in general, and we were concerned about the generalizability of these targets for treating TNBC. While validating these genes for possible synthetic lethality may lead to valuable insight, this may be beyond the scope of this paper. This concern is now discussed in the results and discussion sections in lines 150-154.

      (3) The screening process employed only one cell line, and validation was conducted with only one cell line (Figure 2A). Consider supplementing the findings with more convincing evidence from other breast cancer cell lines to strengthen the conclusions.

      Although the CRISPR screens and primary validations were done with only one cell line, further validations with drug combinations were done in independent cancer cell lines such as Hs578T (figures S4E-J). Also, the possible association of FYN expression in drug tolerant cells were also demonstrated in lung cancer cells. We hope this satisfies the reviewer.

      (4) The network analysis in Figure 2C lacks a description of the methodology used. It would be beneficial to provide a brief explanation of the methods employed for this analysis.

      The network analysis was done manually with the size of each node proportional to the number of gene pairs. We newly added text in figure legend in line 638 to clarify this.

      (5) The significance of gene A mentioned in line 142 is unclear. Please provide a clear explanation or context for the importance of this gene.

      This is a mistake that was also pointed out by reviewer #1. The “gene A” should have been “FYN”. We corrected this in line 154.

      (6) In Figure 2J and Figure 2K, it would be more informative to measure the phosphorylation levels of FYN and SRC rather than just their baseline levels. Consider revising the figures accordingly.

      We thank the reviewer for a careful comment. We newly provide supplementary figure S5A to show that phosphorylation level of FYN is increased, but this increase was proportional to the increase in FYN protein level, so the ratio of pFYN/FYN did not change significantly. We discussed this result in lines 187-190.

      (7) Figure S4B lacks biological replicates, which could impact the reliability of the experimental results. Consider adding biological replicates to enhance the robustness of the findings.

      This was also pointed out by reviewer #1. Instead of performing qPCR for all drugs, we focused on validating the decrease in FYN mRNA level for drug combinations that synergistically kill cancer cells. We were also aiming to identify a direct mediator of FYN mRNA upregulation, so we focused on drug combinations that involve an inhibitor of an epigenetic regulator that promotes transcription. To this end, we tested the impact of GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor) in combination with a TKI on the FYN expression level. Notably, while GSK-J4 attenuated FYN mRNA accumulation upon NVP-ADW742 treatment, pinometostat failed to do so (figure S5C). We describe these results in lines 192-197 of the results section.

      (8) Line 186 indicates that KDM3 knockout was not tested in Figure S5A. It would be helpful to provide an explanation for this omission or consider including the data if available.

      We thank the reviewer for pointing this out. The T7 endonuclease assay results for KDM3, KDM4 and PHF8 are added in figure S6B. All guide RNAs used in the study efficiently generated indel mutations.

      (9) In line 206, KDM4A is introduced, but Figures 3J and 3M had already pointed to KDM4A. The authors did not analyze the ChIP results for other members of the KDM4 family at this point. Please address this inconsistency and provide a rationale for focusing on KDM4A. Additionally, in Figure 3M, consider adding peak labeling to the enriched portion for clarity.

      We appreciate the reviewer’s careful comment. KDM4 family enzymes perform catalytically identical reactions and are thought to be redundant. We therefore judged that the most abundantly expressed gene in the KDM4 family should be the primary target to focus on. To this end, we analyzed the expression levels of KDM4 family genes in supplementary figure S6A. Indeed, KDM4A expression was the highest among the KDM4 family genes. We discuss this in the results section in lines 218-220.

      (10) The author only indicated the relationship between the H3K9me3 level in the enhancer region and FYN expression. It would be valuable to verify the activity of the enhancers and investigate additional markers such as H3K27ac and H3K4me1. Consider discussing these aspects to provide a more comprehensive understanding.

      Since we and others have shown that histone demethylases are increased upon drug treatment, we focused on histone methylation marks that are associated with gene repression and whose removal by demethylases is associated with drug resistance. In this respect, KDM6 demethylases, which remove H3K27me3, may serve as an attractive alternative. In our newly added supplementary figure S6E, ADW742 treatment did not decrease the H3K27me3 level at the FYN promoter, indicating that H3K9me3 may be the dominant epigenetic change that modulates FYN expression upon drug treatment. This is briefly discussed in lines 233-235.

      (11) In Figure 4A, the addition of the drug alone does not inhibit tumor growth. Please provide an explanation for this result and consider discussing potential reasons for the observed lack of inhibition.

      The drug dose was adjusted carefully to minimize tumor shrinkage by single drug so that synergistic tumor shrinkage can be clearer.

      (12) Line 208 indicates missing parentheses in the text describing Figure 4C. Please correct the text accordingly to ensure clarity.

      Corrected. Thank you for catching this error.

      (13) The figure legends for Figures 5E, F, G, and H contain errors. Please correct the figure legends to accurately describe the respective figures.

      We thank the reviewer for catching this error. We have changed the figure legends in lines 691-697 to accurately describe the figures.

      (14) It may be beneficial for the authors to divide the results section into several subsections and add headings to improve the overall understanding of the findings.

      This is an excellent suggestion. We divided our results section into subsections and added headings in lines 80, 141, 181, 237 and 251 to help readers understand our findings.

      (15) The authors should include the sgRNA sequences used for gene targeting, along with details of the target genes and negative/positive controls, in the Supplementary Materials to enhance reproducibility and transparency.

      This is a critical point for improving reproducibility of our work. The sgRNA sequences used in the study are newly added in supplementary table S3.

      (16) The resolution of the figures in the Supplementary Materials is too low, which may impede the authors' ability to interpret the data. Consider providing higher-resolution figures for better readability.

      A similar concern was raised by reviewer #1. We provided higher-resolution images for all main and supplementary figures.

    1. eLife Assessment

      In this useful study, the authors tested a novel approach to eradicating HIV reservoirs by constructing a herpes simplex virus (HSV)-based therapeutic vaccine and evaluating efficacy in experimental infections of chronically SIV-infected, antiretroviral therapy (ART)-treated macaques. While mean viremia at rebound was lower in the HSV vaccine-treated group, the evidence presented appears to be incomplete, as the group size was small and the viral load at rebound was highly variable. This is a revised paper, but the support for the conclusions, particularly the effect of the HSV-vectored therapeutic vaccine on the SIV reservoir in the SIV-infected macaques, remains limited.

    2. Reviewer #1 (Public review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); this is acknowledged by the authors that no full mechanistic explanation can be given at this moment.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? A RNA seq analysis is required to show the effect on cellular genes.

      An RNA-seq analysis was done in the revised manuscript comparing the effect of HSV-1 and the deleted vector in J-LAT cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of the upregulation of SIV genes is questionable. The authors do NOT comment on these findings. In my view, this calls the utility of the approach into question.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although the authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to draw any conclusion, especially given the huge variability in response. The averages of 3 animals are still presented in the paper, which is not proper science.

      No data are given on the effect of the deletion in primates. The deleted construct is now compared with an empty vector containing no SIV genes. The authors provide new data in Fig S2 comparing the WT and modified vectors in cells from PLWH, but the data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and for Gag in 3/4 donors. (An additional question: what is meant by LTR mRNA? Do the authors mean genomic RNA?)

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA seq data add on to this worry and should at least be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on the ability of HSV-deltaICP-34.5 to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of the ICP34.5 ORF in these cells can arrest transcription of latent HIV-1 genomes, even in the presence of latency reversal agents. ICP34.5 can co-IP with and de-phosphorylate IKKa/b to block its interaction with the NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass spectrometry. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF1 pathways.

      Next the authors cleverly construct a bifunctional HSV based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however there are some questions I wish the authors explored to answer, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained and written very clearly for lay readers. Claims are adequately supported by evidence and well designed experiments including controls.

      Weaknesses:

      (1) It looks like ICP0 is also involved in latency reversal effects. More follow-up work will be required to test if this is in fact true.

      (2) It is difficult to estimate the depletion of the latent viral reservoir. The authors have tried to address this issue. A more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility to obtain such a result is not clearly demonstrated.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses; under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NF-kB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Thank you for your kind comments and suggestions, which are very helpful in improving our manuscript. We have carefully revised our manuscript and performed additional experiments accordingly, and we now think this version has been substantially improved for your reconsideration.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your careful review and kind reminder.

      (1) We are sorry for the misunderstanding regarding Figure 1C. In the experiment shown in Figure 1C, we used an HSV-1 17 strain containing GFP (HSV-GFP) and HSV-ΔICP34.5 (a recombinant HSV-1 17 strain with ICP34.5 deleted, based on HSV-GFP) to reactivate the HIV latency cell line (J-Lat 10.6 cells). Since detecting GFP cannot distinguish between HSV infection and HIV reactivation, we assessed the reactivation by measuring the mRNA levels of HIV LTR upon stimulation with either HSV-GFP or HSV-ΔICP34.5. In fact, in Figure 1B, we had verified the reactivation efficacy by infecting J-Lat 10.6 cells with the HSV-1 17 strain containing GFP (HSV-GFP) and found significant upregulation of the mRNA levels of HIV-1 LTR, Tat, Gag, Vif, and Vpr. We have adjusted the corresponding descriptions accordingly in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1 mainly through the modulation of the host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on the reduction of HIV latency reactivation by HSV-1. We have mentioned this issue in the revised Discussion section: “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) It’s well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) In our study, we found that neither adenovirus nor vaccinia virus can reactivate HIV latency (Figure S3). In addition, the deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4). Thus, these data suggest that the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this should be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (3) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). We have added the corresponding description accordingly in the revised manuscript.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with limited numbers of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on the background level of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had several courses of ART therapy, which mimics treatment of chronic HIV-1 infection in humans. These macaques were infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well-validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART therapy in humans. Indeed, in our Chinese rhesus model, ART treatment effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to that in ART-treated HIV patients. We think the results of this pilot study are very promising for further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      As for your question regarding “the two animals with low VL and slow rebound”, our explanation is as follows: As mentioned above, these macaques were distributed evenly based on the background level of CD4 count and VL (Table S2), and the changes in viral load and viral rebound then differed between groups. Thus, we think these data can support our interpretation. Moreover, our conclusion is also supported by at least three lines of evidence.

      (1) The VL in the ART+saline group promptly rebounded after ART discontinuation, with an average 8.63-fold increase in the rebounded peak VL compared with the pre-ART VL (Figure 5A, D and E). However, plasma VL in the ART+HSV-sPD1-SIVgag/SIVenv group exhibited a delayed rebound interval (Figure 5B-D).

      (2) There was a lower rebounded peak VL than pre-ART VL in the ART+HSV-sPD1-SIVgag/SIVenv group (average 12.20-fold decrease), while a higher rebounded peak VL than pre-ART VL in the ART+HSV-empty group (average 2.74-fold increase) (Figure 5E).

      (3) We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G).

      Thank you for your understanding.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      Thank you for your kind comment and question. We confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 in primary CD4+ T cells from people living with HIV (PLWH) (Figure S2). As mentioned above, previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of ICP 34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with and dephosphorylate IKKa/b to block its interaction with the NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected JLat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF-1 pathways.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally, expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit the potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however, there are some questions I wish the authors had explored to get answers to, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well-written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained, and written very clearly for lay readers. Claims are adequately supported by evidence and well-designed experiments including controls.

      Thank you for your nice comments regarding our work.

      Weaknesses:

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      Thank you for your valuable suggestion. In fact, we are currently further exploring some potential viral genes of HSV-1 that might play a role in the reactivation of HIV latency. We have found that the deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4), showing that ICP0 might play a vital role in the reactivation. Of course, this should be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your valuable comment. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). We have added the corresponding description in the revised manuscript.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses; under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      Thank you for your kind comment and suggestion. We performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated and then infected with HSV or HSV-ΔICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 (Figure S2). Thank you.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your kind comment. We have added the corresponding discussion in the revised manuscript. “The current consensus on HIV/AIDS vaccines emphasizes the importance of simultaneously inducing broadly neutralizing antibodies and cellular immune responses. Therefore, we believe that incorporating the induction of broadly neutralizing antibodies into our future optimizing approaches may lead to better therapeutic outcomes.” (Lines 384 to 388)

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      Thank you for your careful review and comment. We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. We think the possible reason is that, although the empty HSV vector cannot elicit SIV-specific CTL responses, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART drugs. Therefore, even without carrying HIV/SIV antigens, somewhat delayed kinetics in virus rebound may be observed. Thank you.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide toxicity data for HSV transduction after deleting ICP34.5 and provide an explanation of why overexpression of ICP34.5 has such a small effect.

      Thank you for your questions and suggestions. As mentioned above, we have now provided data on the safety of HSV-ΔICP34.5-based constructs.

      (1) It’s well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1 mainly through the modulation of the host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on the reduction of HIV latency reactivation by HSV-1. We have mentioned this issue in the revised Discussion section: “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) In our study, we found that neither adenovirus nor vaccinia virus can reactivate HIV latency (Figure S3). In addition, the deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4). Thus, these data suggest that the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this should be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). The results showed numerous differentially expressed genes (DEGs) in response to HSV-ΔICP34.5 infection. Among them, 2288 genes were upregulated, and 611 genes were downregulated. GO analysis showed the enrichment of these DEGs in cell cycle, cellular development, and cell proliferation, and KEGG enrichment analysis indicated enrichment in pathways such as cell cycle and cytokine-cytokine receptor interaction. We have added the corresponding description accordingly in the revised manuscript.

      (3) A comparison in primates has to be given for constructs with or without ICP34.5 to validate cell culture data (what is an empty vector?)

      Thank you for your reminder. In the revised manuscript, we performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated and then infected with HSV or HSV-ΔICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 (Figure S2). Thank you.

      (4) Legends should be improved in writing and content.

      Thank you for your kind comment. In the revised version, we have improved the manuscript content and carefully revised the legends of all figures in both writing and content. Thank you.

      (5) The primate groups should be enlarged before any reliable conclusions can be made. Inflammatory/tox data should be provided.

      Thank you for your question.

      (1) As mentioned above, we agree with you that this is a pilot study with limited numbers of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on the background level of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had several courses of ART therapy, which mimics treatment of chronic HIV-1 infection in humans. These macaques were infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well-validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART therapy in humans. Indeed, in our Chinese rhesus model, ART treatment effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to that in ART-treated HIV patients. We think the results of this pilot study are very promising for further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      (2) As is well known, ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (6) Discuss the potential of inflammatory HSV vaccines to be used in PLWH without clinical symptoms.

      Thank you for your comment. As discussed above, we found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (Figure 1D, Figure S1), and we also found that HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think the authors have done due diligence to the experimental system, and collected evidence to show the feasibility of delaying virus rebound in macaques. However, I would encourage the authors to perform experiments that can back up the claim that delayed virus rebound is due to neutralization effects, or perhaps due to a reduction in viral reservoir. I believe insights into this process will add rigor, and push the relevance of the study to the next level.

      Thank you for your nice comment and valuable suggestion. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). In the revised Discussion section, we also discussed that incorporating the induction of broadly neutralizing antibodies into our future optimization approaches may lead to better therapeutic outcomes. We have added the corresponding description in the revised manuscript. Thank you.

      Altogether, all of the above comments and suggestions are very helpful in improving our manuscript. We have taken these comments seriously and tried our best to address them point by point. After making extensive revisions, we now submit this revised manuscript for your reconsideration. Thank you again for all of your comments and suggestions.

    1. eLife Assessment

      The results highlight an important physiological function of PGAM in the differentiation and suppressive activity of Treg cells by regulating serine synthesis. This role is proposed to intersect with glycolysis and one-carbon metabolism. Although the study's conclusion is supported by solid evidence from in-vitro cellular and in-vivo mouse models, there are some weaknesses and the reviewers suggested ways to improve the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides a new potential tool to manipulate Treg function for therapeutic use. It focuses on the role of PGAM in Treg differentiation and function. The authors, interrogating publicly available transcriptomic and proteomic data of human regulatory T cells and CD4 T cells, state that Tregs express higher levels of PGAM (at both message and protein levels) compared to CD4 T cells. They then inhibit PGAM by using a known inhibitor, EGCG, and show that this inhibition affects Treg differentiation. This result was also observed when they used antisense oligonucleotides (ASOs) to knock down PGAM1.

      PGAM1 catalyzes the conversion of 3PG to 2PG in the glycolysis cascade. However, the authors focused their attention on the additional role of 3PG: acting as starting material for the de novo synthesis of serine.

      They hypothesized that PGAM1 regulates Treg differentiation by regulating the levels of 3PG that are available for de novo synthesis of serine, which has a negative impact on Treg differentiation. Indeed, they tested whether the effect on Treg differentiation observed by reducing PGAM1 levels was reversed by inhibiting the enzyme that catalyzes the synthesis of serine from 3PG.

      The authors continued by testing whether both synthesized and exogenous serine affect Treg differentiation and followed up with in vivo experiments to examine the effects of dietary serine restriction on Treg function.

      In order to understand the mechanism by which serine impacts Treg function, the authors assessed whether this depends on the contribution of serine to one-carbon metabolism and to DNA methylation.

      The authors therefore propose that extracellular serine and serine whose synthesis is regulated by PGAM1 induce methylation of Treg-associated genes, downregulating their expression and overall impacting Treg differentiation and suppressive functions.

      Strengths:

      The strength of this paper is the number of approaches taken by the authors to verify their hypothesis. Indeed, by using both pharmacological and genetic tools in in vitro and in vivo systems they identified a potential new metabolic regulation of Tregs differentiation and function.

      Weaknesses:

      Using publicly available transcriptomic and proteomic data of human T cells, the authors claim that both ex vivo and in vitro polarized Tregs express higher levels of PGAM1 protein compared to CD4 T cells (naïve or cultured under Th0 polarizing conditions). The experiments shown in this paper have all been carried out in murine Tregs. Publicly available resources for murine data (ImmGen -RNAseq and ImmPRes - Proteomics) however show that Tregs do not express higher PGAM1 (mRNA and protein) compared to CD4 T cells. It would be good to verify this in the system/condition used in the paper.

      It would also be good to assess the levels of both PGAM1 mRNA and protein in PGAM1-knockdown Tregs compared to scramble controls using different methods, e.g. qPCR and western blot. However, due to the high levels of cell death and differentiation variability, that would require cells to be sorted.

      It is not specified anywhere in the paper whether cells were sorted for bulk experiments. Based on the variability of cell differentiation, it would be good if this was mentioned in the paper as it could help to interpret the data with a different perspective.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have tried to determine the regulatory role of phosphoglycerate mutase (PGAM), an enzyme involved in converting 3-phosphoglycerate to 2-phosphoglycerate in glycolysis, in the differentiation and suppressive function of regulatory CD4 T cells through de novo serine synthesis. This is done by contributing to one-carbon metabolism and eventually to epigenetic regulation of Treg differentiation.

      Strengths:

      The authors have rigorously used inhibitors and antisense RNA to verify the contribution of these pathways in Treg differentiation in-vitro. This has also been verified in an in-vivo murine model of autoimmune colitis. This has further clinical implications in autoimmune disorders and cancer.

      Weaknesses:

      The authors have used inhibitors to study pathways involved in Treg differentiation. However, they have not studied the context of overexpression of PGAM, which was the actual reason to pursue this study.

    1. eLife Assessment

      This valuable study uses single-molecule imaging for characterization of factors controlling the localization, mobility, and function of RNase E in E. coli, a key bacterial ribonuclease central for mRNA catabolism. While the supporting evidence for the differential roles of RNase E's membrane targeting sequence and the C-terminal domain (CTD) is solid, the work could be further strengthened by clarifying some experimental discrepancies, restructuring the narration order, and exploring the generality of some observations and their physical basis, such as the membrane-RNase E interactions and the unstructured nature of the RNase E C-terminal domain. This interdisciplinary study will be of interest to cell biologists, microbiologists, biochemists, and biophysicists.

    2. Reviewer #1 (Public review):

      This paper measures the positioning and diffusivity of RNaseE-mEos3.2 proteins in E. coli as a function of rifampicin treatment, compares RNaseE to other E. coli proteins, and measures the effect of changes in domain composition on this localization and motion. The straightforward study is thoroughly presented, including very good descriptions of the imaging parameters and the image analysis/modeling involved, which is good because the key impact of the work lies in presenting this clear methodology for determining the position and mobility of a series of proteins in living bacteria cells.

      My key notes and concerns are listed below; the most important concerns are indicated with asterisks.

      (1) The very start of the abstract mentions that the domain composition of RNase E varies among species, which leads the reader to believe that the modifications made to E. coli RNase E would be to swap in the domains from other species, but the experiment is actually to swap in domains from other E. coli proteins. The impact of this work would be increased by examining, for instance, RNase E domains from B. subtilis and C. crescentus as mentioned in the introduction.

      (2) Furthermore, the introduction ends by suggesting that this work will modulate the localization, diffusion, and activity of RNase E for "various applications", but no applications are discussed in the discussion or conclusion. The impact of this work would be increased by actually indicating potential reasons why one would want to modulate the activity of RNase E.

      (3) Lines 114 - 115: "The xNorm histogram of RNase E shows two peaks corresponding to each side edge of the membrane": "side edge" is not a helpful term. I suggest instead: "...corresponding to the membrane at each side of the cell"

      (4) ***A key concern of this reviewer is that, since membrane-bound proteins diffuse more slowly than cytoplasmic proteins, some significant undercounting of the % of cytoplasmic proteins is expected due to decreased detectability of the faster-moving proteins. This would not be a problem for the LacZ imaging where essentially all proteins are cytoplasmic, but would significantly affect the reported MB% for the intermediate protein constructs. How is this undercounting considered and taken into account? One could, for instance, compare LacZ vs. LacY (or RNase E) copy numbers detected in fixed cells to those detected in living cells to estimate it.

      (5) ***The rifampicin treatment study is not presented well. Firstly, it is found that LacY diffuses more rapidly upon rifampicin treatment. This change is attributed to changes in crowding at the membrane due to mRNA. Several other things change in cells after adding rif, including ATP levels, and these factors should be considered. More importantly, since the change in the diffusivity of RNaseE is similar to the change in diffusivity of LacY, then it seems that most of the change in RNaseE diffusion is NOT due to RNaseE-mRNA-ribosome binding, but rather due to whatever crowding/viscosity effects are experienced by LacY (along these lines: the error reported for D is SEM, but really should be a confidence interval, as in Figure 1, to give the reader a better sense of how different (or similar) 1.47 and 1.25 are).

      (6) Lines 185-189: it is surprising to me that the CTD mutants both have the same change in D (5.5x and 5.3x) relative to their full-length counterparts since D for the membrane-bound WT protein should be much less sensitive to protein size than D for the cytoplasmic MTS mutant. Can the authors comment?

      (7) Lines 190-194. Again, the confidence intervals and experimental uncertainties should be considered before drawing biological conclusions. It would seem that there is "no significant change" in the rhlB and pnp mutants, and I would avoid saying "especially for ∆pnp" when the same conclusion is true for both (one shouldn't say 1.04 is "very minute" and 1.08 is just kind of small - they are pretty much the same within experiments like this).

      (8) ***Lines 221-223 "This is remarkable because their molecular masses (and thus size) are expected to be larger than that of MTS" should be reconsidered: diffusion in a membrane does not follow the Einstein law (indeed lines 223-225 agree with me and disagree with lines 221-223; see also the discussion paragraph starting at line 375). Rather, it is generally limited by the interactions of the transmembrane segments with the membrane. So Figure 3D does not contain the right data for a comparison, and what is surprising to me is that MTS doesn't diffuse considerably faster than LacY2.

      (9) ***The logical connection between the membrane-association discussion (which seems to ignore associations with other proteins in the cell) and the preceding +/- rifampicin discussion (which seeks to attribute very small changes to mRNA association) is confusing.

      (10) Separately, the manuscript should be read through again for grammar and usage. For instance, the title should be: "Single-molecule imaging reveals the *roles* of *the* membrane-binding motif and *the* C-terminal domain of RNase E in its localization and diffusion in Escherichia coli". Also, some writing is unwieldy; for instance, "RNase E's D" would be easier to read if written as D_{RNaseE} (underscore = subscript), and there is a lot of repetition in the sentence structures.
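
      As a hedged illustration of the physical argument in point (8), the two textbook scaling laws can be written side by side. These expressions are standard results and are not taken from the manuscript; the symbols (solvent viscosity \eta, hydrodynamic radius r, membrane viscosity \mu_m, membrane thickness h, viscosity of the surrounding fluid \mu_w, radius a of the membrane-embedded segment, Euler's constant \gamma) are introduced here for illustration only.

```latex
% Stokes-Einstein: a sphere of radius r diffusing in a bulk fluid of viscosity \eta
D_{\mathrm{SE}} = \frac{k_B T}{6 \pi \eta r}

% Saffman-Delbruck: a cylinder of radius a embedded in a membrane of
% viscosity \mu_m and thickness h, surrounded by fluid of viscosity \mu_w
% (valid for a \ll \mu_m h / \mu_w)
D_{\mathrm{SD}} = \frac{k_B T}{4 \pi \mu_m h}
  \left[ \ln\!\left( \frac{\mu_m h}{\mu_w a} \right) - \gamma \right],
  \qquad \gamma \approx 0.5772
```

      In the first expression D falls as 1/r, so size matters directly; in the second, D depends only logarithmically on the radius of the embedded segment, which is why molecular mass alone is a weak predictor of mobility for membrane-anchored constructs.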

    3. Reviewer #2 (Public review):

      Summary:

      Troyer and colleagues have studied the in vivo localisation and mobility of the E. coli RNaseE (a protein key for mRNA degradation in all bacteria) as well as the impact of two key protein segments (MTS and CTD) on RNase E cellular localisation and mobility. Such sequences are important to study since there is significant sequence diversity within bacteria, as well as a lack of clarity about their functional effects. Using single-molecule tracking in living bacteria, the authors confirmed that >90% of RNaseE localised on the membrane, and measured its diffusion coefficient. Via a series of mutants, they also showed that MTS leads to stronger membrane association and slower diffusion compared to a transmembrane motif (despite the latter being more embedded in the membrane), and that the CTD weakens membrane binding. The study also rationalised how the interplay of MTS and CTD modulates mRNA metabolism (and hence gene expression) in different cellular contexts.

      Strengths:

      The study uses powerful single-molecule tracking in living cells along with solid quantitative analysis, and provides direct measurements for the mobility and localisation of E. coli RNaseE, adding to information from complementary studies and other bacteria. The exploration of different membrane-binding motifs (both MTS and CTD) has novelty and provides insight on how sequence and membrane interactions can control function of protein-associated membranes and complexes. The methods and membrane-protein standards used contribute to the toolbox for molecular analysis in live bacteria.

      Weaknesses:

      The Results sections can be structured better to present the main hypotheses to be tested. For example, since it is well known that RNase E is membrane-localised (via its MTS), one expects its mobility to be mainly controlled by the interaction with the membrane (rather than with other molecules, such as polysomes and the degradosome). The results indeed support this expectation - however, the manuscript in its current form does not lay down the dominant hypothesis early on (see second Results chapter), and instead considers the rifampicin-addition results as "surprising"; it will be best to outline the most likely hypotheses, and then discuss the results in that light.

      Similarly, the authors should first discuss the different modes of interaction for a peripheral anchor vs a transmembrane anchor, outline the state of knowledge and possibilities, and then discuss their result; in its current version, the ms considers the LacY2 and LacY6 faster diffusion compared to MTS "remarkable", but considering the very different mode of interaction, there is no clear expectation prior to the experiment. In the same section, it would be good to see how the MD simulations capture the motion of LacY6 and LacY12, since this will provide a set of results consistent with the experimental set.

      The work will benefit from further exploration of the membrane-RNase E interactions; e.g., the effect of membrane composition is explored by just using two different growth media (which on its own is not a well-controlled setting), and no attempts to change the MTS itself were made. The manuscript will benefit from considering experiments that explore the diversity of RNaseE interactions in different species; for example, the authors may want to consider the possibility of using the membrane-localisation signals of functional homologs of RNaseE in different bacteria (e.g., B. subtilis). It would be good to look at the effect of CTD deletions in a similar context (i.e., in addition to the MTS substitution by LacY2 and LacY6).

      The manuscript will benefit from further discussion of the unstructured nature of the CTD, especially since the RNase E CTD is well known to form condensates in Caulobacter crescentus; it is unclear how the authors excluded any roles for RNaseE phase separation in the mobility of RNaseE in E. coli cells.

      Some statements in the Discussion require support with example calculations or toning down substantially. Specifically, it is not clear how the authors conclude that RNaseE interacts with its substrate for a short time (and what this time may actually be); further, the speculation about the MTS "not being an efficient membrane-binding motif for diffusion" lacks adequate support as it stands.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Troyer et al quantitatively measured the membrane localization and diffusion of RNase E, an essential ribonuclease for mRNA turnover as well as tRNA and rRNA processing in bacteria cells. Using single-molecule tracking in live E. coli cells, the authors investigated the impact of membrane targeting sequence (MTS) and the C-terminal domain (CTD) on the membrane localization and diffusion of RNase E under various perturbations. Finally, the authors tried to correlate the membrane localization of RNase E to its function on co- and post-transcriptional mRNA decay using lacZ mRNA as a model.

      The major findings of the manuscript include:

      (1) WT RNase E is mostly membrane localized via MTS, confirming previous results. The diffusion of RNase E is increased upon removal of MTS or CTD, and more significantly increased upon removal of both regions.

      (2) By tagging RNase E MTS and different lengths of LacY transmembrane domain (LacY2, LacY6, or LacY12) to mEos3.2, the results demonstrate that short LacY transmembrane sequence (LacY2 and LacY6) can increase the diffusion of mEos3.2 on the membrane compared to MTS, further supported by the molecular dynamics simulation. A similar trend was roughly observed in RNase E mutants with MTS switched to LacY transmembrane domains.

      (3) The removal of RNase E MTS significantly increases the co-transcriptional degradation of lacZ mRNA, but has minimal effect on the post-transcriptional degradation of lacZ mRNA. Removal of CTD of RNase E overall decreases the mRNA decay rates, suggesting the synergistic effect of CTD on RNase E activity.

      Strengths:

      (1) The manuscript is clearly written with very detailed method descriptions and analysis parameters.

      (2) The conclusions are mostly supported by the data and analysis.

      (3) Some of the main conclusions are interesting and important for understanding the cellular behavior and function of RNase E.

      Weaknesses:

      (1) Some of the observations show inconsistent or context-dependent trends that make it hard to generalize certain conclusions. Those points are worth discussion at least. Examples include:

      (a) The authors conclude that MTS segment exhibits reduced MB% when succinate is used as a carbon source compared to glycerol, whereas LacY2 segment maintains 100% membrane localization, suggesting that MTS can lose membrane affinity in the former growth condition (Ln 341-342). However, the opposite case was observed for the WT RNase E and RNase E-LacY2-CTD, in which RNase E-LacY2-CTD showed reduced MB% in the succinate-containing M9 media compared to the WT RNase E (Ln 264-267). This opposite trend was not discussed. In the absence of CTD, would the media-dependent membrane localization be similar to the membrane localization sequence or to the full-length RNase E?

      (b) When using mEos3.2 reporter only, LacY2 and LacY6 both increase the diffusion of mEos3.2 compared to MTS. However, when inserting the LacY transmembrane sequence into RNase E or RNase E without CTD, only the LacY2 increases the diffusion of RNase E. This should also be discussed.

      (2) The authors interpret that in some cases the increase in the diffusion coefficient is related to the increase in the cytoplasm localization portion, such as for the LacY2 inserted RNase E with CTD, which is rational. However, the authors can directly measure the diffusion coefficient of the membrane and cytoplasm portion of RNase E by classifying the trajectories based on their localizations first, rather than just the ensemble calculation.

      (3) The error bars of the diffusion coefficient and MB% are all SEM from bootstrapping, which are very small. I am wondering how much of the difference is simply due to a batch effect. Were the data mixed from multiple biological replicates? The number of biological replicates should also be reported.

      (4) Some figures lack p-values, such as Figures 4 and 5C-D. Also, adding p-values directly to the bar graphs will make it easier to read.
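
      The distinction raised in point (3), and by Reviewer #1, between a bootstrap SEM and a confidence interval can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name `bootstrap_mean`, the synthetic per-track values, and the units are all hypothetical. Note that resampling individual tracks, as done here, does not capture replicate-to-replicate (batch) variation; that would require resampling at the level of biological replicates.

```python
import random
import statistics


def bootstrap_mean(values, n_boot=2000, seed=0):
    """Return (bootstrap SEM, 95% percentile CI) for the mean of `values`.

    Hypothetical helper for illustration; not from the manuscript.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample the observations with replacement and record the mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    # The SEM is the standard deviation of the bootstrap means.
    sem = statistics.stdev(means)
    # The percentile interval takes the 2.5th and 97.5th percentiles.
    lo = means[int(0.025 * n_boot)]
    hi = means[int(0.975 * n_boot) - 1]
    return sem, (lo, hi)


# Synthetic per-track diffusion estimates (arbitrary units, assumed values).
data_rng = random.Random(1)
tracks = [data_rng.lognormvariate(-4.0, 0.5) for _ in range(300)]

sem, (lo, hi) = bootstrap_mean(tracks)
print(f"mean = {statistics.mean(tracks):.4f}")
print(f"bootstrap SEM = {sem:.4f}")
print(f"95% CI = ({lo:.4f}, {hi:.4f})")
```

      For a roughly normal bootstrap distribution the 95% interval is close to four SEM wide, so error bars drawn as one SEM can make two values look more clearly separated than the data support.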

    1. eLife Assessment

      This important study reports single-nucleus multiomics-based profiling of transcriptome and chromatin accessibility of mouse XX and XY primordial germ cells (PGCs). The main conclusions of this study, which will be of interest to developmental and reproductive biologists, as well as andrologists, are supported by convincing data.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses single nucleus multi-omics to profile the transcriptome and chromatin accessibility of mouse XX and XY primordial germ cells (PGCs) at three time points spanning PGC sexual differentiation and entry of XX PGCs into meiosis (embryonic days 11.5-13.5). They find that PGCs can be clustered into sub-populations at each time point, with higher heterogeneity among XX PGCs and more switch-like developmental transitions evident in XY PGCs. In addition, they identify several transcription factors that appear to regulate sex-specific pathways as well as cell-cell communication pathways that may be involved in regulating XX vs XY PGC fate transitions. The findings are important and overall rigorous. The study could be further improved by better connection to the biological system, including putting the transcriptional heterogeneity of XX PGCs in the context of findings that meiotic entry is spatially asynchronous in the fetal ovary and further addressing the role of retinoic acid signaling. Overall, this study represents an advance in germ cell regulatory biology and will be a highly used resource in the field of germ cell development.

      Strengths:

      (1) The multi-omics data is mostly rigorously collected and carefully interpreted.

      (2) The dataset is extremely valuable and helps to answer many long-standing questions in the field.

      (3) In general, the conclusions are well anchored in the biology of the germ line in mammals.

      Comments on revised version:

      Most of my concerns have been addressed in the revised manuscript. I have one remaining concern but I believe this is important in order for the paper to be fully appreciated:

      In Figures 2a, 2e, 3a, and 3e, the visualization scheme is very difficult to follow, and has not been updated or improved in the revised manuscript. It's very hard to see the colors corresponding to average expression for many genes because the circles are so small. The yellow color is hard to see and makes it hard to estimate the size of the circle. This issue is particularly egregious in Figure 2a for the data relating to ZKSCAN5, which is specifically highlighted in the text in lines 421-426. This data must be shown in a more convincing way in order to make the claims. An update to the visualization, including color scheme, is very strongly recommended; it is not difficult and would substantially improve the ability of these panels to communicate their message.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Alexander et al describes a careful and rigorous application of multiomics to mouse primordial germ cells (PGCs) and their surrounding gonadal cells during the period of sex differentiation.

      Strengths:

      In thoughtfully designed figures, the authors identify both known and new candidate gene regulatory networks in differentiating XX and XY PGCs and sex-specific interactions of PGCs with supporting cells. In XY germ cells, novel findings include the predicted set of TFs regulating Bnc2, which is known to promote mitotic arrest, as well as the TFs POU6F1/2 and FOXK2 and their predicted targets that function in mitosis and signal transduction. In XX germ cells, the authors deconstruct the regulation of the premeiotic replication factor Stra8, which reveals TFs involved in meiosis, retinoic acid signaling, pluripotency and epigenetics among predictions; this finding, along with evidence supporting regulatory potential of retinoic acid receptors in meiotic gene expression is an important addition to the debate over the necessity of retinoic acid in XX meiotic initiation. In addition, a self-regulatory network of other TFs is hypothesized in XX differentiating PGCs, including TFAP2c, TCFL5, ZFX, MGA and NR6A1, which is predicted to turn on meiotic and Wnt signaling targets. Finally, analysis of PGC-support cell interactions during sex differentiation reveals substantially more interactions in XX, via WNTs and BMPs, as well as some new signaling pathways that predominate in XY PGCs including ephrins, CADM1, Desert Hedgehog and matrix metalloproteases. This dataset will be an excellent resource for the community, motivating functional studies and serving as a discovery platform.

      Weaknesses:

      While the authors performed all of their comparisons between XX versus XY datasets at each timepoint, a more systematic analysis of expression and accessibility changes across time for each sex would be valuable. It remains possible that common mechanisms of differentiation to XX and XY could be missing from this analysis that focused on sex-specific differences.

      Specific Questions:

      (1) Line 461: "the population of E13.5 XX PGCs displaying the strongest Stra8 expression levels corresponded to the same population of XX PGCs with the highest module score of early meiotic prophase I genes (Fig. 3c; Supplementary Fig. 3a-b)" however the Stra8+ XX PGCs that do not robustly express meiotic genes should be examined to understand more about their differentiation potential. The authors are well-poised to identify the likely trajectories available to cell subsets in their dataset, and not doing so is a missed opportunity.

      (2) The authors state that "we found that Stra8, Rec8, Rnf2, Sycp1, Sycp2, Ccnb3, and Zglp1 contain the RA receptor motifs in their regulatory sequences (Supplementary Figure 4g)." What is the strength of the RA->meiosis pathway compared to other mechanisms regulating meiosis? Perhaps the authors could take this analysis further with the following questions: (1) ask whether meiotic genes are more enriched in RA motifs compared to other expressed genes or other motifs; (2) compare the strength of peak-gene correlations for all peaks containing RA receptor motifs vs. those with peaks for Zglp1, Rnf2, etc. binding. The strengths of these correlations could provide clues to how much gene expression varies in response to RA exposure vs. modulation of these other factors and thus tell us something about how much RA is playing a role.

      (3) In Figure 4, the shift from promoters in E11.5 XX PGCs to distal intergenic regions is fascinating. What can we learn about epigenetic reprogramming/methylation changes across gene bodies?

      (4) The overlap between gene targets of TCFL5 with other highly expressed TFs differentially upregulated in E13.5 XX PGCs over XY suggests ambiguity regarding its role as a central or high-level regulator of differentiation; as in vivo validation has not been performed, I suggest softening this conclusion.

    4. Reviewer #3 (Public review):

      Summary:

      Alexander et al. reported the gene-regulatory networks underpinning sex determination of murine primordial germ cells (PGCs) through single-nucleus multiomics, offering a detailed chromatin accessibility and gene expression map across three embryonic stages in both male (XY) and female (XX) mice. It highlights how regulatory element accessibility may precede gene expression, pointing to chromatin accessibility as a primer for lineage commitment before differentiation. Sexual dimorphism in these elements and gene expression increases over time, and the study maps transcription factors regulating sexually dimorphic genes in PGCs, identifying sex-specific enrichment in various transcription factors.

      Strengths:

      The study includes step-wise multiomic analysis with some computational approach to identify candidate TFs regulating XX and XY PGC gene expression, providing a detailed timeline of chromatin accessibility and gene expression during PGC development, which identifies previously unknown PGC subpopulations and offers a multimodal reference atlas of differentiating PGC clusters. Furthermore, the study maps a complex network of transcription factors associated with sex determination in PGCs, adding depth to our understanding of these processes.

      Weaknesses:

      While the multiomics approach is powerful, it primarily offers correlational insights between chromatin accessibility, gene expression, and transcription factor activity, without direct functional validation of identified regulatory networks.

      Comments on revised version:

      The authors have answered my questions and concerns in the revised manuscript and correspondence.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses single nucleus multiomics to profile the transcriptome and chromatin accessibility of mouse XX and XY primordial germ cells (PGCs) at three time-points spanning PGC sexual differentiation and entry of XX PGCs into meiosis (embryonic days 11.5-13.5). They find that PGCs can be clustered into sub-populations at each time point, with higher heterogeneity among XX PGCs and more switch-like developmental transitions evident in XY PGCs. In addition, they identify several transcription factors that appear to regulate sex-specific pathways as well as cell-cell communication pathways that may be involved in regulating XX vs XY PGC fate transitions. The findings are important and overall rigorous. The study could be further improved by a better connection to the biological system, including the addition of experiments to validate the 'omics-based findings in vivo and putting the transcriptional heterogeneity of XX PGCs in the context of findings that meiotic entry is spatially asynchronous in the fetal ovary. Overall, this study represents an advance in germ cell regulatory biology and will be a highly used resource in the field of germ cell development.

      Strengths:

      (1) The multiomics data is mostly rigorously collected and carefully interpreted.

      (2) The dataset is extremely valuable and helps to answer many long-standing questions in the field.

      (3) In general, the conclusions are well anchored in the biology of the germ line in mammals.

      Weaknesses:

      (1) The nature of replicates in the data and how they are used in the analysis are not clearly presented in the main text or methods. To interpret the results, it is important to know how replicates were designed and how they were used. Two "technical" replicates are cited but it is not clear what this means.

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (2) Transcriptional heterogeneity among XX PGCs is mentioned several times (e.g., lines 321-323) and is a major conclusion of the paper. It has been known for a long time that XX PGCs initiate meiosis in an anterior-to-posterior wave in the fetal ovary starting around E13.5. Some heterogeneity in the XX PGC populations could be explained by spatial position in the ovary without having to invoke novel subpopulations.

      We thank the reviewer for pointing out this important biological phenomenon. We also recognize that transcriptional heterogeneity among XX PGCs is likely due to the anterior-to-posterior wave of meiotic initiation in E13.5 ovaries and highlight this possibility in our manuscript. However, since our study utilizes single-nucleus RNA-sequencing and not spatial transcriptomics, we are not able to capture the spatial location of the XX PGCs analyzed in our dataset. As such, our analysis applied clustering tools to classify the populations of XX PGCs captured in our dataset. 

      (3) There is essentially no validation of any of the conclusions. Heterogeneity in the expression of a given marker could be assessed by immunofluorescence or RNAscope.

      In our revised manuscript, we included immunofluorescence staining of potential candidate factors involved in PGC sex determination, such as PORCN and TFAP2C. Testing and optimizing antibodies for the targets identified in this study are ongoing efforts in our lab and we look forward to sharing our results with the research community.

      (4) The paper sometimes suffers from a problem common to large resource papers, which is that the discussion of specific genes or pathways seems incomplete. An example here is from the analysis of the regulation of the Bnc2 locus, which seems superficial. Relatedly, although many genes and pathways are nominated for important PGC functions, there is no strong major conclusion from the paper overall.

      In this manuscript, we set out to identify candidate factors, some already known and many others unknown, involved in the developmental pathways of PGC sex determination using computational tools. Our goal, as a research group and with future collaborators, is to screen these interesting candidates and discover their function in the primordial germ cell. Our research, presented in this study, represents a launching pad from which to identify future projects that will investigate these factors in further detail.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Alexander et al describes a careful and rigorous application of multiomics to mouse primordial germ cells (PGCs) and their surrounding gonadal cells during the period of sex differentiation.

      Strengths:

      In thoughtfully designed figures, the authors identify both known and new candidate gene regulatory networks in differentiating XX and XY PGCs and sex-specific interactions of PGCs with supporting cells. In XY germ cells, novel findings include the predicted set of TFs regulating Bnc2, which is known to promote mitotic arrest, as well as the TFs POU6F1/2 and FOXK2 and their predicted targets that function in mitosis and signal transduction. In XX germ cells, the authors deconstruct the regulation of the premeiotic replication regulator Stra8, which reveals TFs involved in meiosis, retinoic acid signaling, pluripotency, and epigenetics among predictions; this finding, along with evidence supporting the regulatory potential of retinoic acid receptors in meiotic gene expression is an important addition to the debate over the necessity of retinoic acid in XX meiotic initiation. In addition, a self-regulatory network of other TFs is hypothesized in XX differentiating PGCs, including TFAP2c, TCFL5, ZFX, MGA, and NR6A1, which is predicted to turn on meiotic and Wnt signaling targets. Finally, analysis of PGC-support cell interactions during sex differentiation reveals more interactions in XX, via WNTs and BMPs, as well as some new signaling pathways that predominate in XY PGCs including ephrins, CADM1, Desert Hedgehog, and matrix metalloproteases. This dataset will be an excellent resource for the community, motivating functional studies and serving as a discovery platform.

      Weaknesses:

      My one major concern is that the conclusion that PGC sex differentiation (as read out by transcription) involves chromatin priming is overstated. The evidence presented in the figures includes a select handful of genes including Porcn, Rimbp1, Stra8, and Bnc2 for which chromatin accessibility precedes expression. Given that the authors performed all of their comparisons between XX versus XY datasets at each timepoint, have they missed an important comparison that would be a more direct test of chromatin priming: between timepoints for each sex? Furthermore, it remains possible that common mechanisms of differentiation to XX and XY could be missing from this analysis that focused on sex-specific differences.

      We thank the reviewer for their thoughtful assessment and suggestions, as stated here. We note that chromatin priming in PGCs prior to sex determination is a well-documented research finding (see references below) that is further supported by our single-nucleus multiomics data. To support these findings previously stated in the scientific literature, we included data demonstrating the asynchronous correlation between chromatin accessibility and gene expression during PGC sex determination. Specifically, we investigated the associations of differentially accessible chromatin peaks with differentially expressed genes for each PGC type (between sexes and across embryonic stages) using computational tools and methods that are well-established and applied by the research community. In our manuscript, we note that the patterns we identified support the potential role of chromatin priming in PGC sex determination. Nevertheless, we further highlight that a comprehensive profile of 3D chromatin structure and enhancer-promoter contacts in differentiating PGCs is needed to fully understand how changes to chromatin facilitate PGC sex determination.

      References:

      (1) Chen, M., et al. Integration of single-cell transcriptome and chromatin accessibility of early gonads development among goats, pigs, macaques, and humans. Cell Reports 41 (2022).

      (2) Huang, T.-C. et al. Sex-specific chromatin remodelling safeguards transcription in germ cells. Nature 600, 737–742 (2021).

      Reviewer #3 (Public Review):

      Summary:

      Alexander et al. reported the gene-regulatory networks underpinning sex determination of murine primordial germ cells (PGCs) through single-nucleus multiomics, offering a detailed chromatin accessibility and gene expression map across three embryonic stages in both male (XY) and female (XX) mice. It highlights how regulatory element accessibility may precede gene expression, pointing to chromatin accessibility as a primer for lineage commitment before differentiation. Sexual dimorphism in these elements and gene expression increases over time, and the study maps transcription factors regulating sexually dimorphic genes in PGCs, identifying sex-specific enrichment in various transcription factors.

      Strengths:

      The study includes step-wise multiomic analysis with some computational approach to identify candidate TFs regulating XX and XY PGC gene expression, providing a detailed timeline of chromatin accessibility and gene expression during PGC development, which identifies previously unknown PGC subpopulations and offers a multimodal reference atlas of differentiating PGC clusters. Furthermore, the study maps a complex network of transcription factors associated with sex determination in PGCs, adding depth to our understanding of these processes.

      Weaknesses:

      While the multiomics approach is powerful, it primarily offers correlational insights between chromatin accessibility, gene expression, and transcription factor activity, without direct functional validation of identified regulatory networks.

      As stated in our response above to a similar concern, our study is a launching pad for future projects that will investigate, in further detail, candidates that may be involved in PGC sex determination. With this rich dataset in hand, our goal in future research projects is to screen these candidates and discover their function in PGCs.

      Response to Recommendations

      Reviewer #1 (Recommendations For The Authors):

      (1) Clarify at first introduction how combined ATAC-seq/RNA-seq multiomics libraries were prepared, including if ATAC and RNA-seq data are from the same cell.

      This information was added to the introduction of the revised manuscript.

      (2) Clarify what the two technical replicates represent. Are they two libraries from the same gonad or the same pool of gonads? Are they from 2 different gonads?

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (3) In Supplemental Figure 1, there is substantial variation in the number of unique snATAC-seq fragments between some conditions. Could this create a systematic bias that affects clustering?

      We recognize the concern that substantial variation in the number of unique snATAC-seq fragments between conditions could potentially create a systematic bias that affects clustering. However, we analyzed our snATAC-seq dataset with Signac, which performs term frequency-inverse document frequency (TF-IDF) normalization. This is a process that normalizes across cells to correct for differences in cellular sequencing depth. Given that sequencing depth was taken into account in our normalization and clustering procedures, and that the unbiased clustering of PGCs also reflects the sex and embryonic stage of PGCs, we are confident that the clustering of the snATAC-seq datasets closely reflects the biological variability present in the PGCs collected.

      References:

      Signac Website: https://stuartlab.org/signac/articles/pbmc_vignette

      Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature methods, 18(11), 1333-1341.

      (4) In Figures 2a, 2e, 3a, and 3e, the visualization scheme is very difficult to follow. It's very hard to see the colors corresponding to average expression for many genes because the circles are so small. In addition, the yellow color is hard to see and makes it hard to estimate the size of the circle since the boundaries can be indistinct. I recommend that a different visualization scheme and/or set of size scales be used.

      In Figures 2a, 2e, 3a, and 3e, we chose this color palette to be inclusive of viewers who are colorblind. The chosen colors are visible on both a computer screen and on printed paper. We also included a legend of the color scale and dot size representing the average expression and percent of cells expressing the gene, respectively. If the color cannot be seen, it is because the cell population is not expressing the gene.

      (5) Perform in vivo validation (immunofluorescence or RNAscope) of at least some targets implicated in PGC development by this study.

      Such validations (immunofluorescence staining of PORCN and TFAP2C) are now included in Figure 4 and the supplement.

      (6) In line 351, the authors state that "we observed a strong demarcation between XX and XY PGCs at E12.5-E13.5." But in Figure 1j it looks like a reasonably high fraction of both XX and XY E12.5 cells are in cluster 1, which should mean that there is some overlap.

      While it is true that Figure 1j shows overlap of both XX and XY E12.5 cells in cluster 1, we were commenting on the separation of E12.5 XX (clusters 4 and 5) and E12.5 XY (clusters 8 and 9) PGCs. We have modified the sentence beginning at line 351 to state that the separation between XX and XY PGCs occurs at E13.5.

      (7) In lines 404-405: "We first linked snATAC-seq peaks to XY PGC functional genes". It is important to know how the peaks were linked to genes.

      We added the following sentence to address this comment: “Peak-to-gene linkages were determined using Signac functionalities and were derived from the correlation between peak accessibility and the intensity of gene expression.”
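
      The idea behind the quoted sentence, linking a peak to a gene when its accessibility tracks the gene's expression across cells, can be sketched as a plain Pearson correlation. This is a simplified illustration under stated assumptions: the function name is hypothetical, and real tools typically also restrict candidates to peaks near the gene and compare each correlation against background peaks, which is omitted here.

```python
import numpy as np


def peak_gene_correlation(accessibility, expression):
    """Pearson correlation of each peak (row) with one gene across cells.

    accessibility: peaks x cells matrix; expression: length-cells vector.
    Peaks or genes with zero variance get a correlation of 0 (an assumption).
    """
    acc = np.asarray(accessibility, dtype=float)
    expr = np.asarray(expression, dtype=float)
    acc_centered = acc - acc.mean(axis=1, keepdims=True)
    expr_centered = expr - expr.mean()
    denom = np.sqrt((acc_centered ** 2).sum(axis=1) * (expr_centered ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (acc_centered @ expr_centered) / denom
    return np.where(denom > 0, r, 0.0)


# Hypothetical values: three peaks measured in four cells, one gene.
peaks = np.array([[0, 1, 2, 3],    # rises with the gene
                  [3, 2, 1, 0],    # falls as the gene rises
                  [1, 1, 1, 1]])   # constant, carries no information
gene = np.array([0, 2, 4, 6])
links = peak_gene_correlation(peaks, gene)
```

      Peaks with a high positive correlation would be the candidates for a peak-to-gene linkage; any threshold applied afterwards is a separate choice that the response does not specify.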

      (8) In Supplemental Figure 5c, the XX E11.5 condition has a substantially higher fraction of ATAC peaks at promoter regions compared to the others. Does this have statistical and biological significance?

      This is an interesting observation beyond the scope of our manuscript. Many interesting questions arise from this study and it is our plan to investigate further in the future. 

      (9) Line 885: "The increased number of DA peaks at E13.5 may be the result of changes to chromatin structure as XX PGCs enter meiotic prophase I"; but in Figure 4b, there's only a modest increase in DAP number from E12.5 to E13.5 in XX PGCs, compared to a massive gain in XY PGCs.

      In our manuscript, we comment on both phenomena: the doubling of differentially accessible peaks in XX PGCs from E12.5 to E13.5 and the massive increase in differentially accessible peaks in XY PGCs from E12.5 to E13.5. In our description of these results, we propose several hypotheses leading to these increases in differentially accessible peaks. As such, it cannot be ruled out that the changes to chromatin structure that occur during meiotic prophase I contribute to the gain in differentially accessible peaks in XX PGCs at E13.5, and we included this statement in the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors):

      (1) The methods state at line 141 that nuclei with mitochondrial reads of more than 25% were removed, however our understanding from the Bioconductor manual and companion manuscript (Amezquita, R.A., Lun, A.T.L., Becht, E. et al. Orchestrating single-cell analysis with Bioconductor. Nat Methods 17, 137-145 (2020). https://doi.org/10.1038/s41592-019-0654-x) is that snRNA-seq approaches remove mitochondrial transcripts entirely and datasets containing mitochondrial transcripts are thought to feature incompletely stripped nuclei. It is thought that mitochondrial transcripts participating in nuclear import may remain hanging on to the nuclear envelope and get encapsulated into GEMs. If the mitochondrial read cutoff of 25% was used intentionally to keep this potentially contaminating signal, please justify why this was done for this dataset.

      We agree with the reviewer that the presence of mitochondrial transcripts may be potentially contaminating signal. In our preprocessing steps, we removed the mitochondrial genes and transcripts from our datasets so that they would not influence or affect our analyses. The following sentence was added to the methods section on snRNA-seq data processing: “Mitochondrial genes and transcripts were removed from the snRNA-seq datasets to eliminate any potentially contaminating signal.”

      (2) Methods line 227: please include log2fold change and p-adjusted value cutoffs for GO enrichment.

      We used clusterProfiler for our GO enrichment analysis. Our GO enrichment analysis did not include a log2 fold change analysis, and the p-adjusted value cutoff is stated in the methods.
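
      For readers unfamiliar with adjusted p-values in enrichment analysis, the sketch below shows the Benjamini-Hochberg procedure, a widely used adjustment method. Whether the authors used this particular method is an assumption, since the response only says that the cutoff is given in the methods; the function name and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

      A term would then be reported as enriched when its adjusted value falls below the chosen cutoff.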

      (3) Results line 310: the claim that "At E12.5-E13.5, XY PGCs converged onto a single distinct population (cluster 7), indicating less transcriptional diversity among E12.5-E13.5 XY PGCs when compared to E12.5-E13.5 XX PGCs (Fig1d)" would be strengthened if the authors quantified transcriptional distance with distance metrics such as Euclidean or cosine distance.

      We used a clustering approach to gain insights into the transcriptional diversity of PGC populations. Using an additional metric, such as Euclidean or cosine distance, would not provide meaningful information not already achieved by clustering or change the conclusions presented in the manuscript.
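
      To make the reviewer's suggestion concrete, one way to quantify transcriptional diversity within a group of cells is the mean pairwise distance between their expression (or embedding) vectors. The sketch below is illustrative only: the function name is hypothetical, it assumes at least two cells per group and no all-zero vectors, and it was not part of the authors' analysis.

```python
import numpy as np


def mean_pairwise_distance(cells, metric="cosine"):
    """Mean distance over all distinct pairs of cells (rows)."""
    x = np.asarray(cells, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two cells")
    if metric == "cosine":
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        dist = 1.0 - unit @ unit.T
    elif metric == "euclidean":
        diff = x[:, None, :] - x[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    else:
        raise ValueError("metric must be 'cosine' or 'euclidean'")
    upper = np.triu_indices(n, k=1)   # each pair counted once
    return dist[upper].mean()


# Hypothetical two-dimensional profiles for two small groups of cells.
homogeneous = np.array([[1.0, 0.0], [2.0, 0.0]])
heterogeneous = np.array([[1.0, 0.0], [0.0, 1.0]])
d_homogeneous = mean_pairwise_distance(homogeneous)
d_heterogeneous = mean_pairwise_distance(heterogeneous)
d_euclidean = mean_pairwise_distance(heterogeneous, metric="euclidean")
```

      A smaller mean pairwise distance in one group than in another would support a claim of lower transcriptional diversity independently of how many clusters the cells fall into.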

      (4) Results line 317: the authors allude to Lars2 defining clusters 2 & 3 as a marker gene, but it is not clear why this is highlighted until the reader reaches the discussion, which alludes to the published role of Lars2 in reproduction. Please consider moving this sentence to the results section for clarity and perhaps expanding the discussion on the meaning.

      To provide clarity, we added the statement “genes with reported roles in reproduction” to the results section.

      (5) In Figure 2a, why do the authors choose to focus on Zkscan5 in XY PGCs when it is expressed by such a small portion of cells (<25%)? Do they assume that this is due to dropouts?

      We chose to focus on Zkscan5 as an example because of its enriched and differential expression in male PGCs, the lack of enrichment of the Zkscan5 motif in female PGCs, and the reported roles of Zkscan5 in regulating cellular proliferation and growth. Zkscan5 is an example of how candidate genes can be identified for further investigation.

      (6) Line 461: "the population of E13.5 XX PGCs displaying the strongest Stra8 expression levels corresponded to the same population of XX PGCs with the highest module score of early meiotic prophase I genes (Figure 3c; Supplementary Fig. 3a-b)". However did the authors also consider examining the Stra8+ XX PGCs that do not robustly express meiotic genes to understand more about their differentiation potential?

      We are thankful to the reviewer for this suggestion. However, this research question is beyond the scope of the manuscript. We plan to investigate further in future research studies.

      (7) Line 505: "when we searched for the presence of RA receptor motifs in peaks linked to genes related to meiosis and female sex determination, we found that Stra8, Rec8, Rnf2, Sycp1, Sycp2, Ccnb3, and Zglp1 contain the RA receptor motifs in their regulatory sequences (Supplementary Figure 4g)." My read of the text is that the authors are not taking a side on the RA and meiosis controversy, but rather trying to reveal what the data can tell us, and the answer is that there is a strong signature linking RA to meiotic genes, which supports this as a valid biological pathway. But what is the strength of the RA>meiosis pathway compared to other mechanisms (which must be functioning in the triple receptor KO)? Perhaps the authors could take this analysis further with the following questions: (1) ask whether meiotic genes are more enriched in RA motifs compared to other expressed genes or other motifs (2) compare the strength of peak-gene correlations for all peaks containing RA receptor motifs vs. those with peaks for Zglp1, Rnf2, etc binding. The strengths of these correlations could provide clues to how much gene expression varies in response to RA exposure vs. modulation of these other factors and thus tell us something about how much RA is playing a role.

      We agree with the reviewer that this is a very interesting and important question. We also thank the reviewer for their thoughtful suggestions on the types of bioinformatics analyses that could answer this question. However, the section on RA signaling during PGC sex determination is only a small part of the manuscript and would be better analyzed in greater detail in a future research study or publication.

      (8) The shift from promoters in E11.5 XX PGCs to distal intergenic regions is fascinating. What can we learn about epigenetic reprogramming/methylation changes across gene bodies? 

      We agree with the reviewer that this is an interesting question about gene regulation in E11.5 XX PGCs. However, we prefer to analyze the epigenetic reprogramming changes across gene bodies in this cell population in additional research studies. Our purpose and goal for this section was to link differentially accessible chromatin peaks with differentially expressed genes to identify putative gene regulatory networks.

      (9) Line 581: why did the authors choose to highlight and validate PORCN1 in PGCs? Please elaborate.

      As stated in the manuscript, we chose to highlight and validate PORCN1 in PGCs because of its role in WNT signaling and because of the visibly strong correlation between chromatin accessibility at the XX-enriched DAP in Fig. 4c (dashed box) and gene expression of PORCN1.

      (10) Figure 5f would be easier to interpret if presented as two columns rather than a circle; show one line of the proteins and the other line with the transcripts so that each is on the same line and there are connections between them.

      This comment is related to stylistic preferences. The purpose of Fig. 5f is to demonstrate that the candidate transcription factors may regulate the expression of other enriched transcription factors. Figure 5f accomplishes this goal.

      (11) Line 640: "The predicted target genes of TCFL5 totaled 74% (367/494) of all DEGs with peak-to-gene linkages in XX PGCs". This seems like a high number and a lot of work for just TCFL5; given the overlap between other TFs and target genes, how many of these 367 target genes overlap with other TFs?

      We agree with the reviewer that this is an important point to clarify. We added the following sentence to the results section on TCFL5: “A large majority of the predicted target genes of TCFL5 were also predicted to be the target genes of the enriched TFs presented in Fig. 5e, e.g., the predicted target genes of these TFs overlapped with 4%-100% of the predicted target genes of TCFL5.”
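
      The overlap percentages in the quoted sentence, and the 74% (367/494) figure in the reviewer's comment, are simple set and ratio calculations. The sketch below shows how such percentages could be computed; the function name, the gene identifiers, and the TF labels are hypothetical placeholders rather than values from the dataset.

```python
def target_overlap(reference_targets, targets_by_tf):
    """Percent of the reference TF's targets shared with each other TF."""
    reference = set(reference_targets)
    if not reference:
        raise ValueError("reference target set is empty")
    return {
        tf: 100.0 * len(reference & set(targets)) / len(reference)
        for tf, targets in targets_by_tf.items()
    }


# Hypothetical target lists; real gene names are deliberately not used.
reference = ["g1", "g2", "g3", "g4"]
others = {
    "TF_A": ["g1", "g2"],
    "TF_B": ["g1", "g2", "g3", "g4", "g9"],
    "TF_C": [],
}
overlaps = target_overlap(reference, others)

# The figure quoted by the reviewer: 367 of 494 linked DEGs.
tcfl5_fraction = 100.0 * 367 / 494
```

      Note that the denominator here is the reference TF's target set, which matches the wording "of the predicted target genes of TCFL5"; using the other TF's target set as the denominator would give different percentages.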

      (12) The presentation of TCFL5 in the results section would make more sense with the additional mention of reproductive phenotypes already known (currently in the discussion Lines 914-917). I would furthermore suggest that the discussion goes into more depth on the difference between the regulatory network of TCFL5 in XX meiosis vs XY.

      We thank the reviewer for this comment; however, we already state in the results section that TCFL5 is known to influence XX PGC sex determination.

      (13) In the Methods, please state more clearly for those not familiar that the genetic background of mice is mixed.

      We described the mice with their official names, which provides the context of their genetic backgrounds.

      (14) Please specify which morphologic criteria were used to verify the stage of embryos in the methods.

      We added the following text to the methods section of the revised manuscript: “Plug date was used to determine the stage of embryos collected for single-nucleus RNA-seq and ATAC-seq. The stage of E11.5 embryos was confirmed by counting somites. The stage of embryos collected at E12.5 was confirmed by the morphological presence of the vessel and cords of the testes collected from XY embryos. Similarly, we confirmed the stage of embryos collected at E13.5 by the size of the gonads, the presence of more distinct cords in the testes of XY embryos, and the elongation of the ovaries of XX embryos.”

      (15) The total number of cells and PGCs that passed QC and are included in UMAPS should be stated.

      The requested information was added to the legend for Fig. 1 of the revised manuscript: “The number of PGCs per sex and embryonic stage are: 375 E11.5 XX PGCs; 1,106 E12.5 XX PGCs; 750 E13.5 XX PGCs; 110 E11.5 XY PGCs; 465 E12.5 XY PGCs; and 348 E13.5 XY PGCs.”

      (16) The order of timepoints changes between figures, and this is not for any obvious reason. Please make it consistent. Figures 1 and 6 list XX 11.5, 12.5, 13.5, and the same for XY, but Figures 2, 3, and 4 use the reverse order: XY E13.5, E12.5, E11.5, and then XX. 

      We thank the reviewer for this comment. However, we chose this order for each of the figures to match the coordinates of the graphs and where we would expect the reader to begin reading the graph first. For example, in Figure 3a, XX E11.5 is closest to the x-axis and would be expected to be read first.   

      (17) In Figure S2 the colors of clusters are hard to distinguish, and it is suggested that the cluster numbers should be listed above each colored bar to avoid frustration.

      We made the suggested correction to Figure S2.

      (18) In Figures 2e and 3e: what do the dashed boxes indicate?

      The dashed boxes are to guide the reader’s eyes to the fact that the order of transcription factors/genes under the Cistrome DB regulatory potential score and gene expression plots are the same.

      (19) In Figure 5a: break panels into i-iv so that the in-text call-outs are not all the same.

      We made the suggested correction to Figure 5a and modified the in-text call-outs.

      (20) Please indicate XX in Figure 5e and XY in Figure 5l.

      We made the suggested correction to Figure 5e and 5l.

      (21) In Figure S5c: Please reorganize DA chromatin peak charts so that columns are XX and XY with rows at the same timepoint.

      We made the suggested correction to Figure S5c.

      (22) In Figure S7a: please make images larger so that the overlapping expression of PORCN and TRA98 is more visible, and consider adding a more magnified panel.

      This image is now included in the main text, with expanded panels.

      (23) Line 742-754: this seems like a long introduction for the results section; please consider tightening it up.

      We believe this text is important and necessary to provide context to the bioinformatics analyses of cell signaling pathways in PGCs. Not all readers will be familiar with the ligand-receptor signals between gonadal support cells and PGCs, and this text provides details on which signaling pathways are known to direct sex determination of PGCs.

      (24) For UMAP plots in Figures 2c, 3c, S3b, and S4b, the text overlaid with the timepoints and sexes onto the UMAP plots is misleading, as it allows the reader to presume that the entire group of cells for a given sex/timepoint is located in the location of the text overlay. However, from the UMAP plots in Figure 1i-j, it is clear that the cells from a given sex/timepoint are actually spread across multiple identified clusters. Thus, the overlaid text obscures the important heterogeneity detected. To better represent the actual locations on the UMAP plot of cells from each sex/timepoint, it would be better to show inset density plots alongside these UMAP plots so the reader can locate the cells for themselves. 

      We thank the reviewer for this comment. However, we chose this formatting to offer simplicity and ease of understanding to our UMAPs in addition to highlighting the general biological patterns of gene expression. If the reader is interested in discerning more of the heterogeneity of the UMAPs, they may refer back to Figure 1.

      Reviewer #3 (Recommendations For The Authors):

      There are some errors or places that need clarification or corrections:

      (1) Figure 1f, according to the graph, it should be 8 clusters, not 9.

      There are 9 clusters because the numbering for the clusters starts at ‘0’.

      (2) Why did cluster 8 have so many different states of cells from both sexes?

      The identification of cluster 8 is likely an artifact of sequencing, and would require several different analyses to figure out why cluster 8 has many different states of cells from both sexes. While this will address a technical issue associated with the dataset, this will not change any major conclusions of the study.

      (3) Figure 1i, shouldn't that be ten instead of eleven?

      There are 11 clusters because the numbering for the clusters starts at ‘0’.

      (4) In Figure 2a, the comparison of Zkscan5 expression levels is not obvious because the bubbles are small. What is the fold difference relative to XX PGCs?

      There is a 1.5-fold increase in the expression of Zkscan5 between XY and XX PGCs at E13.5. We included this information in the revised manuscript.

    1. eLife Assessment

      In this useful study, the authors tested a novel approach to eradicate the HIV reservoir by constructing a herpes simplex virus (HSV)-based therapeutic vaccine designed to reactivate HIV from latently infected cells and induce an immune response to kill such infected cells. Testing this approach with SIV in a primate model, the authors report that the SIV reservoir was reduced. However, the evidence presented appears to be incomplete because the animal group size was small and the SIV reservoir size highly variable.

    2. Reviewer #1 (Public review):

      Summary:

      Authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); the authors acknowledge that no full mechanistic explanation can be given at this moment.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA-seq analysis is required to show the effect on cellular genes.

      An RNA-seq analysis was done in the revised manuscript comparing the effect of HSV-1 and deleted vector in J-LAT cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of upregulation of SIV genes is questioned. Authors do NOT comment on these findings. In my view it questions the utility of this approach.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to make any conclusion, especially given the huge variability in response. The average numbers out of 3 are still presented in the paper, which is not proper science.

      No data are given of the effect of the deletion in primates. Now the deleted construct is compared with an empty vector containing no SIV genes. Authors provide new data in Fig S2 on the comparison of WT and modified vector in cells from PLWH, but data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and in Gag in 3/4 donors. (Additional question: what is the meaning of LTR mRNA? Do the authors mean genomic RNA?)

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA-seq data add to this worry and should at least be discussed.

      Comments on revisions:

      The authors accept the limitations of the primate study (too small for strong conclusions). The new way of presenting the data clearly shows these limitations.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); the authors acknowledge that no full mechanistic explanation can be given at this moment.

      Thank you for your comments. We agree with you that the mechanism underlying increased reactivation by deleting ICP34.5 is only partially explored. As you pointed out, the deletion of ICP34.5 leads to significant reactivation, while the overexpression of ICP34.5 has a relatively weak inhibitory effect on reactivation. This difference prompts us to further consider the role of HSV-1 in regulating HIV latency and reactivation. Our data (Figure S4), along with previous literature (Mosca et al., 1987, Nabel et al., 1988), have indicated that the ICP0 protein might play a crucial role in the reactivation of latent HIV. However, we found for the first time that ICP34.5 can antagonize this reactivation. This is a very interesting topic for understanding the complicated interactions between host cells and different viruses. We will investigate the mechanism more deeply in future studies, and we have mentioned this limitation in the revised Discussion section. Thank you!

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA-seq analysis is required to show the effect on cellular genes.

      An RNA-seq analysis was done in the revised manuscript comparing the effect of HSV-1 and deleted vector in J-LAT cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of upregulation of SIV genes is questioned. Authors do NOT comment on these findings. In my view it questions the utility of this approach.

      Thank you for your comments.

      (1) As for the toxicity of HSV-ΔICP34.5, it is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and thus deleting ICP34.5 is beneficial for improving the safety of HSV-based constructs. As expected, we have demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, we also observed a significant decrease in the expression of inflammatory factors in PLWH when compared to wild-type HSV-1 (Figure 1I-K). These data suggest that HSV-ΔICP34.5 should be better tolerated than the wild-type HSV vector.

      (2) The RNA-seq analysis was aimed at exploring the signaling pathways induced by HSV-ΔICP34.5; these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. As for the RNA-seq data, we think it is reasonable to observe many upregulated genes (which are involved in a variety of signaling pathways), since HSV-ΔICP34.5 constructs reactivated HIV latency more effectively than wild-type HSV by modulating the IKKα/β-NF-κB pathway and the PP1-HSF1 pathway.

      (3) To further validate whether HSV-ΔICP34.5 can specifically activate the HIV latent reservoir, we conducted additional experiments using vaccinia virus and adenovirus as controls, and the results showed that neither vaccinia virus nor adenovirus can effectively reactivate HIV latency (Figure S3). Moreover, deleting the ICP0 gene from HSV-1 diminished the reactivation of latent HIV by HSV-1, and overexpressing ICP0 greatly reactivated latent HIV (Figure S4, Figure S5), implying that this reactivation is virus-specific and that ICP0 plays an important role in reversing HIV latency. Interestingly, we found that ICP34.5 can antagonize this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to make any conclusion, especially given the huge variability in response. The average numbers out of 3 are still presented in the paper, which is not proper science.

      No data are given of the effect of the deletion in primates. Now the deleted construct is compared with an empty vector containing no SIV genes. Authors provide new data in Fig S2 on the comparison of WT and modified vector in cells from PLWH, but data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and in Gag in 3/4 donors. (Additional question: what is the meaning of LTR mRNA? Do the authors mean genomic RNA?)

      Thank you for your serious review and kind reminder.

      (1) We agree with you that it is not appropriated to use averages for this pilot study with limited numbers of macaques. We are currently unable to conduct another experiment with a larger number of macaques, but we think the results of this pilot study were very promising for further studies. Now, following your kind suggestions, we have removed the averages and now presented the data for each monkey individually in the revised manuscript. We have also modified the corresponding description accordingly (Line 254 to 262). Thank you for your understanding.

      (2) Regarding your comment about the lack of data on the deletion of ICP34.5 from HSV-1, we apologize for the previously unclear description. In fact, the empty vector used in our animal experiments not only lacks SIV antigens but also carries the ICP34.5 deletion. We have revised the corresponding description accordingly (for example, we now use HSV-ΔICP34.5ΔICP47-empty and HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv instead of HSV-empty and HSV-sPD1-SIVgag/SIVenv). We hope this revision addresses your question.

      (3) As for the reactivation effects observed in PLWH samples, the data may not be perfect, but we think this result (a significant difference in reactivation is seen for LTR in 2/4 donors and for Gag in 3/4 donors, and the purpose of detecting LTR RNA is to evaluate the level of virus replication) is promising and supports our conclusion (the enhanced reactivation effect in primary CD4+ T cells by HSV-∆ICP34.5 compared with wild-type HSV). Of course, we recognize the need for more samples to gain a comprehensive understanding of the reactivation effect in different individuals in future studies. In addition, we corrected the description of LTR RNA (Lines 99-106 and 115-116). Thank you for the reminder!

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA-seq data add to this worry and should at least be discussed.

      Thank you for your comment. As mentioned above, the RNA-seq analysis aimed to explore the signaling pathways induced by HSV-ΔICP34.5, and these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. ICP34.5 is a neurotoxicity factor that can antagonize innate immune responses, and thus ICP34.5 deletion helps improve the safety of HSV-based constructs. As expected, our data demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5H) and body weight (Figure S10) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group (Figure S11). These data suggest that HSV-ΔICP34.5 should be safer and better tolerated than the wild-type HSV vector. We have added a more comprehensive description in the revised Discussion (Lines 328-334). Thank you again for all of your kind comments and suggestions.

      Reviewer #2 (Public review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on the ability of HSV-deltaICP-34.5 to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of the ICP34.5 ORF in these cells can arrest transcription of latent HIV-1 genomes, even in the presence of latency reversal agents. ICP34.5 can co-IP with and de-phosphorylate IKKa/b to block its interaction with the NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF1 pathways.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and ICP47 ORFs to purge latency and avoid immunological refluxes, and additionally expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM) further indicate the potential of their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1 and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed; however, there are some questions I wish the authors had explored, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained and written very clearly for lay readers.

      Claims are adequately supported by evidence and well designed experiments including controls.

      We appreciate your positive comments on our work.

      Weaknesses:

      (1) It looks like ICP0 is also involved in latency reversal effects. More follow-up work will be required to test if this is in fact true.

      Both our data (Figure S4, Figure S5) and previous literature (Nabel et al., 1988; Mosca et al., 1987) indicate that HSV ICP0 may play a role in reversing HIV latency. However, the exact mechanisms behind this effect have not yet been fully elucidated. Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for the reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (2) It is difficult to estimate the depletion of the latent viral reservoir. The authors have tried to address this issue. A more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility to obtain such a result is not clearly demonstrated.

      Thank you for your comment. As you mentioned, we have indeed measured both total DNA and integrated DNA (iDNA) in blood cells (see Figure 5E-F), which can provide support for the reduction of the latent viral reservoir. Thank you for your kind reminder.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimen taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect higher level of viral loads being released in response to the vaccine in question.

      Thank you for your valuable suggestion. We believe that the reduced virus rebound observed may be influenced by immune responses from T cells and antibodies induced by both ART and the vaccine. We appreciate your insight and agree that future studies should focus on investigating the activation effects of the vaccine under controlled conditions that simulate the absence of immune responses in primary animal cells. This will help us better understand the mechanisms involved and address your concerns more comprehensively.

      Reviewer #2 (Recommendations for the authors):

      The Authors have sufficiently addressed my comments. Below are a few minor changes that can help with clarity.

      Lines 126-127: This sentence should be changed. Perhaps, "these data suggests that .... Safety of... in PLWH might be tolerable, at least in vitro."

      Thanks for your suggestion. We have revised it accordingly. (Line 130).

      Lines 128-132: Would this not mean that reactivation is due to ICP0 gene? Have the authors tried to express ICP0-gene into J-Lat cells and see if that is the reason for reactivation? This seems somewhat incomplete. At the end of 132, please add ", in the presence of ICP0". Also a sentence describing this effect is warranted.

      Thank you for your insightful suggestion. Yes, both our data and previous literature support that the ICP0 gene can play a significant role in the reactivation of HIV latency (Figure S4, Figure S5). Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for the reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. We have described this effect in the revised version accordingly. Additionally, we have added the phrase “in the presence of ICP0” to the results section (Line 137) to clarify this point.

      MOSCA, J. D., BEDNARIK, D. P., RAJ, N. B., ROSEN, C. A., SODROSKI, J. G., HASELTINE, W. A., HAYWARD, G. S. & PITHA, P. M. 1987. Activation of human immunodeficiency virus by herpesvirus infection: identification of a region within the long terminal repeat that responds to a trans-acting factor encoded by herpes simplex virus 1. Proc Natl Acad Sci U S A 84: 7408. DOI: https://doi.org/10.1073/pnas.84.21.7408, PMID: 2823260

      NABEL, G. J., RICE, S. A., KNIPE, D. M. & BALTIMORE, D. 1988. Alternative mechanisms for activation of human immunodeficiency virus enhancer in T cells. Science 239: 1299. DOI: https://doi.org/10.1126/science.2830675, PMID: 2830675

    1. eLife Assessment

      This valuable paper describes the stiffness of meiotic chromosomes in both oocytes and spermatocytes. The authors identify differences in stiffness between meiosis I and II chromosomes, as well as an age-dependent increase in stiffness in meiosis I (and meiosis II) chromosomes, results that are highly significant for the field of chromosome biology. The report is, however, mostly descriptive and the mechanisms underlying age-dependent changes in chromosome stiffness remain unclear. The evidence suggesting that changes in stiffness are independent of cohesin, which is known to deteriorate with age, is still incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Using biophysical chromosome stretching, the authors measured the stiffness of chromosomes of mouse oocytes in meiosis I (MI) and meiosis II (MII). This study is a follow-up of previous studies in spermatocytes (and oocytes) by the authors (Biggs et al. Commun. Biol. 2020; Hornick et al. J. Assist. Rep. and Genet. 2015). They showed that MI chromosomes are much stiffer (~10 fold) than mitotic chromosomes of mouse embryonic fibroblast (MEF) cells. MII chromosomes are also stiffer than the mitotic chromosomes. The authors also found that oocyte aging increases the stiffness of the chromosomes. Surprisingly, the stiffness of meiotic chromosomes is independent of the meiotic chromosome components Rec8, Stag3, and Rad21L, whereas aging increases the stiffness.

      Strengths:

      This provides new insight into a biophysical property of meiotic chromosomes, namely chromosome stiffness. The stiffness of chromosomes in meiotic prophase I is ~10-fold higher than that of mitotic chromosomes and is independent of meiotic cohesin. The increased stiffness during oocyte aging is a novel finding.

      Weaknesses:

      A major weakness of this paper is that it does not provide any molecular mechanism underlying the difference between MI and MII chromosomes (and/or prophase I and mitotic chromosomes).

      Comments on revisions:

      The main text lacks the first page with the authors' names and their affiliations (and corresponding authors etc).

    3. Reviewer #2 (Public review):

      Initial Review:

      This paper reports investigations of chromosome stiffness in oocytes and spermatocytes. The paper shows that prophase I spermatocytes and MI/MII oocytes yield high Young Modulus values in the assay the authors applied. Deficiency in each one of three meiosis-specific cohesins, they claim, did not affect this result, and increased stiffness was seen in aged oocytes but not in oocytes treated with the DNA-damaging agent etoposide.

      The paper reports some interesting observations which are in line with a report by the same authors of 2020 where increased stiffness of spermatocyte chromosomes was already shown. In that sense, the current manuscript is an extension of that previous paper and thus novelty is somewhat limited. The paper is also largely descriptive as it neither proposes a mechanism nor reports factors that determine the chromosomal stiffness.

      There are several points that need to be considered.

      Limitations of the study and the conclusions are not discussed in "Discussion"; that's a significant gap. Even more so as the authors rely on just one experimental system for all their data - no independent verification - and that in vitro system may be prone to artefacts.

      It is somewhat unfortunate that they jump between oocytes and spermatocytes to address the cohesin question. Prophase I (pachytene) spermatocyte chromosomes are not directly comparable to MI or MII oocyte chromosomes. In fact, the authors report Young Modulus values of 3700 for MI oocytes and only 2700 for spermatocyte prophase chromosomes, illustrating this difference. Why not use oocyte-specific cohesin deficiencies?

      It remains unclear whether the treatment of oocytes with the detergent TritonX-100 affects the spindle and thus the chromosomes isolated directly from the Triton-lysed oocytes. In fact, it is rather likely that the detergent affects chromatin-associated proteins and thus structural features of the chromosomes.

      Why did the authors use mouse strains of different genetic background, CD-1 and C57BL/6? That makes comparison difficult. Breeding of heterozygous cohesin mutants will yield the ideal controls, i.e. littermates.

      How did the authors capture chromosome axes from STAG3-deficient spermatocytes which feature very little if any axes? How representative are those chromosomes that could be captured?

      Line 135: that statement is not substantiated; better to show retraction data and full reversibility.

      Line 144: the authors claim that the Young Modulus of MII oocytes is "slightly" higher than that of mitotic cells (MEFs). Well, "slightly" means it is rather similar and therefore the commonly used statement that MII is similar to mitosis is OK - contrary to the authors' claim.

      There are a lot of awkward sentences in this text. Some sentences lack words, are not sufficiently precise in wording and/or logic, and there are numerous typos. Some examples can be found in lines 89 (grammar), 94, 95 ("looked"), 98, 101 ("difference" - between what?), and some are commonplaces or superficial (lines 92/93, 120..., ). Occasionally the present and past tense are mixed (e.g. in M&M). Thus the manuscript is quite badly written.

      Comments on revisions:

      In their revised paper, Liu et al. have addressed a number of my concerns and thus the paper is clearly improved in several details, e.g. in showing a control for a potential effect of the detergent (new suppl. fig. 5). Other points were not sufficiently addressed though.

      I remain sceptical about using mice of a substantially different genetic background (CD1) as controls in the analysis of the cohesin mutants (C57BL/6). The argument that C57BL/6 yield smaller litter size is, frankly, ridiculous. Hundreds of labs worldwide extensively and successfully work with C57BL/6. Further, the paper Liu et al. cite to argue that there are no (or minor) differences in chromosome structure (Biggs et al., 2020, which is from the same lab) of the two mouse strains deals with spermatocyte chromosomes only. Nothing there on oocyte chromosomes. And there is no direct comparison within the same experimental setting since in Biggs et al only C57BL/6 is used (sic!). Thus, this is not a convincing argument. It would also be reassuring to see an independent reference directly comparing different genetic backgrounds (authors may have a look at older papers of Pat Hunt/Terry Hassold where they may find some data). In my experience, differences in genetic background do play a very clear role in meiosis, e.g. in the timing of juvenile spermatogenesis, in the onset of puberty, in the kinetics of oocyte maturation, in the success of PBE, and in biophysical properties as seen in the stability of oocytes during experimental handling. In fact, the authors themselves indicate differences in reproduction by stating the low litter size of C57BL/6. Thus, I strongly advise carrying out at least a few key experiments using C57BL/6 control mice (which can very easily and cheaply be obtained from vendors; the authors have used C57BL/6 wt before - see their 2020 paper).

      The answer to my question #5 is not really satisfactory. I asked specifically how the authors isolated the very small chromosomes from Stag3-/- spermatocytes, where the axes are almost non-existing. The authors refer to suppl. fig. 3, but that shows isolation from Rec8-/- spermatocytes, which still have nicely visible, well-formed, shortened axes. Suppl. fig. 4 shows this for Rad21l-/-. Why not show this for the Stag3-/-, which in this respect is the most critical and difficult, and specifically answer my question?

      The overall criticism of the lack of conceptual novelty of the basic message of the paper and of very little if any insights into the mechanisms and factors determining the changes in chromosome stiffness remains.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Using biophysical chromosome stretching, the authors measured the stiffness of chromosomes of mouse oocytes in meiosis I (MI) and meiosis II (MII). This study is a follow-up of previous studies in spermatocytes (and oocytes) by the authors (Biggs et al. Commun. Biol. 2020; Hornick et al. J. Assist. Rep. and Genet. 2015). They showed that MI chromosomes are much stiffer (~10 fold) than mitotic chromosomes of mouse embryonic fibroblast (MEF) cells. MII chromosomes are also stiffer than the mitotic chromosomes. The authors also found that oocyte aging increases the stiffness of the chromosomes. Surprisingly, the stiffness of meiotic chromosomes is independent of the meiotic chromosome components Rec8, Stag3, and Rad21L.

      Strengths:

      This provides new insight into a biophysical property of meiotic chromosomes, namely chromosome stiffness. The stiffness of chromosomes in meiotic prophase I is ~10-fold higher than that of mitotic chromosomes and is independent of meiotic cohesin. The increased stiffness during oocyte aging is a novel finding.

      Weaknesses:

      A major weakness of this paper is that it does not provide any molecular mechanism underlying the difference between MI and MII chromosomes (and/or prophase I and mitotic chromosomes).

      We acknowledge that our study does not provide a comprehensive explanation for the stage-related alterations in chromosome stiffness; however, we believe that the observation of these changes is itself of broad interest. Initially, we hypothesized that DNA damage or depletion of meiosis-specific cohesin might contribute to the observed increase in chromosome stiffness. However, our experimental findings did not support these hypotheses, indicating that neither DNA damage nor cohesin depletion is responsible for the stiffness increase. The molecular basis underlying the stage-related stiffness increase remains elusive and requires exploration in future studies. In the Discussion, we propose that factors such as condensin, nuclear proteins, and histone methylation may play a role in regulating meiotic chromosome stiffness. The involvement of these factors in stage-related chromosome stiffening requires future investigation.

      Reviewer #2 (Public Review):

      This paper reports investigations of chromosome stiffness in oocytes and spermatocytes. The paper shows that prophase I spermatocytes and MI/MII oocytes yield high Young Modulus values in the assay the authors applied. Deficiency in each one of three meiosis-specific cohesins they claim did not affect this result and increased stiffness was seen in aged oocytes but not in oocytes treated with the DNA-damaging agent etoposide.

      The paper reports some interesting observations which are in line with a report by the same authors of 2020 where increased stiffness of spermatocyte chromosomes was already shown. In that sense, the current manuscript is an extension of that previous paper, and thus novelty is somewhat limited. The paper is also largely descriptive as it neither proposes a mechanism nor reports factors that determine the chromosomal stiffness.

      There are several points that need to be considered.

      (1) Limitations of the study and the conclusions are not discussed in the "Discussion" section and that is a significant gap. Even more so as the authors rely on just one experimental system for all their data - there is no independent verification - and that in vitro system may be prone to artefacts.

      Our experimental system has been used to study different types of chromosome stiffness as well as nuclear stiffness. We have compared our results with previously published data and found them to be consistent across different experiments. To address the reviewer’s concern, we now describe the limitations of our in vitro experimental approach in the Discussion section.

      (2) It is somewhat unfortunate that they jump between oocytes and spermatocytes to address the cohesin question. Prophase I (pachytene) spermatocyte chromosomes are not directly comparable to MI or MII oocyte chromosomes. In fact, the authors report Young Modulus values of 3700 for MI oocytes and only 2700 for spermatocyte prophase chromosomes, illustrating this difference. Why not use oocyte-specific cohesin deficiencies?

      In this study, our goal was to investigate the mechanism underlying the increased chromosome stiffness observed during prophase I. Ideally, we would have compared wild-type and cohesin-deleted mouse oocytes at the metaphase I (MI) stage. However, experimental constraints made this approach unfeasible: because cohesin proteins are crucial for germline cell development, spermatocytes and oocytes from Rec8-/- and Stag3-/- mutant mice cannot reach the MI stage, and Rad21l-/- mutant mice are sterile as males and subfertile as females.

      Additionally, collecting prophase I chromosomes from oocytes is exceptionally challenging and requires fetal mice as the source of prophase I oocytes, because oocytes progress to the diplotene stage during fetal development. The process is further complicated by the difficulty of genotyping fetal mice, making the study of female prophase I impracticable. By contrast, spermatocytes are continuously generated in males throughout life, with meiotic stages readily identifiable, making them more accessible for analysis.

      Our findings consistently showed increased chromosome stiffness in both prophase I spermatocytes and MI oocytes, suggesting that the phenomenon is not sex-specific. This observation implies that similar effects on chromosome stiffness may occur across meiotic stages, from prophase I to MI.

      (3) It remains unclear whether the treatment of oocytes with the detergent TritonX-100 affects the spindle and thus the chromosomes isolated directly from the Triton-lysed oocytes. In fact, it is rather likely that the detergent affects chromatin-associated proteins and thus structural features of the chromosomes.

      Regarding the use of Triton X-100, it is important to emphasize that the concentration used (0.05%) is very low and unlikely to significantly affect chromosome stiffness. To support this assertion, we have provided additional evidence in the revised manuscript demonstrating that this low concentration of Triton X-100 has a negligible effect on chromosome stiffness (Supplement Fig. 5, Right panel).

      (4) Why did the authors use mouse strains of different genetic backgrounds, CD-1, and C57BL/6? That makes comparison difficult. Breeding of heterozygous cohesin mutants will yield the ideal controls, i.e. littermates.

      The genetic mutant mice, all in a C57BL/6 background, were generously provided by Dr. Philip Jordan and delivered to our lab. As our lab does not currently maintain a C57BL/6 colony, and given that this strain typically produces small litter sizes - which would have complicated the remainder of the study - we chose CD-1 mice as the control group and used C57BL/6 mice specifically for the cohesin study. To address potential concerns regarding genetic background differences, we compared our results with previously published data from C57BL/6 mice and found no significant differences (2710 ± 610 Pa versus 3670 ± 840 Pa, P = 0.4809) (Biggs et al., 2020). Furthermore, prophase I spermatocytes from CD-1 mice showed no significant difference compared to any of the three cohesin-deleted C57BL/6 mutant mice, suggesting that chromosome stiffness is not significantly influenced by genetic background.

      (5) How did the authors capture chromosome axes from STAG3-deficient spermatocytes which feature very few if any axes? How representative are those chromosomes that could be captured?

      We isolated chromosomes from prophase I mutant spermatocytes, which were identified by their large size, round shape, and thick chromosomal threads - characteristics indicative of advanced condensation and a zygotene-like stage during prophase I (Supplemental Fig. 3). The methodology for isolating these chromosomes has been described in detail in our previous publication (Biggs et al., 2020), which is referenced in the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the mechanical properties of chromosomes remains an important issue in cell biology. Measuring chromosome stiffness can provide valuable insights into chromosome organization and function. Using a sophisticated micromanipulation system, Liu et al. analyzed chromosome stiffness in MI and MII oocytes. The authors found that chromosomes in MI oocytes were ten-fold stiffer than mitotic ones. The stiffness of chromosomes in MI mouse oocytes was significantly higher than that in MII oocytes. Furthermore, the knockout of the meiosis-specific cohesin component (Rec8, Stag3, Rad21l) did not affect meiotic chromosome stiffness. Interestingly, the authors showed that chromosomes from old MI oocytes had higher stiffness than those from young MI oocytes. The authors claimed this effect was not due to the accumulated DNA damage during the aging process because induced DNA damage reduced chromosome stiffness in oocytes.

      Strengths:

      The technique used (isolating the chromosomes in meiosis and measuring their stiffness) is the authors' specialty. The results are intriguing and informative to the chromatin/chromosome and other related fields.

      Weaknesses:

      (1) How intact the measured chromosomes were is unclear.

      Currently, a well-calibrated chromosome mechanics experiment requires the extracellular isolation of chromosomes. In experiments conducted parallel to those in our previous study (Biggs et al., 2020), we obtained quantitatively consistent results, including measurements of the Young modulus for prophase I spermatocyte chromosomes. Our isolation approach is significantly gentler than bulk methods that rely on hypotonic buffer-driven cell lysis and centrifugation. If substantial chromosomal damage had occurred during isolation, we would expect greater variation between experiments, as different amounts or types of damage could influence the results.

      (2) Some control data needs to be included.

      We used wild-type prophase I spermatocytes and metaphase I (MI) oocytes as controls. To validate our findings, we compared some of our results with those reported in a previous study and observed consistent outcomes (Biggs et al., 2020).

      (3) The paper was not well-written, particularly the Introduction section.

      We have revised the paper and improved the overall quality of the manuscript.

      (4) How intact were the measured chromosomes? Although the structural preservation of the chromosomes is essential for this kind of measurement, the meiotic chromosomes were isolated in PBS with Triton X-100 and measured at room temperature. It is known that chromosomes are very sensitive to cation concentrations and macromolecular crowding in the environment (PMID: 29358072, 22540018, 37986866). It would be better to discuss this point.

      As suggested, we investigated the impact of PBS and Triton X-100 on chromosome stiffness. Our findings indicate that neither PBS nor Triton X-100 caused significant changes in chromosome stiffness (Supplemental Fig. 5).

      Recommendations For The Authors:

      Major points of Reviewers that the Editor indicated should be addressed

      (1) Reviewer's point 3, the effect of the high concentration of etoposide: It would be advisable to use lower concentrations of etoposide to observe the effect of DNA damage on chromosome stiffness more accurately.

      The effect of etoposide on oocytes is dose-dependent (Collins et al., 2015). Oocytes are generally not highly sensitive to DNA damage, and even at relatively high concentrations, not all may exhibit a response. To ensure sufficient DNA damage in the oocytes we isolated, we used a relatively high concentration of etoposide for the experiment. This concentration (50 μg/ml) falls within the typical range reported in the literature (Marangos and Carroll, 2012; Cai et al., 2023; Lee et al., 2023). As the reviewer suggested, we tested two additional lower concentrations of etoposide (5 μg/ml and 25 μg/ml) (see Fig. 5C). We did not observe any significant difference in chromosome stiffness in oocytes treated with 5 μg/ml etoposide compared to the control. However, the higher concentration of etoposide (25 μg/ml) significantly reduced oocyte chromosome stiffness compared to the control.

      Revision to manuscript:

      “Results at lower etoposide concentrations revealed that chromosome stiffness in untreated control oocytes was not significantly different from that in oocytes treated with 5 μg/ml etoposide (3780 ± 700 Pa versus 3930 ± 400 Pa, P = 0.8624). However, chromosome stiffness in untreated oocytes was significantly higher than that in oocytes treated with 25 μg/ml etoposide (3780 ± 700 Pa versus 1640 ± 340 Pa, P = 0.015) (Figure 5C).”

      (2) Reviewer's point 3, the effect of Triton X-100: This is related to the concern of Reviewer #3. It is critical to check whether the detergent affects the stiffness indirectly.

      To demonstrate that the low concentration of Triton X-100 does not influence chromosome stiffness, we conducted additional experiments. First, we isolated chromosomes and measured their stiffness. Then, we treated the chromosomes with 0.05% Triton X-100 via micro-spraying and remeasured the stiffness. The results showed no significant difference (see Supplement Fig. 5 right panel).

      Revision to manuscript:

      “In addition to past experiments indicating that mitotic chromosomes are stable for long periods after their isolation (Pope et al., 2006), we carried out control experiments on mouse oocyte chromosomes where we incubated them for 1 hour in PBS, or exposed them to a flow of Triton X-100 solution for 10 minutes; there was no change in chromosome stiffness in either case (Methods and Supplementary Fig. 5).”

      (3) Reviewer's point 1, the effect of the buffer composition: Please describe how the composition affects the stiffness of the chromosomes.

      PBS is an economical and effective buffer solution that closely mimics the osmotic conditions of the cytoplasm, which is important for maintaining chromosomal structural integrity. Appropriate ion concentrations are crucial for preserving chromosome integrity, as imbalances, whether too high or too low, can alter chromosome morphology (Poirier and Marko, 2002). When chromosomes are stored in PBS, their stiffness remains relatively stable, even with prolonged exposure, ensuring minimal changes to their physical properties. To confirm this, we isolated chromosomes and measured their stiffness. After a one-hour incubation in PBS, we remeasured the stiffness and observed no significant difference, which demonstrates that chromosomes remain stable in PBS (see Supplement Fig. 5 left panel).

      Revision to manuscript:

      “In this study, we developed a new way to isolate meiotic chromosomes and measure their stiffness. However, one concern is that the measurements were conducted in PBS solution, which is different from the intracellular environment. To address this, we monitored chromosome stiffness over time in PBS solution and found that it remained stable over a period of one hour (Supplement Fig. 5 Left panel).”

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) The role of condensin complexes in chromosome stiffness has been shown previously (Sun et al. Chromosome Research, 2018). Thus, the authors should at least describe the condensin staining on MI and MII chromosomes.

      We have added sentences in the discussion to elaborate on the role of condensin.

      Revision to manuscript:

      “Several factors, including condensin, have been found to affect chromosome stiffness (Sun et al., 2018). Condensin exists in two distinct complexes, condensin I and condensin II, and both are active during meiosis. Published studies indicate that condensin II is more sharply defined and more closely associated with the chromosome axis from anaphase I to metaphase II (Lee et al., 2011). Additionally, condensin II appears to play a more significant role in mitotic chromosome mechanics compared to condensin I (Sun et al., 2018). Thus, condensin II likely contributes more significantly to meiotic chromosome stiffness than condensin I.”

      (2) Although the authors nicely showed the difference in the stiffness between MI and MII chromosomes (Figure 2), as known, MI chromosomes are bivalent (with four chromatids) while MII chromosomes are univalent (with two chromatids). The physical property of the chromosomes would be affected by the number of chromatids. It would be essential for the authors to measure the physical properties of a univalent of MI chromosomes from mice defective in meiotic recombination such as Spo11 and/or Mlh3 KO mice.

      The reviewer correctly pointed out that the number of chromatids in chromosomes differs between the metaphase I (MI) and metaphase II (MII) stages. We have addressed this difference by calculating Young’s modulus (E), a mechanical property that describes the elasticity of a material independent of its geometry. Young’s modulus describes the intrinsic properties of the material itself, rather than the specific characteristics of the object being tested. It is calculated as E = (F/A)/(∆L/L0), where F is the force applied to stretch the chromosome, A is the cross-sectional area, ∆L is the change in length of the chromosome, and L0 is the original length of the chromosome. An increase in chromosome or chromatid number results in a larger cross-sectional area (A) and therefore a higher doubling force (F), but because F is normalized by A, this variation does not affect the calculated chromosome stiffness (Young’s modulus, E). While study of the mutants suggested by the referee would certainly be interesting, it is likely that the absence of these key recombination factors would impact chromosome stiffness in a more complex way than just changing chromosome thickness; this type of study is beyond the scope of the present manuscript and is an exciting direction for future studies.
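      The geometry argument in this response can be checked with a short calculation. The following is a minimal sketch, assuming two identical elastic elements stretched in parallel to the same strain; the function name and all numerical values are hypothetical and are not measurements from the study.

```python
def youngs_modulus(force_n, area_m2, delta_length_m, rest_length_m):
    """Return Young's modulus E = (F / A) / (dL / L0) in pascals."""
    stress = force_n / area_m2                # force per unit cross-sectional area
    strain = delta_length_m / rest_length_m   # fractional change in length
    return stress / strain


# Hypothetical values chosen only for illustration.
force = 500e-12        # 500 pN needed to double the length of one element
area = 1e-12           # 1 square micrometre cross-section
rest_length = 5e-6     # 5 micrometre rest length
stretch = rest_length  # doubling the length means dL = L0 (strain of 1)

single = youngs_modulus(force, area, stretch, rest_length)

# Two such elements side by side: twice the area, and twice the force
# is needed to reach the same strain, so the ratio F / A is unchanged.
paired = youngs_modulus(2 * force, 2 * area, stretch, rest_length)

print(f"single element: {single:.1f} Pa, paired elements: {paired:.1f} Pa")
```

      Under these assumptions the doubling force scales with the cross-sectional area while the modulus does not, which is the sense in which the modulus is independent of chromatid number. The sketch says nothing about whether MI and MII chromatin differ as materials.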

      (3) In Figure 5, the authors measure the stiffness of etoposide-treated MI chromosomes. The concentration of the drug was 50 μg/ml, which is very high. The authors should analyze different concentrations of the drug to check the chromosome stiffness. Moreover, etoposide is an inhibitor of Topoisomerase II. The effect of the drug might be caused by defective Top2 activity, rather than by Top2 adducts and thus DNA damage. It is very important to check other Top2 inhibitors or DNA-damaging agents to generalize the effect of DNA damage on chromosome stiffness. Moreover, DNA damage induces the DNA damage response. It is important to check the effect of DDR inhibitors on the damage-induced change of stiffness.

      The reviewer is correct in noting that etoposide can both induce DNA damage and inhibit Top2 activity. To address this concern, we point to our previous DNase experiment, which provides further clarity and supports the results of this study (Biggs et al., 2020). That experiment was conducted in vitro, where DNase treatment caused DNA damage on chromosomes without affecting Top2 activity or triggering a DNA damage response. The results demonstrated that DNase treatment led to reduced chromosome stiffness, which aligns with the findings presented in our manuscript.

      (4) Along the same lines as point 3, the authors also need to check the effect of etoposide on the stiffness of mitotic chromosomes from MEFs.

      Experiments on MEF mitotic chromosomes were designed to serve as a reference for the meiotic chromosome studies. The etoposide experiments on meiotic chromosomes specifically aimed to investigate how DNA damage affects meiotic chromosome structure. While it would be interesting to explore the effects of etoposide-induced DNA damage on mitotic chromosomes, it represents a distinct research question that falls outside the scope of the current study.

      Minor points:

      (1) Line 141-142: Previous studies by the author analyzed the stiffness of mitotic chromosomes from pro-metaphase. Which stage of cell cycles did the authors analyze here?

      To ensure consistency in our experiments, we also measured the stiffness of mitotic chromosomes at the prometaphase stage. The precise stage used is very near to metaphase, at the very end of the prometaphase stage. We have modified the manuscript to clarify this point.

      Revision to manuscript:

      “For comparison with the meiotic case, we measured the chromosome stiffness of Mouse Embryonic Fibroblasts (MEFs) at late pro-metaphase (just slightly before their attachment to the mitotic spindle) and found that the average Young’s modulus was 340 ± 80 Pa (Figure 2B). The value is consistent with our previously published data, where the modulus for MEFs was measured to be 370 ± 70 Pa (Biggs et al., 2020).”

      (2) Line 157: Here, the doubling force of MI (and MII) oocytes should be described in addition to those of spermatocytes.

      The purpose of this paragraph is to demonstrate the reproducibility and consistency of our experiments. In this section, we compared our data with previously published findings. Published data do not include chromosome stiffness measurements from MI mouse oocytes; our experiment is the first to assess this. Therefore, we did not include MI mouse oocytes in that comparison. To clarify this, we have added sentences to highlight the comparison of doubling force.

      Revision to manuscript:

      “Here, we found that the doubling forces of chromosomes from MI and MII oocytes are 3770 ± 940 pN and 510 ± 50 pN, respectively. We conclude that chromosomes from MI oocytes are much stiffer than those from both mitotic cells and MII oocytes (Supplement Fig. 2), in terms of either Young’s modulus or doubling force.”

      (3) Line 202: What stage of prophase I do the authors mean by the spermatocyte stage here? Diakinesis, Metaphase I or prometaphase I? I am not sure how the authors can determine a specific stage of prophase I by only looking at the thickness of the chromosomes. Please show the thickness distribution of WT and Rec8<sup>-/-</sup> chromosomes.

      We have reworded the sentence and clarified that the spermatocyte stage is prophase I. Since Rec8<sup>-/-</sup> spermatocytes cannot progress beyond the pachytene stage of prophase I, the isolated chromosomes must be in prophase I rather than diakinesis, metaphase I, prometaphase I, or any later stage (Xu et al., 2005). Based on the cell size and degree of chromosome condensation (Biggs et al., 2020), it is most likely that the measured chromosomes are at the zygotene-like stage. However, as we cannot definitively determine the exact substage of prophase I, we have referred to them simply as prophase I.

      Revision to manuscript:

      “We isolated chromosomes from Rec8<sup>-/-</sup> prophase I spermatocytes, which displayed large and round cell size and thick chromosomal threads, indicative of advanced chromosome compaction after stalling at a zygotene-like prophase I stage (Supplement Fig. 3). The combination of large cell size and degree of chromosome compaction allowed us to reliably identify Rec8<sup>-/-</sup> prophase I chromosomes. Using micromanipulation, we measured chromosome stiffness by stretching the chromosomes (Supplement Fig. 3) (Biggs et al., 2019).”

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 135: that statement is not substantiated; better to show retraction data and full reversibility.

      We added a figure showing oocyte chromosome stretching, which showed that the oocyte chromosome is elastic, and that the stretching process is reversible (Supplement Fig.1).

      (2) Line 144: the authors claim that the Young Modulus of MII oocytes is "slightly" higher than that of mitotic cells (MEFs). Well, "slightly" means it is rather similar, and therefore the commonly used statement that MII is similar to mitosis is OK - contrary to the authors' claim.

      We have removed the word “slightly” in the manuscript. The difference is statistically significant.

      Revision to manuscript:

      “Surprisingly, despite this reduction, the stiffness of MII oocyte chromosomes was still significantly higher than that for mitotic cells (Figure 2B).”

      (3) There are a lot of awkward sentences in this text. Some sentences lack words, are not sufficiently precise in wording and/or logic, and there are numerous typos. Some examples can be found in lines 89 (grammar), 94, 95 ("looked"), 98, 101 ("difference" - between what?), and some are commonplaces or superficial (lines 92/93, 120..., ). Occasionally the present and past tense are mixed (e.g. in M&M). Thus the manuscript is quite poorly written.

      We thank the reviewer for these comments. We have revised all the sentences highlighted by the reviewer and polished the entire manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 48. "We then investigated the contribution of meiosis-specific cohesin complexes to chromosome stiffness in MI and MII oocytes." There is no data on oocytes with meiosis-specific cohesin KO. This part should be corrected.

      We have corrected this error.

      Revision to manuscript:

      “We examined the role of meiosis-specific cohesin complexes in regulating chromosome stiffness.”

      (2) Lines 155-157. The result of MI mouse oocyte chromosomes should also be mentioned here (Supplementary Figure 1).

      Please see our response to Reviewer 1 – Minor Point 2.

      (3) Line 163. "The stiffness of chromosomes in MI mouse oocytes is significantly higher compared to MII oocytes."<br /> Is this because two homologs are paired in MI chromosomes (but not in MII chromosomes)? The authors may want to discuss the possible mechanism.

      Please see our response to Reviewer 1 – Major Point 2.

      (4) Line 188: "We hypothesized that MI oocytes... would have higher chromosome stiffness than MII oocytes." Why did the authors measure chromosomes from spermatocytes but not MI oocytes?

      Both spermatocytes and oocytes from Rec8<sup>-/-</sup>, Stag3<sup>-/-</sup>, and Rad21l<sup>-/-</sup> mutant mice cannot reach MI stage because cohesin proteins are crucial for germline-cell development. We chose to use spermatocytes in our study because collecting fetal meiotic oocytes is extremely difficult, and genotyping fetal mice adds another layer of complexity to the experiments. In females, all oocytes complete prophase I and progress to the dictyotene stage during the fetal stage. Obtaining individual oocytes at this stage is challenging. In contrast, spermatocytes are continuously generated at all stages in males.

      (5) To support the authors' conclusion, verifying the KO of REC8, STAG3, and RAD21L by immunostaining or other methods is essential.

      These mice were provided by one of the authors, Dr. Philip Jordan, who has published several papers using these knockout mice (Hopkins et al., 2014; Ward et al., 2016). The immunostaining of these models has already been well characterized in those previous studies. In addition to performing double genotyping, we also used the size of the collected testes as an additional verification of the mutant genotype. These knockout mice have significantly smaller testes than their wild-type counterparts, providing a clear physical indicator of the mutation.

      (6) Some of the cited papers and descriptions in the Introduction are not appropriate and confusing. This part should be improved:

      Line 79. Recent studies have revealed that the 30-nm fiber is not considered the basic structure of chromatin (e.g., review, PMID: 30908980; original papers, PMID: 19064912, 22343941, 28751582). This point should be included.

      We have corrected the references as needed. Additionally, thank you for the updated information regarding the 30-nm fiber. We have removed all the descriptions about the 30-nm fiber to ensure the information is accurate and up to date.

      (7) Line 83. Reviews on mitotic chromosomes, rather than Ref. 9, should be cited here. For instance, PMID: 33836947, 31230958.

      We have corrected it and added references according to the reviewer’s suggestion.

      (8) Line 85. Refs. 10 and 11 are not on the "Scaffold/Radial-Loop" model. For instance, PMID: 922894, 277351, 12689587. The other popular model is the hierarchical helical folding model (PMID: 98280, 15353545).

      We have corrected it and added appropriate references according to the reviewer’s suggestion. Regarding the hierarchical helical folding model, our experiments do not provide data that either support or refute this model. Thus, we have opted not to include any discussion of this model in our manuscript.

      (9) Figure legends. There is no description of the statistical test.

      We have added the description of the statistical test at the end of the figure legends for clarity.

      (10) Line 156. The authors should mention which stages in spermatocyte prophase I (pachytene?) were used for their measurement.

      We cannot precisely determine the substage of prophase I in the spermatocytes although it is most likely in the pachytene stage.

      (11) Line 241. "DNA damage reduces chromosome stiffness in oocytes." It would be better to show how much damage was induced in aged and etoposide-treated chromosomes, for example, by gamma-H2AX immunostaining. In addition, there are some papers that show DNA damage makes chromatin/chromosomes softer (e.g., PMID: 33330932). The authors need to cite these papers.

      The effects of etoposide and age on meiotic oocytes have been published (Collins et al., 2015; Marangos et al., 2015; Winship et al., 2018).

      We are grateful for the citation information provided by the reviewer and have added it to our manuscript.

      Revision to manuscript:

      “Overall, these findings suggest that DNA damage reduces chromosome stiffness in oocytes instead of increasing it, which aligns with other studies showing that DNA damage can make chromosomes softer (Dos Santos et al., 2021). These results suggest that the increased chromosome stiffness observed in aged oocytes is not due to DNA damage.”

      (12) Line 328. Senescence?

      This error is corrected in the revised manuscript.

      Revision to manuscript:

      “Defective chromosome organization is often related to various diseases, such as cancer, infertility, and senescence (Thompson and Compton, 2011; Harton and Tempest, 2012; He et al., 2018).”

      References:

      Biggs, R., P.Z. Liu, A.D. Stephens, and J.F. Marko. 2019. Effects of altering histone posttranslational modifications on mitotic chromosome structure and mechanics. Mol. Biol. Cell. 30:820–827. doi:10.1091/mbc.E18-09-0592.

      Biggs, R.J., N. Liu, Y. Peng, J.F. Marko, and H. Qiao. 2020. Micromanipulation of prophase I chromosomes from mouse spermatocytes reveals high stiffness and gel-like chromatin organization. Commun. Biol. 3:1–7. doi:10.1038/s42003-020-01265-w.

      Cai, X., J.M. Stringer, N. Zerafa, J. Carroll, and K.J. Hutt. 2023. Xrcc5/Ku80 is required for the repair of DNA damage in fully grown meiotically arrested mammalian oocytes. Cell Death Dis. 14:1–9. doi:10.1038/s41419-023-05886-x.

      Collins, J.K., S.I.R. Lane, J.A. Merriman, and K.T. Jones. 2015. DNA damage induces a meiotic arrest in mouse oocytes mediated by the spindle assembly checkpoint. Nat. Commun. 6. doi:10.1038/ncomms9553.

      Harton, G.L., and H.G. Tempest. 2012. Chromosomal disorders and male infertility. Asian J. Androl. 14:32–39. doi:10.1038/aja.2011.66.

      He, Q., B. Au, M. Kulkarni, Y. Shen, K.J. Lim, J. Maimaiti, C.K. Wong, M.N.H. Luijten, H.C. Chong, E.H. Lim, G. Rancati, I. Sinha, Z. Fu, X. Wang, J.E. Connolly, and K.C. Crasta. 2018. Chromosomal instability-induced senescence potentiates cell non-autonomous tumourigenic effects. Oncogenesis. 7. doi:10.1038/s41389-018-0072-4.

      Hopkins, J., G. Hwang, J. Jacob, N. Sapp, R. Bedigian, K. Oka, P. Overbeek, S. Murray, and P.W. Jordan. 2014. Meiosis-Specific Cohesin Component, Stag3 Is Essential for Maintaining Centromere Chromatid Cohesion, and Required for DNA Repair and Synapsis between Homologous Chromosomes. PLoS Genet. 10:e1004413. doi:10.1371/journal.pgen.1004413.

      Lee, C., J. Leem, and J.S. Oh. 2023. Selective utilization of non-homologous end-joining and homologous recombination for DNA repair during meiotic maturation in mouse oocytes. Cell Prolif. 56:1–12. doi:10.1111/cpr.13384.

      Lee, J., S. Ogushi, M. Saitou, and T. Hirano. 2011. Condensins I and II are essential for construction of bivalent chromosomes in mouse oocytes. Mol. Biol. Cell. 22:3465–3477. doi:10.1091/mbc.E11-05-0423.

      Marangos, P., and J. Carroll. 2012. Oocytes progress beyond prophase in the presence of DNA damage. Curr. Biol. 22:989–994. doi:10.1016/j.cub.2012.03.063.

      Marangos, P., M. Stevense, K. Niaka, M. Lagoudaki, I. Nabti, R. Jessberger, and J. Carroll. 2015. DNA damage-induced metaphase I arrest is mediated by the spindle assembly checkpoint and maternal age. Nat. Commun. 6:1–10. doi:10.1038/ncomms9706.

      Poirier, M.G., and J.F. Marko. 2002. Mitotic chromosomes are chromatin networks without a mechanically contiguous protein scaffold. Proc. Natl. Acad. Sci. U. S. A. 99:15393–15397. doi:10.1073/pnas.232442599.

      Pope, L.H., C. Xiong, and J.F. Marko. 2006. Proteolysis of Mitotic Chromosomes Induces Gradual and Anisotropic Decondensation Correlated with a Reduction of Elastic Modulus and Structural Sensitivity to Rarely Cutting Restriction Enzymes. Mol. Biol. Cell. 17:104. doi:10.1091/MBC.E05-04-0321.

      Dos Santos, Á., A.W. Cook, R.E. Gough, M. Schilling, N.A. Olszok, I. Brown, L. Wang, J. Aaron, M.L. Martin-Fernandez, F. Rehfeldt, and C.P. Toseland. 2021. DNA damage alters nuclear mechanics through chromatin reorganization. Nucleic Acids Res. 49:340–353. doi:10.1093/nar/gkaa1202.

      Sun, M., R. Biggs, J. Hornick, and J.F. Marko. 2018. Condensin controls mitotic chromosome stiffness and stability without forming a structurally contiguous scaffold. Chromosom. Res. 26:277–295. doi:10.1007/s10577-018-9584-1.

      Thompson, S.L., and D.A. Compton. 2011. Chromosomes and cancer cells. Chromosom. Res. 19:433–444. doi:10.1007/s10577-010-9179-y.

      Ward, A., J. Hopkins, M. Mckay, S. Murray, and P.W. Jordan. 2016. Genetic Interactions Between the Meiosis-Specific Cohesin Components, STAG3, REC8, and RAD21L. G3 (Bethesda). 6:1713–24. doi:10.1534/g3.116.029462.

      Winship, A.L., J.M. Stringer, S.H. Liew, and K.J. Hutt. 2018. The importance of DNA repair for maintaining oocyte quality in response to anti-cancer treatments, environmental toxins and maternal ageing. Hum. Reprod. Update. 24:119–134. doi:10.1093/humupd/dmy002.

      Xu, H., M.D. Beasley, W.D. Warren, G.T.J. van der Horst, and M.J. McKay. 2005. Absence of Mouse REC8 Cohesin Promotes Synapsis of Sister Chromatids in Meiosis. Dev. Cell. 8:949–961. doi:10.1016/j.devcel.2005.03.018.

    1. eLife Assessment

      By combining Synthetic Genetic Array (SGA) analysis with state-of-the-art imaging techniques, this study provides strong evidence that sphingolipid metabolism controls the maturation of Parkinson's disease-associated Synphilin-1 inclusion bodies (SY1 IBs) on the mitochondrial surface in a yeast model. The authors present compelling proof that perturbing the sphingolipid metabolic pathway leads to delayed SY1 IB maturation and enhanced SY1-triggered toxicity. Altogether, the authors show the important role of sphingolipid metabolism in the detoxification process of misfolded proteins by facilitating large IB formation on the mitochondrial outer membrane.

    2. Reviewer #2 (Public review):

      Summary:

      The authors used a yeast model for analyzing Parkinson's disease-associated synphilin-1 inclusion bodies (SY1 IBs). In this model system, large SY1 IBs are efficiently formed from smaller, potentially more toxic SY1 aggregates. Using a genome-wide approach (synthetic genetic array, SGA, combined with a high content imaging approach), the authors identified the sphingolipid metabolic pathway as pivotal for SY1 IB formation. Disturbances of this pathway increased SY1-triggered growth deficits, loss of mitochondrial membrane potential, and production of reactive oxygen species (ROS), and decreased cellular ATP levels, pointing to an increased energy crisis within affected cells. Notably, SY1 IBs were found to be surrounded by mitochondrial membranes using state-of-the-art super-resolution microscopy. Finally, the effects observed in yeast for SY1 IBs turned out to be evolutionarily conserved in mammalian cells. Thus, sphingolipid metabolism might play an important role in the detoxification of misfolded proteins by large IB formation at the mitochondrial outer membrane.

      Strengths:

      • The SY1 IB yeast model is very suitable for the analysis of genes involved in IB formation.<br /> • The genome-wide approach combining a synthetic genetic array (SGA) with a high content imaging approach is a compelling approach and enabled the reliable identification of novel genes. The authors tightly checked the output of the screen.<br /> • The authors clearly showed, including a couple of control experiments, that the sphingolipid metabolic pathway is crucial for SY1 IB formation and cytotoxicity.<br /> • The localization of SY1 IBs at mitochondrial membranes has been clearly demonstrated with state-of-the-art super-resolution microscopy and biochemical methods.<br /> • Pharmacological manipulation of the sphingolipid pathway influenced mitochondrial function and cell survival.<br /> • The authors have carefully redone critical experiments to avoid any misleading interpretation of data.

      Weaknesses:

      • It remains unclear how sphingolipids are involved in SY1 IB formation.

      Comments on revisions: No further comments

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:<br /> (1) I still think that the authors need to set the importance of the differences in aggregation in the context of toxicity arising from protein misfolding/aggregation. While the authors state the limitation in the response, and I agree that a single manuscript cannot complete a field of investigation, I still think that this is an important point missing from this manuscript.

      We thank the reviewer for the comments. We are working to address this issue and will elucidate it in our future studies.

      (2) I retain my reservations about the fluorescence intensity data shown for Rho123, DCF, JC1, and MitoSox. The errors are much lower than what we typically achieve in biological experiments in our lab as well as our collaborator's lab. A glimpse at the published literature would also support this statement. Specifically, Rho123 shows a large difference in errors between Figure 5 and Figure 5 Supplement 2. The point to note is that the absolute intensities do not vary between these figures, but the errors are an order of magnitude lower in the main figures. I, therefore, accept these figures in good faith without further interrogation.

      We really value these comments from the reviewer and also do not want to cause any potential misleading interpretations of the data. We have therefore asked a more experienced author to redo all the experiments on the physiological indicators (Rho123, JC1 and MitoSox) that directly reflect mitochondrial function, and left out the DCF data. The new experimental data are in line with our previous results. We have clearly described these changes in the Results, Materials and Methods and Figure legends sections.

      The new data from the repeated experiments are: Rho123 fluorescence intensity data in Figure 5A, B and C; Figure 6B; JC1 staining in Figure 6E; JC1 staining in Figure 7A, B and D.

    1. eLife Assessment

      This important study presents an original and promising approach to combine convolutional neural networks of visual processing with evidence accumulation models of decision-making. While the methodological approach is technically sophisticated and the evidence is solid, there is still a gap between the model and the behavioral data. The study will be of interest to researchers working in the fields of machine learning and cognitive modeling.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a new approach for modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach of fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.<br /> (2) The analyses are very technically sophisticated.<br /> (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      (1) The main advance here thus appears to be methodological rather than conceptual. It's really cool that VAMs are image computable and are also fit to human data. But what we learn about the mind or brain is perhaps more modest.<br /> (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. In response to this concern, the authors responded that one can imagine an additional (unmodeled) sensory process that adds variability to the drift rates. However, this process remains unmodeled. The authors motivate their paper by saying "EAMs do not explain how the visual system extracts these representations in the first place" (second sentence of the Abstract). VAM is definitely a step in this direction but there's still a gap between the current VAM implementation and sensory systems.
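      The second weakness turns on where the trial-to-trial noise enters. The following is a minimal sketch of one linear ballistic accumulator trial, assuming that a fixed vector of mean drift rates (standing in for the CNN outputs v_k) is corrupted by Gaussian noise to give the drift rates d_k. It is not the authors' implementation: the function name, the parameter values, and the convention of resampling until at least one drift rate is positive are all assumptions of this sketch.

```python
import random


def simulate_lba_trial(mean_drifts, drift_sd=1.0, threshold=2.0,
                       max_start=1.0, non_decision_time=0.2, rng=random):
    """Simulate one LBA trial and return (choice_index, response_time)."""
    # Trial-specific drift rates d_k = v_k + Gaussian noise. Resample until
    # at least one accumulator can reach threshold (assumed convention).
    while True:
        drifts = [rng.gauss(v, drift_sd) for v in mean_drifts]
        if any(d > 0 for d in drifts):
            break
    # Each accumulator starts at a uniformly distributed point below threshold.
    starts = [rng.uniform(0.0, max_start) for _ in mean_drifts]
    # Accumulation is linear and noiseless within a trial.
    finish_times = [
        (threshold - s) / d if d > 0 else float("inf")
        for s, d in zip(starts, drifts)
    ]
    choice = min(range(len(finish_times)), key=finish_times.__getitem__)
    return choice, non_decision_time + finish_times[choice]


# The same "stimulus" (identical mean drifts) is presented on every trial.
rng = random.Random(0)
mean_drifts = [3.0, 1.0]  # hypothetical v_k for the correct and incorrect response
trials = [simulate_lba_trial(mean_drifts, rng=rng) for _ in range(2000)]
accuracy = sum(choice == 0 for choice, _ in trials) / len(trials)
response_times = [rt for _, rt in trials]
print(f"accuracy: {accuracy:.3f}, mean RT: {sum(response_times) / len(response_times):.3f}")
```

      In this sketch the input vector is identical on every trial, so all variability in choices and response times comes from noise added after the deterministic stage. That is the structural assumption the reviewer contrasts with a sensory interpretation of drift-rate variability.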

    3. Reviewer #2 (Public review):

      In "An image-computable model of speeded decision-making", the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      I have mixed feelings about this manuscript, as I appreciate the innovative efforts to combine CNNs with EAMs in a new class of cognitive models, while also having some reservations from an EAM perspective. The idea of combining these approaches has great potential, and I'm excited to see where this research will lead. However, I do have some concerns about the quality of fit between the behavioral data and the model. Specifically, the RT distributions, delta plots, and conditional accuracy function don't appear to be well-matched by the VAM. The conflict effects on behavioral data are well-established and typically considered crucial to understanding the underlying cognitive process. Unfortunately, it seems that these parts of the data don't fit well with the proposed model.

      This disparity is not entirely surprising. The EAM literature suggests that LBA models might not be suitable for conflict tasks, and the presented results seem to confirm this concern. Conflict EAMs, including the DMC (e.g., Ulrich et al., 2015; Evans & Servant, 2022; Lee & Sewell 2024), propose dynamic drift rates with a fast automatic process that is gradually withdrawn from evidence accumulation over time. This approach results in congruency effects arising from temporal dynamics, not spatial representations.<br /> In contrast, the VAM imposes static drift rates in the LBA model, leading to an effect between drift rates that translates to changes in representations. However, this account does not adequately explain the behavioral data, and the proposed representational geometry explanation is therefore limited.
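      The contrast between dynamic and static drift rates can be made concrete with a small sketch. It assumes the gamma-shaped automatic activation commonly used to describe DMC-style models, in which the drift contributed by the automatic process is the time derivative of that pulse; the function names and parameter values (time in milliseconds) are illustrative assumptions and are not taken from the manuscript under review.

```python
import math


def automatic_activation(t, amplitude=20.0, tau=30.0, shape=2.0):
    """Expected automatic activation: a pulse that peaks at (shape - 1) * tau."""
    if t <= 0:
        return 0.0  # convention of this sketch: no automatic input at or before onset
    scaled = t * math.e / ((shape - 1.0) * tau)
    return amplitude * math.exp(-t / tau) * scaled ** (shape - 1.0)


def automatic_drift(t, amplitude=20.0, tau=30.0, shape=2.0):
    """Time derivative of automatic_activation: positive early, negative later."""
    if t <= 0:
        return 0.0
    level = automatic_activation(t, amplitude, tau, shape)
    return level * ((shape - 1.0) / t - 1.0 / tau)


def total_drift(t, controlled_drift=0.5, congruent=True):
    """Constant controlled drift plus a signed, time-varying automatic drift."""
    sign = 1.0 if congruent else -1.0
    return controlled_drift + sign * automatic_drift(t)


for t in (10.0, 30.0, 100.0, 1000.0):
    print(t, round(total_drift(t, congruent=True), 4),
          round(total_drift(t, congruent=False), 4))
```

      Under these assumptions the congruent and incongruent drifts differ early in the trial and converge on the controlled drift later, so the congruency effect depends on when a response is made. A model with static drift rates has no such time dependence, which is the limitation raised in this review.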

      My concerns are addressed in the revised manuscript, but I struggle to understand why the authors distinguish between explaining mean effects across individuals and congruency effects within individuals. These concepts seem related, and issues at the individual level could propagate to the group mean. Furthermore, I find it challenging to accept that dynamics merely act 'in concert' with the orthogonalization mechanism, as it seems possible that an account that uses a time-varying EAM may not require any orthogonalization mechanism in the first place. The orthogonalization mechanism might have arisen because the model does not have the possibility to account for the conflict effect from temporal effects, instead of spatial effects. I could envision a CNN-DMC in which conflict effects arise only at the level of the choice model (e.g., as a time-varying filter that changes which information is read out from the visual system, rather than due to changes in the representations in the visual system itself). This possibility should be acknowledged in the paper, and it would be interesting to discuss how such an account would be tested.

      While I appreciate the technological advancement presented in this paper, my concerns are not about implementation details but rather about the choice of models and their consequences. I believe the paper needs a more in-depth exploration of which conclusions can be drawn and which model comparisons would be required to reach a final conclusion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new approach to modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach to fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.

      (2) The analyses are very technically sophisticated.

      (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      Major

      (1) The manuscript appears to suggest that it is the first to combine CNNs with evidence accumulation models (EAMs). However, this was done in a 2022 preprint (https://www.biorxiv.org/content/10.1101/2022.08.23.505015v1) that introduced a network called RTNet. This preprint is cited here, but never really discussed. Further, the two unique features of the current approach discussed in lines 55-60 are both present to some extent in RTNet. Given the strong conceptual similarity in approach, it seems that a detailed discussion of similarities and differences (of which there are many) should feature in the Introduction.

      Thanks for pointing this out—we agree that the novel contributions of our model (the VAM) with respect to prior related models (including RTNet) should be clarified, and have revised the Introduction accordingly. We include the following clarifications in the Introduction:

      “The key feature of the VAM that distinguishes it from prior models is that the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework. Thus, both the visual representations learned by the CNN and the EAM parameters are directly constrained by behavioral data. In contrast, prior models first optimize the CNN to perform the behavioral task, then separately fit a minimal set of high-level CNN parameters [RTNet, Rafiei et al., 2024] and/or the EAM parameters to behavioral data [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. As we will show, fitting the CNN with human data—rather than optimizing the model to perform a task—has significant consequences for the representations learned by the model.”

      E.g. in the case of RTNet, the variability of the Bayesian CNN weight distribution, the decision threshold, and the magnitude of the noise added to the images are adjusted to match the average human accuracy (separately for each task condition). RTNet is an interesting and useful model that we believe has complementary strengths to our own work.

      Since there are several other existing models in addition to the VAM and RTNet that use CNNs to generate RTs or RT proxies (by our count, at least six that we cite earlier in the Introduction), we felt it was inappropriate to preferentially include a detailed comparison of the VAM and RTNet beyond the passage quoted above.

      (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. This is also the assumption built into RTNet where the core CNN produces noisy evidence. Can the authors comment on the plausibility of VAM's assumption that the noise is post-sensory?

      In our view, the VAM is compatible with a model in which the drift rate variability for a given stimulus is due to sensory noise, since we do not specify the origin of the Gaussian noise added to the drift rates. As the reviewer notes, the CNN component of the VAM processes a given stimulus deterministically, yielding the mean drift rates. This does not preclude us from imagining an additional (unmodeled) sensory process that adds variability to the drift rates. The VAM simply represents this and other hypothetical sources of variability as additive Gaussian noise. We agree however that it is worthwhile to think about the origin of the drift rate variability, though it is not a focus of our work.
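      The point that a deterministic mean drift rate plus additive Gaussian noise still yields trial-to-trial variability can be illustrated with a minimal LBA-style simulation. This is a sketch under stated assumptions: the uniform start point is the standard LBA convention rather than something described in this exchange, and the function name and all parameter values (noise_sd, threshold, max_start, non_decision_time) are hypothetical.

```python
import random


def simulate_lba_trial(mean_drifts, rng, noise_sd=1.0, threshold=2.0,
                       max_start=1.0, non_decision_time=0.2):
    """Simulate one LBA-style trial.

    mean_drifts: deterministic mean drift rates v_k (one per response),
                 standing in for the outputs of a CNN for one stimulus.
    Returns (choice_index, rt), or None if no accumulator has a positive
    drift rate and therefore none can reach the threshold.
    """
    finish_times = []
    for k, v_k in enumerate(mean_drifts):
        d_k = rng.gauss(v_k, noise_sd)        # trial-specific drift rate
        start = rng.uniform(0.0, max_start)   # trial-specific start point
        if d_k > 0:
            finish_times.append(((threshold - start) / d_k, k))
    if not finish_times:
        return None
    decision_time, choice = min(finish_times)
    return choice, non_decision_time + decision_time


if __name__ == "__main__":
    rng = random.Random(0)
    same_stimulus = [3.0, 0.5, 0.5, 0.5]  # identical mean drifts on every trial
    for _ in range(5):
        print(simulate_lba_trial(same_stimulus, rng))
```

      Repeated calls with the same mean drift rates return different choices and RTs, regardless of whether the noise is interpreted as sensory or decisional in origin.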

      (3) Figure 2 plots how well VAM explains different behavioral features. It would be very useful if the authors could also fit simple EAMs to the data to clarify which of these features are explainable by EAMs only and which are not.

      In our view, fitting simple EAMs to the data would not be especially informative and poses a number of challenges for the particular task we study (LIM) that are neatly avoided by using the VAM. In particular, as we show in Figure 2, the stimuli vary along several dimensions that all appear to influence behavior: horizontal position, vertical position, layout, target direction, and flanker direction. Since the VAM is stimulus-computable, fitting the VAM automatically discovers how all of these stimulus features influence behavior (via their effect on the drift rates outputted by the CNN). In contrast, fitting a simple EAM (e.g. the LBA model) necessitates choosing a particular parameterization that specifies the relationship between all of the stimulus features and the EAM model parameters. This raises a number of practical questions. For example, should we attempt to fit a separate EAM for each stimulus feature, or model all stimulus features simultaneously?

      Moreover, while we could in principle navigate these issues and fit simple EAMs to the data, we do not intend to claim that simple EAMs fail to explain the relationship between stimulus features and behavior as well as the VAM. Rather, the key strength of the VAM relative to simple EAMs is that it includes a detailed and biologically plausible model of human vision. The majority of the paper capitalizes on this strength by showing how behavioral effects of interest (namely congruency effects) can be explained in terms of the VAM’s visual representations.

      (4) VAM is tested in two different ways behaviorally. First, it is tested to what extent it captures individual differences (Figure 2B-E). Second, it is tested to what extent it captures average subject data (Figure 2F-J). It wasn't clear to me why for some metrics only individual differences are examined and for other metrics only average human data is examined. I think that it will be much more informative if separate figures examine average human data and individual difference data. I think that it's especially important to clarify whether VAM can capture individual differences for the quantities plotted in Figures 2F-J.

      We would like to clarify that Fig. 2J in fact already shows how well the VAM captures individual differences for the average subject data shown in Fig. 2H (stimulus layout) and Fig. 2I (stimulus position). For a given participant and stimulus feature, we calculated the Pearson's r between model/participant mean RTs across each stimulus feature value. Fig. 2J shows the distribution of these Pearson’s r values across all participants for stimulus layout and horizontal/vertical position.
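      The per-participant correlation described above can be sketched as follows. This is an illustration only: the data layout (a list of (feature_value, rt) pairs per participant and per model) and the function names are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_rt_by_feature(trials):
    """Average RT for each value of a stimulus feature.

    trials: iterable of (feature_value, rt) pairs.
    """
    sums = {}
    for value, rt in trials:
        total, count = sums.get(value, (0.0, 0))
        sums[value] = (total + rt, count + 1)
    return {value: total / count for value, (total, count) in sums.items()}


def model_participant_correlation(participant_trials, model_trials):
    """Pearson's r between model and participant mean RTs across feature values."""
    participant_means = mean_rt_by_feature(participant_trials)
    model_means = mean_rt_by_feature(model_trials)
    shared = sorted(set(participant_means) & set(model_means))
    return pearson_r([participant_means[v] for v in shared],
                     [model_means[v] for v in shared])
```

      Calling model_participant_correlation once per participant and collecting the results would give a distribution of r values across participants for a given stimulus feature.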

      Fig. 2G also already shows how well the VAM captures individual differences in behavior. Specifically, this panel shows individual differences in mean RT attributable to differences in age. For Fig. 2F, which shows how the model drift rates differ on congruent vs. incongruent trials, there is no sensible way to compare the models to the participants at any level of analysis (since the participants do not have drift rates). 

      (5) The authors look inside VAM and perform many exploratory analyses. I found many of these difficult to follow since there was little guidance about why each analysis was conducted. This also made it difficult to assess the likelihood that any given result is robust and replicable. More importantly, it was unclear which results are hypothesized to depend on the VAM architecture and training, and which results would be expected in performance-optimized CNNs. The authors train and examine performance-optimized CNNs later, but it would be useful to compare those results to the VAM results immediately when each VAM result is first introduced.

      Thanks for pointing this out—we apologize for any confusion caused by our presentation of the CNN analyses. We have added in additional motivating statements, methodological clarifications, and relevant references to our Results, particularly for Figure 3 in which we first introduce the analyses of the CNN representations/activity. In general, each analysis is prefaced by a guiding question or specific rationale, e.g. “How do the models' visual representations enable target selectivity for stimuli that vary along several irrelevant dimensions?” We also provide numerous references in which these analysis techniques have been used to address similar questions in CNNs or the primate visual cortex.

      We chose to maintain the current organization of our results in which the comparison between the VAM and the task-optimized models are presented in a separate figure. We felt that including analyses of both the VAM and task-optimized models in the initial analyses of the CNN representations would be overwhelming for many readers. As the reviewer acknowledges, some readers may already find these results challenging to follow. 

      (6) The authors don't examine how the task-optimized models would produce RTs. They say in lines 371-2 that they "could not examine the RT congruency effect since the task-optimized models do not generate RTs." CNNs alone don't generate RTs, but RTs can easily be generated from them using the same EAM add-on that is part of VAM. Given that the CNNs are already trained, I can't see a reason why the authors can't train EAMs on top of the already trained CNNs and generate RTs, so these can provide a better comparison to VAM.

      We appreciate this suggestion, but we judge the proposal to “train EAMs on top of the already trained CNNs and generate RTs” to be a significant expansion of the scope of the paper with multiple possible roads forward. In particular, one must specify how the outputs of the task-optimized CNN (logits for each possible response) relate to drift rates, and there is no widely-accepted or standard way to do this. Previously proposed methods include transforming representation distances in the last layer to drift rates (https://doi.org/10.1037/xlm0000968), fitting additional subject-specific parameters that map the logits to drift rates (https://doi.org/10.1007/s42113-019-00042-1), or using the softmax-scored model outputs as drift rates directly (https://doi.org/10.1038/s41562-024-01914-8), though in the latter case the RTs are not on the same scale as human data. In our view, evaluating these different methods is beyond the scope of this paper. An advantage of the VAM is that one does not have to fit two separate models (a CNN and an EAM) to generate RTs.

      Nonetheless, we agree that it would be informative to examine something like RTs in the task-optimized models. Our revised Results section now includes an analysis of the confidence of the task-optimized models’ decisions, which we use as a proxy for RTs:

      “Since the task-optimized models do not generate RTs, it is not possible to directly measure RT congruency effects in these models without making additional assumptions about how the CNN's classification decisions relate to RTs. However, as a coarse proxy for RT, we can examine the confidence of the CNN's decisions, defined as the softmax-scored logit (probability) of the most probable direction in the final CNN layer. This choice of RT proxy is motivated by some prior studies that have combined CNNs with EAMs [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. These studies explicitly or implicitly derive a measure of decision confidence from the activity of the last CNN layer. The confidence measure is then mapped to the EAM drift rates, such that greater decision confidence generally corresponds to higher drift rates (and therefore shorter RTs).

      We calculated the average confidence of each task-optimized CNN separately for congruent vs. incongruent trials. On average, the task-optimized models showed higher confidence on congruent vs. incongruent trials (W = 21.0, p < 1e-3, Wilcoxon signed-rank test; Cohen's d = 0.99; n = 75 models). These analyses therefore provide some evidence that task-optimized CNNs have the capacity to exhibit congruency effects, though an explicit comparison of the magnitude of these effects with human data requires additional modeling assumptions (e.g., fitting a separate EAM).”
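      The confidence measure defined in the quoted passage (the softmax probability of the most probable direction) can be sketched as below. The function names and the (logits, is_congruent) trial format are assumptions for illustration; the paired comparison across models reported above (Wilcoxon signed-rank test) is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_confidence(logits):
    """Confidence as the softmax probability of the most probable response."""
    return max(softmax(logits))


def mean_confidence_by_congruency(trials):
    """Average confidence separately for congruent and incongruent trials.

    trials: iterable of (logits, is_congruent) pairs.
    Returns a dict keyed by the congruency flag (True / False).
    """
    sums = {}
    for logits, is_congruent in trials:
        total, count = sums.get(is_congruent, (0.0, 0))
        sums[is_congruent] = (total + decision_confidence(logits), count + 1)
    return {flag: total / count for flag, (total, count) in sums.items()}
```

      Computing the congruent and incongruent means for each model would yield the paired values that a signed-rank test compares.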

      (7) The Discussion felt very long and mostly a summary of the Results. I also couldn't shake the feeling that it had many just-so stories related to the variety of findings reported. I think that the section should be condensed and the authors should be clearer about which explanations are speculations and which are air-tight arguments based on the data.

      We have shortened the Discussion modestly and have added language to distinguish arguments that are more speculative from those directly supported by our data.

      Specifically, we added in the phrase “we speculate that…” for two suggestions in the Discussion (paragraphs 3 and 5), and we ensured that any other more speculative suggestions contain such clarifying language. We have also added in subheadings in the Discussion to help readers navigate this section. 

      (8) In one of the control analyses, the authors train different VAMs on each RT quantile. I don't understand how it can be claimed that this approach can serve as a model of an individual's sensory processing. Which of the 5 sets of weights (5 VAMs) captures a given subject's visual processing? Are the authors saying that the visual system of a given subject changes based on the expected RT for a stimulus? I feel like I'm missing something about how the authors think about these results.

      We agree that these particular analyses may cause confusion and have removed them from our revised manuscript.

      Reviewer #2 (Public Review):

      In an image-computable model of speeded decision-making, the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      Overall, combining DNNs and EAMs appears to be a promising avenue to seriously model the visual system in decision-making tasks compared to the current practice in EAMs. Some variants have been proposed or used before (e.g., doi.org/10.1016/j.neuroimage.2017.12.078 , doi.org/10.1007/s42113-019-00042-1), but always in the context of using task-trained models, rather than models trained on behavioral data. However, I was surprised to read that the authors developed their model in the context of a conflict task, rather than a simpler perceptual decision-making task. Conflict effects in human behavior are particularly complex, and thereby, the authors set a high goal for themselves in terms of the to-be-explained human behavior. Unfortunately, the proposed VAM does not appear to provide a great account of conflict effects that are considered fundamental features of human behavior, like the shape of response time distributions, and specifically, delta plots (doi.org/10.1037/0096-1523.20.4.731). The authors argue that it is beyond the scope of the presented paper to analyze delta plots, but as these are central to studies of human conflict behavior, models that aim to explain conflict behavior will need to be able to fit and explain delta plots.

      Theories on conflict often suggest that negative/positive-trending delta plots arise through the relative timing of response activation related to relevant and irrelevant information. Accumulation for relevant and irrelevant information would, as a result, either start at different points in time or the rates vary over time. The current VAM, as a feedforward neural network model, does not appear to be able to capture such effects, and perhaps fundamentally not so: accumulation for each choice option is forced to start at the same time, and rates are a static output of the CNN.

      The proposed solution of fitting five separate VAMs (one for each of five RT quantiles) is not satisfactory: it does not explain how delta plots result from the model, for the same reason that fitting five evidence accumulation models (one per RT quantile) does not explain how response time distributions arise. If, for example, one would want to make a prediction about someone's response time and choice based on a given stimulus, one would first have to decide which of the five VAMs to use, which is circular. But more importantly, this way of fitting multiple models does not explain the latent mechanism that underlies the shape of the delta plots.

      As such, the extensive analyses on the VAM layers and the resulting conclusions that conflict effects arise due to changing representations across layers (e.g., "the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations") are inspiring but remain hard to weigh, as they are contingent on the assumption that the VAM can capture human behavior in the conflict task, which it struggles with. That said, the promise of combining CNNs and EAMs is clearly there. A way forward could be to either adjust the proposed model so that it can explain delta plots, which would potentially require temporal dynamics and time-varying evidence accumulation rates, or perhaps to start simpler with combined CNN-EAMs that are able to fit more standard perceptual decision-making tasks without conflict effects.

      We thank the reviewer for their thoughtful comments on our work. However, we note that the VAM does in fact capture the positive-trending RT delta plot observed in the participant data (Fig. S4A), though the intercepts for models/participants differ somewhat. On the other hand, the conditional accuracy functions (Fig. S4B) reveal a more pronounced difference between model and participant behavior. As the reviewer points out, capturing these effects is likely to require a model that can produce time-varying drift rates, whereas our model produces a fixed drift rate for a given stimulus. We also agree that fitting a separate VAM to each RT quantile is not a satisfactory means of addressing this limitation and have removed these analyses from our revised manuscript.
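      For readers less familiar with delta plots, the sketch below shows one common way to compute them: the congruency effect (incongruent minus congruent RT) within each RT quantile bin, plotted against the mean RT of that bin. The use of five equal-sized bins and the function names are assumptions for illustration and are not taken from the manuscript's analysis code.

```python
def quantile_bin_means(rts, n_bins=5):
    """Mean RT within each of n_bins equal-sized bins of the sorted RTs."""
    ordered = sorted(rts)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one RT per bin")
    means = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        means.append(sum(chunk) / len(chunk))
    return means


def delta_plot(congruent_rts, incongruent_rts, n_bins=5):
    """Return a list of (mean RT, congruency effect) points, one per bin."""
    con = quantile_bin_means(congruent_rts, n_bins)
    inc = quantile_bin_means(incongruent_rts, n_bins)
    return [((c + i) / 2.0, i - c) for c, i in zip(con, inc)]


def delta_slopes(points):
    """Slopes between successive delta-plot points.

    Positive slopes indicate a positive-trending delta plot.
    """
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]
```

      Applying delta_plot to model-generated and participant RTs separately allows the trend and the intercept of the two curves to be compared directly.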

      However, while we agree that accurately capturing these dynamic effects is a laudable goal, it is in our view also worthwhile to consider explanations for the mean behavioral effect (i.e. the accuracy congruency effect), which can occur independently of any consideration of dynamics. One of our main findings is that across-model variability in accuracy congruency effects is better attributed to variation in representation geometry (target/flanker subspace alignment) than to variation in the degree of flanker suppression. This finding does not require any consideration of dynamics to be valid at the level of explanation we pursue (across-user variability in congruency effects), but also does not preclude additional dynamic processes that could give rise to more specific error patterns. Our revised discussion now includes a section where we summarize and elaborate on these ideas:

      “It is not difficult to imagine how the orthogonalization mechanism described above, which explains variability in accuracy congruency effects across individuals, could act in concert with other dynamic processes that explain variability in congruency effects within individuals (e.g., as a function of RT). In general, any process that dynamically gates the influence of irrelevant sensory information on behavioral outputs could accomplish this, for example ramping inhibition of incorrect response activation [https://doi.org/10.3389/fnhum.2010.00222], a shrinking attention spotlight [https://doi.org/10.1016/j.cogpsych.2011.08.001], or dynamics in neural population-level geometry [https://doi.org/10.1038/nn.3643]. To pursue these ideas, future work may aim to incorporate dynamics into the visual component and decision component of the VAM with recurrent CNNs [https://doi.org/10.48550/arXiv.1807.00053, https://doi.org/10.48550/arXiv.2306.11582] and the task-DyVA model [https://doi.org/10.1038/s41562-022-01510-8], respectively.”

      Reviewer #3 (Public Review):

      Summary:

      In this article, the authors combine a well-established choice-response time (RT) model (the Linear Ballistic Accumulator) with a CNN model of visual processing to model image-based decisions (referred to as the Visual Accumulator Model - VAM). While this is not the first effort to combine these modeling frameworks, it uses this combination of approaches uniquely.

      Specifically, the authors attempt to better understand the structure of human information representations by fitting this model to behavioral (choice-RT) data from a classic flanker task. This objective is made possible by using a very large (by psychological modeling standards) industry data set to jointly fit both components of this VAM model to individual-level data. Using this approach, they illustrate (among other results) (1) how the interaction between target and flanker representations influence the presence and strength of congruency effects, (2) how the structure of representations changes (distributed versus more localized) with depth in the CNN model component, and (3) how different model training paradigms change the nature of information representations. This work contributes to the ML literature by demonstrating the value of training models with richer behavioral data. It also contributes to cognitive science by demonstrating how ML approaches can be integrated into cognitive modeling. Finally, it contributes to the literature on conflict modeling by illustrating how information representations may lead to some of the classic effects observed in this area of research.

      Strengths:

      (1) The data set used for this analysis is unique and is made publicly available as part of this article. Specifically, they have access to data for 75 participants with >25,000 trials per participant. This scale of data/individual is unusual and is the foundation on which this research rests.

      (2) This is the first time, to my knowledge, that a model combining a CNN with a choice-RT model has been jointly fit to choice-RT data at the level of individual people. This type of model combination has been used before but in a more restricted context. This joint fitting, and in particular, learning a CNN through the choice-RT modeling framework, allows the authors to probe the structure of human information representations learned directly from behavioral data.

      (3) The analysis approaches used in this article are state-of-the-art. The training of these models is straightforward given the data available. The interesting part of this article (opinion of course) is the way in which they probe what the CNN has learned once trained. I find their analysis of how distractor and target information interfere with each other particularly compelling, as well as their demonstration that training on behavioral data changes the structure of information representations when compared to training models on standard task-optimized data.

      Weaknesses:

      (1) Just as the data in this article is a major strength, it is also a weakness. This type of modeling would be difficult, if not impossible to do with standard laboratory data. I don't know what the data floor would be, but collecting tens of thousands of decisions for a single person is impractical in most contexts. Thus this type of work may live in the realm of industry. I do want to re-iterate that the data for this study was made publicly available though!

      We suspect (but have not systematically tested) that the VAMs can be fitted with substantially less data. We use data augmentation techniques (various randomized image transformations) during training to improve the generalization capabilities of the VAMs, and these methods are likely to be particularly important when training on smaller datasets. One could consider increasing the amount of image data augmentation when working with smaller datasets, or pursuing other forms of data augmentation like resampling from estimated RT distributions (see https://doi.org/10.1038/s41562-022-01510-8 for an example of this). In general, we don’t think that prospective users of our approach should be discouraged if they have only a few hundred trials per subject (or less) - it’s worth trying!

      (2) While this article uses choice-RT data it doesn't fully leverage the richness of the RT data itself. As the authors point out, this modeling framework, the LBA component in particular, does not account for some of the more nuanced but well-established RT effects in this data. This is not a big concern given the already nice contributions of this article and it leads to an opportunity for ongoing investigation.

      We agree that fully capturing the more nuanced behavioral effects you mention (e.g. RT delta plots and conditional accuracy functions) is a worthwhile goal for future research; see our response to Reviewer #2 for a more detailed discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The phrase in the Abstract "convolutional neural network models of visual processing and traditional EAMs are jointly fitted" made me initially believe that the two models were fitted independently. You may want to re-word to clarify.

      We think that the phrase “jointly fitted” already makes it clear that both the CNN and EAM parameters are estimated simultaneously, in agreement with how this term is usually used. But we have nonetheless appended some additional clarifying language to that sentence (“in a unified Bayesian framework”).

      (2) Lines 27-28: EAMs "are the most successful and widely-used computational models of decision-making." This is only true for the specific type of decision-making examined here, namely joint modeling of choice and response times. Signal detection theory is arguably more widely-used when response times are not modeled.

      Thanks for pointing this out - we have revised the referenced sentence accordingly.

      (3) Could the authors clarify what is plotted in Figure 2F?

      Fig. 2F shows the drift rates for the target, flanker, and “other” (non-target/non-flanker) accumulators averaged over trials and models for congruent vs. incongruent trials. In case this was a source of confusion, we do not show the value of the flanker drift rates on congruent trials because the flanker and target accumulators are identical (i.e. the flanker/congruent drift rates are equivalent to the target/congruent drift rates).

      (4) Lines 214-7: "The observation that single-unit information for target direction decreased between the fourth and final convolutional layers while population-level decoding remained high is especially noteworthy in that it implies a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code." Can the authors clarify why this is the only reasonable explanation for these results? It seems like many other explanations could be construed.

      We have added additional clarification to this section and now use more tentative language:

      “The observation that single-unit information for target direction decreased between the fourth and final convolutional layers indicates that the units become progressively less selective for particular target directions. Since population-level decoding remained high in these layers, this suggests a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code.”
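      To give the single-unit information measure a scale, the sketch below estimates mutual information in bits between a discrete stimulus label and a binned unit response, and normalizes it by the stimulus entropy. Everything here is an assumption for illustration: the plug-in estimator, the binning of responses, the function names, and the four direction labels used in the usage example. The units (bits or nats) and estimator used in the manuscript are not stated in this exchange.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    labels = list(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information_bits(pairs):
    """Plug-in mutual information (bits) between stimulus and binned response.

    pairs: iterable of (stimulus_label, binned_response) tuples.
    Note: the plug-in estimate is biased upward for small samples.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    stim_counts = Counter(s for s, _ in pairs)
    resp_counts = Counter(r for _, r in pairs)
    mi = 0.0
    for (s, r), count in joint.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint * n * n / (stim_counts[s] * resp_counts[r]))
    return mi


def normalized_information(pairs):
    """Fraction of the stimulus entropy captured by the response (0 to 1)."""
    pairs = list(pairs)
    return mutual_information_bits(pairs) / entropy_bits(s for s, _ in pairs)


if __name__ == "__main__":
    # Hypothetical unit whose binned response identifies the direction exactly.
    perfect = [(d, d) for d in "LRUD" for _ in range(25)]
    print(mutual_information_bits(perfect), normalized_information(perfect))
```

      Under these assumptions, a unit that fully identifies one of four equiprobable directions carries 2 bits, so a value of 0.1 bits would correspond to 5% of the available stimulus information.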

      (5) Lines 372-376: "Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that to do a task well, a training model will result in model representations that are similar to those employed by humans." While I agree with the general sentiment, I feel that its application here is strange. Unless I'm missing something, in the context of the preceding sentence, the authors seem to be saying that researchers in the field expect that CNNs can produce a behavioral phenomenon (RTs) that is completely outside of their design and training. I don't think that anyone actually expects that.

      We moved the discussion/analyses of RTs to the next paragraph. It should now be clear that this statement refers specifically to the absence of an accuracy congruency effect in the task-optimized models.

      (6) Lines 387-389: "As a result, the VAMs may learn richer representations of the stimuli, since a variety of stimulus features-layout, stimulus position, flanker direction-influence behavior (Figure 2)." That is certainly true of tasks like this one where an optimal model would only focus on a tiny part of the image, whereas humans are distracted by many features. I'm not sure that this distractibility is the same as "richer representations". When CNNs classify images based on the background, would the authors claim that they have richer representations than humans?

      We agree that “richer” may not be the best way to characterize these representations, and have changed it to “more complex”.

      (7) Is it possible that drift rate d_k for each response happens to be negative on a given trial? If so, how is the decision given on such trials (since presumably none of the accumulators will ever reach the boundary)?

      It is indeed possible for all of the drift rates to be negative, though we found that this occurred for a vanishingly small number of trials (mean ± s.e.m. percent trials/model: 0.080 ± 0.011%, n = 75 models), as reported in the Methods. These trials were excluded from analyses.
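      Why such trials are rare can be seen from a short calculation. If each drift rate is its mean plus independent Gaussian noise with a common standard deviation, the probability that all drift rates are negative on a trial is the product of the per-accumulator normal tail probabilities. The independence assumption, the function names, and the example values are illustrative and not taken from the fitted models.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_all_drifts_negative(mean_drifts, noise_sd=1.0):
    """P(every d_k < 0) when d_k ~ Normal(v_k, noise_sd), independently.

    mean_drifts: the mean drift rates v_k for one stimulus.
    noise_sd: common noise standard deviation (must be positive).
    """
    prob = 1.0
    for v_k in mean_drifts:
        prob *= normal_cdf(-v_k / noise_sd)
    return prob


if __name__ == "__main__":
    # One strongly favored response among four (hypothetical values).
    print(prob_all_drifts_negative([3.0, 0.5, 0.5, 0.5]))
```

      When at least one mean drift rate is well above zero relative to the noise, the product is dominated by that accumulator's small tail probability, consistent with such trials being a very small fraction of the total.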

      (8) Can the authors comment on how they chose the CNN architecture and whether they expect that different architectures will produce similar results?

      Before establishing the seven-layer CNN architecture used throughout the paper, we conducted some preliminary experiments using other architectures that differed primarily in the number of CNN layers. We found that models with significantly fewer than seven layers typically failed to reach human-level accuracy on the task while larger models achieved human-level accuracy but (unsurprisingly) took longer to train.

      Reviewer #3 (Recommendations For The Authors):

      - In the introduction to this paper (particularly the paragraph beginning in line 33), the authors note that EAMs have typically been used in simplified settings and that they do not provide a means to account for how people extract information from naturalistic stimuli. While I agree with this, the idea of connecting CNNs of visual processing with EAMs for a joint modeling framework has been done. I recommend looking at and referencing these two articles as well as adjusting the tenor of this part of an introduction to better reflect the current state of the literature. For full disclosure, I am one of the authors on these articles. https://link.springer.com/article/10.1007/s42113-019-00042-1 https://www.sciencedirect.com/science/article/abs/pii/S0010027721001323

      We agree—thanks for pointing this out. The revised Introduction now discusses prior related models in more detail (including those referenced above) and better clarifies the novel contributions of our model. We specifically highlight that a novel contribution of the VAM is that “the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework.”

      - The statement in lines 56-58 implies that this is the first article to glue CNNs together with EAMs. I would edit this accordingly based on the prior comment here and references provided. I will note that the second feature of the approach in this paper is still novel and really nice, namely the fact that the CNN and the EAM are jointly fitted. In the aforementioned references, the CNN is trained on the image set, and individual level Bayesian estimation was only applied to the EAM. Thus, it may be useful to highlight the joint estimation aspect of this investigation as well as how the uniqueness of the data available makes it possible.

      Agreed—see above.

      - Figure 3c and associated text. I understand the MI analysis you are performing here, however it is difficult to interpret as it stands. In the figure, what does a MI of 0.1 mean?? Can you give some context to that scale? I do find the interpretation of the hunchback shape in lines 210-222 to be somewhat of a stretch. The discussion that precedes (lines 199-209) this is clear and convincing. Can this discussion be strengthened more? And more interpretability of Figure 3c would be helpful; entropic scales can be hard to interpret without some context or scale associated.

      The MI analyses in Fig. 3C (and also Figs. 4C and 6E) show normalized MI, in which the raw MI has been divided by the entropy of the stimulus feature distribution. This normalization facilitates comparing the MI for different stimulus features, which is relevant for Figs. 4C and 6E. The normalized MI has a possible range of [0, 1], where 1 indicates perfect correlation between the two variables and 0 indicates complete independence. We now note in the legend of these figures that the possible normalized MI range is [0, 1], which should help with interpreting these values. Our revised results section for Fig. 3C now also includes some additional remarks on our interpretation of the hunchback shape of the MI.
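
      To give a sense of the [0, 1] scale, the normalization described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the paper: it assumes both variables are discrete (for example, a stimulus feature and a binned unit activation), and the function names `entropy` and `normalized_mi` are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def normalized_mi(feature, response):
    """Mutual information I(feature; response) divided by H(feature).

    Assumes both inputs are discrete and that `feature` takes more than
    one value (otherwise H(feature) is zero and the ratio is undefined).
    """
    feature = np.asarray(feature).ravel()
    response = np.asarray(response).ravel()
    # Encode each (feature, response) pair as a single joint label.
    _, f_idx = np.unique(feature, return_inverse=True)
    _, r_idx = np.unique(response, return_inverse=True)
    joint = f_idx * (r_idx.max() + 1) + r_idx
    h_feature = entropy(feature)
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mi = h_feature + entropy(response) - entropy(joint)
    return mi / h_feature

# A response that determines the feature exactly gives the maximum value;
# a response that is independent of the feature gives the minimum.
fully_informative = normalized_mi([0, 1, 0, 1, 2, 2], [0, 1, 0, 1, 2, 2])
uninformative = normalized_mi([0, 0, 1, 1], [0, 1, 0, 1])
```

      Because I(X; Y) can never exceed H(X), dividing by the entropy of the stimulus feature bounds the result between 0 and 1, so a value of 0.1 would mean that the response accounts for roughly a tenth of the uncertainty about that feature.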

      - Lines 244-248 and the analyses in Figure 3 suggest a change in the behavior of the CNN around layer 4. This is just a musing, but what would happen if you just used a 4 layer CNN, or even a 3 layer? This is not just a methods question. Your analysis suggests a transition from localized to distributed information representation. Right now, the EAM only sees the output of the distributed representation. What if it saw the results of the more local representations from early layers? Of course, a shallower network may just form the distributed representations earlier, but it would be interesting if there were a way to tease out not just the presence of distributed vs local representations, but the utility of those to the EAM.

      Thanks for this interesting suggestion. We did do some preliminary experiments in models with fewer layers, though we only examined the outputs of these models and did not assess their representations. We found that models with 3–5 layers generally failed to achieve human-level accuracy on the task. In principle, one could relate this observation to the representations of these models as a means of assessing the relative utility of distributed/local representations. However, there are confounding factors that one would ideally control for in order to compare models with different numbers of layers in this fashion (namely, the number of parameters).

      - Section Line 359 (Task optimized models) - It would be helpful to clarify here what these task-optimized models are being trained to do. As I understand it, they are being trained to directly predict the target direction. But are you asking them to learn to predict the true target direction? Or are you training them to predict what each individual responds? I think it is the second (since you have 75 of these), but it's not clear. I looked at the methods and still couldn't get a clear description of this. Also, are you just stripping the LBA off of the end of the CNN and then essentially putting a softmax in its place? If so, it would be helpful to say so.

      The task-optimized models were actually trained to output the true target direction in each stimulus, rather than trained to match the decisions of the human participants. We trained 75 such models since we wanted to use exactly the same stimuli as were used to train each VAM. The task-optimized CNNs were identical to those used in the VAMs, except that the outputs of the last layer were converted to softmax-scored probabilities for each direction rather than drift rates. The Results and Methods sections now include additional commentary that clarifies these points.
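
      The output conversion described above can be illustrated with a short sketch. It is an assumption-laden example, not the code used in the paper: the number of directions and the values in `last_layer_outputs` are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw last-layer outputs into probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical last-layer outputs for one stimulus, one value per direction.
last_layer_outputs = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(last_layer_outputs)

# The task-optimized model's choice is the most probable direction.
predicted_direction = int(np.argmax(probs))
```

      In a VAM the same last-layer outputs would instead be interpreted as drift rates and passed to the EAM, which is why the two model types can share an otherwise identical CNN.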

      - Line 373-376: This statement is pretty well established at this point in the similarity judgement literature. I recommend looking at and referencing https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13226 https://www.nature.com/articles/s41562-020-00951-3 https://link.springer.com/article/10.1007/s42113-020-00073-z

      Thanks for pointing this out. For reference, the statement in question is “Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans.”

      We agree that the first and third reference you mention are relevant, and we now cite them along with some other relevant work. In our view, the second reference you mention is not particularly relevant (that paper introduces a new computational model for similarity judgements that is fit to human data, but does not comment on training models to perform tasks vs. fitting to human data).

      - Line 387-388: "VAMs may learn richer representations". This is a bit of a philosophical point, but I'll go ahead and mention it. The standard VAM does not necessarily learn "richer" feature representations. Rather, you are asking the VAM and task-optimized models to do different things. As a result, they learn different representations. "Better" or "richer" is in the eye of the beholder. In one view, you could view the VAM performance as sub-par since it exhibits strange artifacts (congruency effects) and the expansion of dimensionality in the VAM representations is merely a side-effect of poor performance. I'm not advocating this view, just playing devil's advocate and suggesting a more nuanced discussion of the difference between the VAM and task-optimized models.

      We agree—this is a great point. We have changed this statement to read “the VAMs may learn more complex [rather than richer] representations of the stimuli”.

      - Lines 567-570: Here you discuss how the LBA backend of the VAM can't account for shrinking spotlight-like RT effects but that fitting models to different RT quantiles helps overcome this. I find this to be one of the weakest points of the paper (the whole process of fitting RT quantiles separately to begin with). This is just a limitation of the RT component of the model. This is a great paper but this is just a limitation inherent in the model. I don't see a need to qualify this limitation and think it would be better to just point out that this is a limitation of the LBA itself (be more clear that it is the LBA that is the limiting factor here) and that this leaves room for future research. From your last sentence of this paragraph, I agree that recurrent CNNs would be interesting. I will note that RNN choice-RT models are out there (though not with CNNs as part of the model).

      We agree and have revised this section of the Discussion accordingly (see our response to Reviewer #2 for more detail). We also removed the analyses of models trained on separate RT quantiles.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The study presents a potentially valuable approach to genetically modify cells to produce extracellular matrices with altered compositions, termed cell-laid, engineered extracellular matrices (eECM). The evidence supporting the authors' conclusions regarding the utility of eECM for endogenous repair is solid, although there are some disagreements on the chondrogenicity of lyophilized constructs which was viewed as lacking robust evidence for endochondral ossification.

      We thank the reviewers for the assessment of our work. We however strongly contest the lack of evidence for chondrogenicity and endochondral ossification. This is robustly demonstrated and a clear strength of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      Weaknesses:

      Most of the data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      We thank the reviewer for the thorough evaluation. We appreciate the highlighted novelty but overall disagree with key points of the assessment. The most important one concerns the contested in vitro cartilage formation and endochondral ossification by engineered ECMs, for which we have provided compelling evidence. Of note, the reviewer points to the “osteogenic” properties of our tissues; the wording is incorrect since cells are absent from the final grafts. Here, the term “osteoinductivity” should be employed, in line with the model of ectopic ossification used to demonstrate de novo bone formation.

      In the revised version, the authors presented Safranin-O staining results of pellets prior to lyophilization. The inset of figures showing entire pellets revealed that Safranin-O-positive areas were limited, suggesting that cells in the negative regions had not differentiated into chondrocytes. In Figure 3F, DAPI staining showed devitalized cells in the outer layer but was negative in the central part, indicating the absence of cells in these areas and incomplete differentiation induction.

      We strongly disagree with the reviewer on the lack of demonstrated chondrogenicity. We have provided evidence of Safranin-O positivity, GAGs quantification, as well as collagen type 2 and collagen type X stainings (also quantified). Frankly, those are gold standard assays in the field and we do not understand the reviewer's point of view. We however agree that our grafts are not entirely composed of cartilage matrix. There are areas where cartilage is absent, in particular in the core of the tissues. This is expected from in vitro engineered cartilage pellets, even those from primary BM-MSC donors. By selecting primary donors it is possible to obtain a superior cartilage formation. Our MSOD-B cells remain, to the best of our knowledge, the only human line capable of in vitro chondrogenesis, even if considered moderate.

      We agree with the absence of cells in the core area of our tissues, as correctly pointed out by the reviewer. This has been reported in other studies whereby the lack of media diffusion can lead to necrotic core formation.

      The rationale for establishing VEGF-KO cell lines remains unclear, and the authors' explanation in the revised manuscript is still equivocal. While they mention that VEGF is a late marker for endochondral ossification, the data in Figures 1D and 1E clearly show that VEGF-KO affects the early phase of endochondral ossification.

      We feel that the rationale for a VEGF-KO is sufficiently conveyed. In our study, VEGF-KO affects GAGs content in the tissue, but not the efficiency of ossification.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects.

      We agree with the reviewer on the limited depth of our osteochondral assessment. However, this was performed as a proof-of-concept, and we clearly conveyed both the limitations and the need for a follow-up study to demonstrate the repair efficacy of our tissue in such a defect context.

      In the ectopic bone formation study, most of the collagenous matrix observed at 2 weeks was resorbed by 6 weeks, with only a small amount contributing to bone formation in MSOD-B cells (Figs. 2I and 4C). This finding does not align with the micro-CT data presented in Figures 2H and 4B. For the micro-CT experiments, it would be more appropriate to use a standard window for bone and present the data accordingly.

      Stainings report the deposition of collagens and may be misleading, as they do not exclusively indicate frank bone formation. This is the reason why we provided microCT data, offering a quantitative assessment of the full grafts and more reliably evaluating mineralized/bone tissue. We feel that our results matched our conclusions.

      While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      We do agree with the reviewer regarding this limitation. In addition to mechanisms and early timepoints, we are also interested in longer in vivo evaluation. This represents a significant amount of work which is beyond the scope of our present manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, matrix generated by these cells subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2-edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      - The notion that inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      - If successful, it may be possible to make off the shelf ECMS to carry out different types of tissue repair.

      Weaknesses:

      - The authors have not demonstrated robust cartilage formation (quantitation would be useful).

      - Measuring total GAG content does not prove the presence of cartilage

      - There are numerous overstatements about forming and implanting cartilage.

      - Although it is implied, RUNX2 deletion did not improve cartilage formation by the modified cells.

      - In the control line, MSOD-B, there was variability in the amount of safranin O positive material in various histological panels in the figures; more quantitation is needed.

      - In the in vivo articular defect experiments, an untreated injured joint is needed as a negative control.

      - Statements about bone generation are often not reflective of the microCT data presented.

      - The discussion over-interprets the results.

      We thank the reviewer for the further assessment of our work. We respectfully disagree with most of the provided statements. The chondrogenicity of our graft is robustly demonstrated using multiple readouts, including quantitative ones. Beyond GAGs, we provided clear Safranin-O stainings, as well as collagen type 2 and X indicating presence of hypertrophic cartilage matrix. Those are the gold standards in the field and we thus do not understand the reviewer's scepticism. We do agree that our grafts are not fully composed of cartilage matrix, with areas (in the core) deprived of cartilage. This does not impact the core findings of our study and its conclusions, and we strongly feel our statements about forming in vitro cartilage fully stand.

      We do not claim in the manuscript an increased cartilage formation following RUNX2 deletion. We report in vitro an impaired hypertrophy (collagen type X) and maintenance of collagen type 2 and GAGs content.

      We are confident in our data regarding de novo bone formation by priming endochondral ossification, confirmed both by stainings and microCT. We feel that our claims are well-supported.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths: 

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration. 

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.  

      Weaknesses: 

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B cells. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage grafts of similar quality to the MSOD-B counterpart. Of note, the safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We now provide additional stainings of generated tissues pre-lyophilization. This is implemented in Figure 1D and Figure 3D.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification? 

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been extensively used in bone tissue engineering studies as a supplemented factor – primarily in the form of VEGFα – to increase the vascularization and thus outcome of bone formation of engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus identified as a natural candidate to demonstrate the possibility to generate VEGF-KO cartilage and subsequently assess the functional impact on both the angiogenic and osteogenic potential of resulting cartilage tissue. This is now clarified in the manuscript (page 3, paragraph 4).

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points. 

      Using RUNX2-KO ECM, we aimed at demonstrating the impact on cartilage remodeling and bone formation. This was performed ectopically but also in the rat osteochondral defect as a regenerative set-up of higher clinical relevance. We agree with the reviewer that additional experimental groups and time-points (not only earlier but also longer ones) would offer a better mechanistic understanding of the ECM contribution to the joint repair. However, as stated in our manuscript this is a proof-of-concept study that successfully demonstrated the influence of the cartilage ECM modification on the in vivo skeletal regeneration. A follow-up study would need to be performed to complement existing evidence and strengthen the relevance of our approach for cartilage repair. This is now further emphasized in the discussion (page 11, paragraph 3).  

      Reviewer #2 (Public Review): 

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate a loss of function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.  

      Overcoming the limitations of native/autologous/allogeneic ECMs such as complete decellularization and reduction of batch-to-batch variability was not specifically addressed in the data provided herein. For the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not addressed, but could be easily evaluated using polarized light microscopy and quantification of human DNA for example in constructs pre and post-lyophilization. 

      We appreciate the reviewer comments and acknowledge the lack of information in the first version of our manuscript. In line with our previous study (Pigeot et al., Advanced Materials 2021), the ectopic evaluation of our cartilage pellets was strictly done with lyophilized tissues using immunocompromised animals. Lyophilized tissues are thus considered devitalized, and not decellularized. Instead, the osteochondral defect experiment was performed with decellularized tissues in order to be able to implant the grafts in the rat immuno-competent model. This is now specified consistently throughout the manuscript. The decellularization process is also now incorporated accordingly in the method section (page 14, paragraph 2). We also provide quantifications of GAGs and DNAs from tissue pre- and post-decellularization (Supplementary figure 6A and 6B), described in the result section of the manuscript (page 9, paragraph 1). The decellularization step led to 97-98% of DNA removal.

      Importantly, we do not claim full maintenance of ECM integrity following lyophilization nor decellularization.  This is now clarified in the discussion (page 12, paragraph 2). However, we report their capacity to instruct skeletal regeneration in multiple contexts despite extensive processing.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as mitigation of using a sole cell line is likely not sufficient (considering that the sole cell line-derived Matrigel does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intraexperimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear that more than one independent 'differentiation' was performed from each gene-edited MSC line to generate in vitro replicates and constructs that were implanted in vivo. 

      We thank the Reviewer for the comment on the variability/reproducibility concern. Using a cell line does confer higher robustness but indeed does not grant unlimited consistency of batch production. We now temper our claims in the discussion and mention the need to regularly recharacterize cell line properties upon passages (page 12, paragraph 2). Using our edited lines, we have generated multiple batches of cartilage grafts for their in vitro characterization or in vivo performance assessment. We have now compiled batch variations of GAG content and pellet volume, provided as Supplementary figure 5. This revealed that batches are indeed not identical (nor is each pellet), but the production remains consistent.

      The use of descriptive language in describing conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' 'superior' (example, line 242) are inappropriate and should rather be discussed in terms of significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189). 

      We thank the Reviewer for the constructive suggestions. We have revised the language accordingly throughout the manuscript. 

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used for the purpose of evaluating the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described with more accuracy. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude that GAGs were retained in the RUNX2-KO line graft subchondral region based on the histology. Quantification of % chondral regeneration using histology is not accurate as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (example line 290) are not founded by histological evidence, which should show high magnification of the periphery of the graft adjacent to the native tissue. 

      We have revised our language relative to the TRAP staining description (page 9, paragraph 2). We also agree with the reviewer on the semi-quantitative approach of our methodology, which we transparently disclosed both in the main text (page 9, paragraph 3) and method section (page 18, paragraph 2). The sectioning location does influence the analysis, but to prevent this we performed an assessment at different depths (top, middle, bottom for each sample). This is now implemented in our method section (page 18, paragraph 3). On the tissue integration, we now provide higher magnification images of the implant/host tissue area (Figure 5F).

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have started off using an immortalized human cell line and then gene-edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2 edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells. 

      Strengths: 

      - The notion that inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity. 

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and the highlighted novelty of it.  

      Weaknesses: 

      - The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX-KO lines to that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immunostainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage tissue of similar quality to that of MSOD-B. Of note, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We now provide additional stainings of the generated tissues pre-lyophilization. This is implemented in Figure 1D and Figure 3D.

      On the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's Trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. Those are standard evaluation methods in the field, and we would be happy to receive additional suggestions from the Reviewer.

      - In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix. 

      We now appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. We however still respectfully disagree with the “small amount of bone” statement since our MSOD-B and MSOD-B VEGF KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, also composed of extensive marrow). This has been assessed qualitatively and quantitatively. 

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have extensively described in our previous publication (Pigeot et al., Advanced Materials 2021). However, the low amount of BMP-2 (in the range of dozens of nanograms per tissue) embedded in the matrix is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix (and thus cartilage) that allows successful bone formation.

      - There is a great deal of missing detail in the manuscript. 

      We have incorporated additional methodological details describing the lyophilization/decellularization process of our tissues prior to evaluation (see Material and Methods section). We also have included a description of the MSOD-B line and implemented genetic elements (Supplementary Figure 1A).  

      - The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing. 

      We believe our group size supports our conclusions confirmed by statistical assessment. We have provided additional stainings and images of higher magnifications (Figure 5) for both the ectopic and orthotopic in vivo evaluation.  

      - Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least. 

      We thank the Reviewer for the suggestion. However, our study did not aim to determine which ECM graft composition works best for cartilage or bone regeneration. Instead, we propose the exploitation of our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We once more confirm that we generated cartilage grafts, which is now better supported by additional histological assessment before lyophilization.

    2. eLife Assessment

      The study presents a potentially valuable approach to genetically modify cells to produce extracellular matrices with altered compositions, termed cell-laid, engineered extracellular matrices (eECM). The evidence supporting the authors' conclusions regarding the utility of eECM for endogenous repair is solid, although there are some disagreements on the chondrogenicity of lyophilized constructs which was viewed as lacking robust evidence for endochondral ossification.

    3. Reviewer #1 (Public review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      Weaknesses:

      Most of the data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      In the revised version, the authors presented Safranin-O staining results of pellets prior to lyophilization. The inset of figures showing entire pellets revealed that Safranin-O-positive areas were limited, suggesting that cells in the negative regions had not differentiated into chondrocytes. In Figure 3F, DAPI staining showed devitalized cells in the outer layer but was negative in the central part, indicating the absence of cells in these areas and incomplete differentiation induction.

      The rationale for establishing VEGF-KO cell lines remains unclear, and the authors' explanation in the revised manuscript is still equivocal. While they mention that VEGF is a late marker for endochondral ossification, the data in Figures 1D and 1E clearly show that VEGF-KO affects the early phase of endochondral ossification.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects.

      In the ectopic bone formation study, most of the collagenous matrix observed at 2 weeks was resorbed by 6 weeks, with only a small amount contributing to bone formation in MSOD-B cells (Figs. 2I and 4C). This finding does not align with the micro-CT data presented in Figures 2H and 4B. For the micro-CT experiments, it would be more appropriate to use a standard window for bone and present the data accordingly.

      While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, matrix generated by these cells subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2-edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      - The notion that inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      Weaknesses:

      - The authors have not demonstrated robust cartilage formation (quantitation would be useful).

      - Measuring total GAG content does not prove the presence of cartilage.

      - There are numerous overstatements about forming and implanting cartilage.

      - Although it is implied, RUNX2 deletion did not improve cartilage formation by the modified cells.

      - In the control line, MSOD-B, there was variability in the amount of safranin O-positive material in various histological panels in the figures; more quantitation is needed.

      - In the in vivo articular defect experiments, an untreated injured joint is needed as a negative control.

      - Statements about bone generation are often not reflective of the microCT data presented.

      - The discussion over-interprets the results.

    1. eLife Assessment

      This important study provides solid evidence to support the anti-tumor potential of citalopram, originally an anti-depression drug, in hepatocellular carcinoma (HCC). In addition to their previous report on directly targeting tumor cells via glucose transporter 1 (GLUT1), they tried to uncover additional working mechanisms of citalopram in HCC treatment in the current study. The data here suggested that citalopram may regulate the phagocytotic function of TAM via C5aR1 or CD8+T cell function to suppress HCC growth in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, instead of the classical citalopram receptor. Given that C5aR1 was also identified as the potential receptor of citalopram in the previous report, the authors focused on exploring the potential of the immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs) and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophage in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+T cells, probably via GLUT3 but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between a low 5-HT and a superior CD8+T cell function against a tumor. Although the data is informative, the rationale for working on additional mechanisms and logical links among different parts is not clear. In addition, some of the conclusions are also not fully supported by the current data.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram plays an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      Weaknesses:

      (1) The authors concluded that citalopram had a 'potential immune-dependent effect' based on the tumor weight difference between Rag-/- and C57 mice in Figure 1. However, tumor weight differences may also be attributed to a non-immune regulatory pathway. In addition, how do the authors calculate relative tumor weight? What is the rationale for using relative one but not absolute tumor weight to reflect the anti-tumor effect?

      (2) The authors used shSlc6a4 tumor cell lines to demonstrate that citalopram's effects are independent of the conventional SERT receptor (Figure 1C-F). However, this does not entirely exclude the possibility that SERT may still play a role in this context, as it can be expressed in other cells within the tumor microenvironment. What is the expression profiling of Slc6a4 in the HCC tumor microenvironment? In addition, in Figure 1F, the tumor growth of shSlc6a4 in C57 mice displayed a decreased trend, suggesting a possible role of Slc6a4.

      (3) Why did the authors choose to study phagocytosis in Figures 3G-H? As an important player, TAM regulates tumor growth via various mechanisms.

      (4) The information on unchanged deposition of C5a has been mentioned in this manuscript (Figures 3D and 3F), the authors should explain further in the manuscript, for example, C5a could bind to receptors other than C5aR1 and/or C5a bind to C5aR1 by different docking anchors compared with citalopram.

      (5) Figure 3I-M - the flow cytometry data suggested that citalopram treatment altered the proportions of total TAM, M1 and M2 subsets, CD4+ and CD8+T cells, DCs, and B cells. Why does the author conclude that the enhanced phagocytosis of TAM was one of the major mechanisms of citalopram? As the overall TAM number was regulated, the contribution of phagocytosis to tumor growth may be limited.

      (6) Figure 4 - what is the rationale for using the MASH-associated HCC mouse model to study metabolic regulation in CD8+T cells? The tumor microenvironment and tumor growth would be quite different. In addition, how does this part link up with the mechanisms related to C5aR1 and TAM? The authors also brought GLUT1 back in the last part and focused on CD8+T cell metabolism, which was totally separated from previous data.

      (7) Figure 5, the authors illustrated their mechanism that citalopram regulates CD8+T cell anti-tumor immunity through proinflammatory TAM with no experimental evidence. Using only CD206 and MHCII to represent TAM subsets obviously is not sufficient.

    3. Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target. However, certain aspects of experimental design and clinical relevance could be further developed to strengthen the study's impact.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a thorough strategy for HCC therapy. By emphasizing the potential for existing drugs like citalopram to be repurposed, the study also underscores the feasibility of translational applications.

      Major weaknesses/suggestions:

      The dataset and signature database used for GSEA analyses are not clearly specified, limiting reproducibility. The manuscript does not fully explore the potential promiscuity of citalopram's interactions across GLUT1, C5aR1, and SERT1, which could provide a deeper understanding of binding selectivity. The absence of GLUT1 knockdown or knockout experiments in macrophages prevents a complete assessment of GLUT1's role in macrophage versus tumor cell metabolism. Furthermore, there is minimal discussion of clinical data on SSRI use in HCC patients. Incorporating survival outcomes based on SSRI treatment could strengthen the study's translational relevance.

      By addressing these limitations, the manuscript could make an even stronger contribution to the fields of cancer immunotherapy and drug repurposing.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, instead of the classical citalopram receptor. Given that C5aR1 was also identified as the potential receptor of citalopram in the previous report, the authors focused on exploring the potential of the immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs) and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophage in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+T cells, probably via GLUT3 but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between a low 5-HT and a superior CD8+T cell function against a tumor. Although the data is informative, the rationale for working on additional mechanisms and logical links among different parts is not clear. In addition, some of the conclusions are also not fully supported by the current data.

      Thank you very much for your insightful evaluation and constructive suggestions. We have thoroughly studied the comments, and a provisional point-by-point response follows.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram plays an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      Thank you for your constructive comments. We believe that further investigation into the mechanisms by which citalopram modulates TAM function could provide valuable insights into its potential role in HCC therapy.

      Weaknesses:

      (1) The authors concluded that citalopram had a 'potential immune-dependent effect' based on the tumor weight difference between Rag-/- and C57 mice in Figure 1. However, tumor weight differences may also be attributed to a non-immune regulatory pathway. In addition, how do the authors calculate relative tumor weight? What is the rationale for using relative one but not absolute tumor weight to reflect the anti-tumor effect?

      We appreciate your insights into the potential contributions of non-immune regulatory pathways to the observed tumor weight differences between Rag1-/- and C57 mice, and we will further address this issue in our discussion. The relative tumor weight was calculated by assigning an arbitrary value of 1 to the Rag1-/- mice in the DMSO treatment group, with all other tumor weights expressed relative to this baseline. As suggested, we will include absolute tumor weight data in our revised manuscript.
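
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the baseline is the mean weight of the reference group (the response does not state whether a mean or another summary was used), and the function name, group labels, and weights are hypothetical.

```python
from statistics import mean


def relative_tumor_weights(weights_by_group, baseline_group):
    """Divide every tumor weight by the mean weight of the baseline group.

    Assumption: the "arbitrary value of 1" is assigned to the baseline
    group's mean, so that group averages exactly 1 after normalization.
    """
    baseline = mean(weights_by_group[baseline_group])
    if baseline == 0:
        raise ValueError("baseline group mean is zero; cannot normalize")
    return {
        group: [w / baseline for w in weights]
        for group, weights in weights_by_group.items()
    }


# Hypothetical weights in grams, for illustration only (not study data).
example = {
    "Rag1-/- DMSO": [2.0, 4.0],
    "Rag1-/- citalopram": [1.5, 3.0],
    "C57 DMSO": [0.75, 1.5],
}
relative = relative_tumor_weights(example, "Rag1-/- DMSO")
```

      Because every value is divided by the same constant, ratios between groups are unchanged, but the absolute scale is lost, which is why reporting absolute weights alongside the relative ones addresses the reviewer's question.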

      (2) The authors used shSlc6a4 tumor cell lines to demonstrate that citalopram's effects are independent of the conventional SERT receptor (Figure 1C-F). However, this does not entirely exclude the possibility that SERT may still play a role in this context, as it can be expressed in other cells within the tumor microenvironment. What is the expression profiling of Slc6a4 in the HCC tumor microenvironment? In addition, in Figure 1F, the tumor growth of shSlc6a4 in C57 mice displayed a decreased trend, suggesting a possible role of Slc6a4.

      To identify the expression patterns of Slc6a4 in different cellular contexts within the HCC tumor microenvironment, we will conduct a thorough screening of HCC datasets that include single-cell sequencing analysis. The possible role of Slc6a4 on tumor growth will be verified with in vitro loss-of-function experiments.

      (3) Why did the authors choose to study phagocytosis in Figures 3G-H? As an important player, TAM regulates tumor growth via various mechanisms.

      Thank you for your question. We focused on this aspect because citalopram targets C5aR1-expressing TAM. C5aR1 is a receptor for complement component C5a, and complement components play a significant role in mediating the phagocytosis process in macrophages. In the revised manuscript, we will emphasize this rationale clearly.

      (4) The information on unchanged deposition of C5a has been mentioned in this manuscript (Figures 3D and 3F), the authors should explain further in the manuscript, for example, C5a could bind to receptors other than C5aR1 and/or C5a bind to C5aR1 by different docking anchors compared with citalopram.

      Thank you for your insightful comment. First, we will investigate the docking anchors involved in the binding of C5a to C5aR1 and compare these interactions with those of C5aR1 and citalopram. Additionally, we will discuss the potential binding of C5a to other receptors, providing a broader perspective on the signaling mechanisms.

      (5) Figure 3I-M - the flow cytometry data suggested that citalopram treatment altered the proportions of total TAM, M1 and M2 subsets, CD4+ and CD8+T cells, DCs, and B cells. Why does the author conclude that the enhanced phagocytosis of TAM was one of the major mechanisms of citalopram? As the overall TAM number was regulated, the contribution of phagocytosis to tumor growth may be limited.

      As suggested, we will restate the conclusion to enhance clarity and better articulate the relationship between citalopram treatment, TAM populations, and their phagocytic activity. Thank you for your valuable input.

      (6) Figure 4 - what is the rationale for using the MASH-associated HCC mouse model to study metabolic regulation in CD8+T cells? The tumor microenvironment and tumor growth would be quite different. In addition, how does this part link up with the mechanisms related to C5aR1 and TAM? The authors also brought GLUT1 back in the last part and focused on CD8+T cell metabolism, which was totally separated from previous data.

      We chose the MASH-associated HCC mouse model because it closely mimics the etiology of metabolic-associated fatty liver disease (MAFLD), which is a significant contributor to the development of cirrhosis and HCC. The inclusion of CD8+ T cells in our study is based on the understanding that citalopram targets GLUT1, which plays a crucial role in glucose uptake. CD8+ T cell function is heavily reliant on glycolytic metabolism, making it essential to investigate how citalopram's effects on GLUT1 influence the metabolic pathways and functionality of these immune cells. The data presented in this section primarily aim to demonstrate how citalopram influences peripheral 5-HT levels, which subsequently affects CD8+ T cell functionality. By linking these findings, we will clarify how citalopram impacts both TAM and CD8+ T cells. In the revised manuscript, we will enhance the background information and provide relevant data support to avoid any gaps.

      (7) Figure 5, the authors illustrated their mechanism that citalopram regulates CD8+T cell anti-tumor immunity through proinflammatory TAM with no experimental evidence. Using only CD206 and MHCII to represent TAM subsets obviously is not sufficient.

      As suggested, more relevant experimental data will be included in the revised manuscript to better characterize the TAM populations and their roles in mediating the effects of citalopram on CD8+ T cells.

      Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target. However, certain aspects of experimental design and clinical relevance could be further developed to strengthen the study's impact.

      Thank you for your thoughtful review and constructive feedback, and we look forward to improving our manuscript accordingly.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a thorough strategy for HCC therapy. By emphasizing the potential for existing drugs like citalopram to be repurposed, the study also underscores the feasibility of translational applications.

      Your insights reinforce the significance of our findings, and we will ensure that these points are clearly articulated in the revised manuscript to enhance its impact.

      Major weaknesses/suggestions:

      The dataset and signature database used for GSEA analyses are not clearly specified, limiting reproducibility. The manuscript does not fully explore the potential promiscuity of citalopram's interactions across GLUT1, C5aR1, and SERT1, which could provide a deeper understanding of binding selectivity. The absence of GLUT1 knockdown or knockout experiments in macrophages prevents a complete assessment of GLUT1's role in macrophage versus tumor cell metabolism. Furthermore, there is minimal discussion of clinical data on SSRI use in HCC patients. Incorporating survival outcomes based on SSRI treatment could strengthen the study's translational relevance.

      By addressing these limitations, the manuscript could make an even stronger contribution to the fields of cancer immunotherapy and drug repurposing.

      We appreciate your valuable suggestions. As suggested, we will take the following actions:

      (1) GSEA analysis: we will clearly specify the datasets and signature databases used for the GSEA in the revised manuscript.

      (2) Exploration of binding selectivity: we recognize the importance of exploring the potential promiscuity of citalopram’s interactions across GLUT1, C5aR1, and SERT1. As suggested, we will include a more detailed analysis of these interactions, which will help elucidate binding selectivity and its implications for therapeutic outcomes.

      (3) GLUT1 knockdown in macrophages: to address the gap in our assessment of GLUT1’s role in macrophages, we will incorporate GLUT1 knockdown or knockout experiments in macrophages upon citalopram treatment. Moreover, a DARTS assay for GLUT1 in THP-1 cells will be conducted.

      (4) Clinical data on SSRI use in HCC patients: related data have been reported previously (PMID: 39388353; Cell Rep. 2024 Oct 22;43(10):114818), as detailed below:

      “SSRIs use is associated with reduced disease progression in HCC patients

      We determined whether SSRIs for alleviating HCC are supported by real-world data. A total of 3061 patients with liver cancer were extracted from the Swedish Cancer Register. Among them, 695 patients had been administrated with post-diagnostic SSRIs. The Kaplan-Meier survival analysis suggested that patients who utilized SSRIs exhibited a significantly improved metastasis-free survival compared to those who did not use SSRIs, with a P value of log-rank test at 0.0002. Cox regression analysis showed that SSRI use was associated with a lower risk of metastasis (HR = 0.78; 95% CI, 0.62-0.99).”

      Author response image 1.

    1. eLife Assessment

      Using a unique cerebellar disruption approach in non-human primates, this study provides valuable new insight into how cerebellar inputs to the motor cortex contribute to reaching. Evidence for many claims is solid, but several analyses - especially with respect to control at the end of the reaches - could be expanded or clarified. Additional details about the behavioral task and a clearer description about the limits of the disruption approach with respect to selectivity are also warranted.

    2. Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

    3. Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data was clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our current task design, monkeys were indeed required to hold at the target briefly (100 ms for Monkeys S and P, and 150 ms for Monkeys C and M) before receiving a reward. However, given the size of the targets and the velocity of movements, the monkey often did not have to stop its movement to obtain a reward. Importantly, we relaxed the task’s requirements (by increasing target size and reducing temporal constraints) so that monkeys could perform the task under cerebellar block, because stricter criteria yielded a low success rate in these conditions. This design is suboptimal for studying endpoint accuracy which, as we now appreciate, is an important aspect of cerebellar control. In our revision, we will clarify these aspects of the task design and acknowledge that it is suboptimal for examining the role of the cerebellum in endpoint control. Future studies will explicitly address this point.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      This is a very interesting suggestion. Although our main analysis focused on target-directed reaching movements, we have the data for the between-trial movements under continuous stimulation (e.g., return to center movements). In our revised supplementary material, we will examine the effect of cerebellar block on endpoint velocities in inter-trial movements versus task-related movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      The reviewer is right: movements to all targets engage both the shoulder and the elbow, but the relative participation of each joint varied in a target-specific manner. In the manuscript, we used the term “single-joint” to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.
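
      As a rough illustration of why a stationary elbow implies low coupling torque at the shoulder, the sketch below evaluates the elbow-dependent terms of the shoulder torque equation for a textbook two-link planar arm. It is only a sketch under stated assumptions: the segment parameters, the function name, and the choice to call these terms the "coupling torque" are all hypothetical, and the inverse-dynamics model and sign conventions used in the manuscript may differ.

```python
import math

# Hypothetical forearm parameters, NOT taken from the manuscript.
M2 = 0.2    # forearm mass (kg)
L1 = 0.15   # upper-arm length (m)
R2 = 0.08   # distance from elbow to forearm centre of mass (m)
I2 = 0.002  # forearm moment of inertia about its centre of mass (kg*m^2)


def shoulder_coupling_torque(elbow_angle, shoulder_vel, elbow_vel, elbow_acc):
    """Elbow-dependent terms of the shoulder torque equation (N*m).

    Standard two-link planar arm; angles in rad, velocities in rad/s,
    accelerations in rad/s^2. The elbow angle is measured relative to
    the upper arm.
    """
    # Inertial term: elbow acceleration acting on the shoulder.
    inertial = (I2 + M2 * (R2 ** 2 + L1 * R2 * math.cos(elbow_angle))) * elbow_acc
    # Velocity-dependent (Coriolis and centripetal) terms.
    velocity = -M2 * L1 * R2 * math.sin(elbow_angle) * (
        2.0 * shoulder_vel * elbow_vel + elbow_vel ** 2
    )
    return inertial + velocity


# Shoulder moves while the elbow is stationary: every coupling term vanishes.
stationary = shoulder_coupling_torque(math.radians(90), 2.0, 0.0, 0.0)
# Both joints move: the shoulder experiences a non-zero coupling torque.
moving = shoulder_coupling_torque(math.radians(90), 2.0, 3.0, 20.0)
print(f"elbow stationary: {stationary:.4f} N*m, elbow moving: {moving:.4f} N*m")
```

      In this simplified model every term scales with elbow velocity or acceleration, so a reach in which the elbow barely moves produces little coupling torque at the shoulder regardless of how fast the shoulder rotates.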

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

      We will include a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new. While some of the same animals and stimulation protocol were presented in prior work, the inverse-dynamics modeling, analyses of progressive movement changes across trials under stimulation and invariance of motor noise to movement velocity are newly reported in this manuscript.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data was clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      We appreciate the reviewer’s emphasis on distinguishing reduced muscle tone and altered co-contraction patterns as possible explanations for decreased limb velocity. Our focus on torques arises from previous studies suggesting that the core deficit in cerebellar ataxia is impaired prediction of coupling torques. This point will be added in the discussion section of our revised manuscript where we will explain why we prioritize muscle torques and how muscle-level activation collectively contributes to net joint torques. Also, we will underscore that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torques.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      Successful trials were those in which the monkey did not leave the center position before the go signal and reached the peripheral target within specified time limits. These limits varied between monkeys. We will include detailed definitions of our success criteria in the revised methods section of our manuscript. Specifically, we will update our methods section to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      We acknowledge the reviewer’s observation regarding Targets 1 and 5 being side-to-side rather than strictly “outward” or “inward.” In the first section of our results, we grouped the targets in this way to emphasize the notably stronger effect of the cerebellar block on targets involving shoulder flexion (‘outward’) as compared to those involving shoulder extension (‘inwards’). For subsequent analyses we focused on the effects of cerebellar block on outward targets where movements were single-joint (Target 1) vs. multi-joint (Targets 2-4). To clarify this aspect, in our revised manuscript we will explain the rationale for grouping T1–T4 as “outward” and T5–T8 as “inward,” including how we defined them.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      We will revise the figure labels and legend to clarify how each axis is defined. Please note that the data were pooled only after confirming that each animal showed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
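
      To make concrete how a partial correlation that controls for animal identity differs from a pooled correlation, here is a minimal sketch on synthetic data. It assumes that "controlling for variability across monkeys" amounts to removing each animal's mean from both variables before correlating them (equivalent to regressing out a categorical animal factor); the function names and numbers are hypothetical and are not the authors' data or analysis code.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def demean_within(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing between-group offsets."""
    return pearson(demean_within(x, groups), demean_within(y, groups))


# Synthetic data: a positive relation within each animal, but animal-specific
# offsets that point the other way and distort the pooled correlation.
random.seed(0)
animals, x, y = [], [], []
for animal, offset in enumerate([0.0, 2.0, 4.0, 6.0]):
    for _ in range(50):
        xi = random.gauss(0.0, 1.0)
        animals.append(animal)
        x.append(xi + offset)
        y.append(0.5 * xi + random.gauss(0.0, 1.0) - offset)

pooled_r = pearson(x, y)
partial_r = partial_corr_by_group(x, y, animals)
print(f"pooled r = {pooled_r:.2f}, partial r = {partial_r:.2f}")
```

      In this toy case the pooled correlation is dominated by differences between animals, whereas the partial correlation recovers the within-animal trend, which is the concern raised about pooling in Figure 3d.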

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

      The reviewer is right and the relations between timing, decomposition and variability need to be explicitly presented. In our revision, we will explain how decomposed movements may reflect impaired temporal coordination across multiple joints—a critical cerebellar function. We will also clarify how increased variability in joint coordination can result in increased trial-to-trial variability of trajectories.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials. We plan to study the effect of cerebellar block on immediate post-block washout trials in the future.

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The reviewer is right that movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. We attempted to address this issue in the Discussion section of version 1 of our manuscript. Specifically, we note that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We proposed two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block, consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) a posture-related facilitation of inward (shoulder extension) movements from the central starting position.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. However, it is also important to note that in primates the cerebellar-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways lost their importance over the course of evolution (Nathan and Smith, 1982, Padel et al., 1981, ten Donkelaar, 1988). In our previous study we found that the ascending spinocerebellar axons which enter the cerebellum through the SCP are weakly task-related and the descending system is quite small (Cohen et al, 2017). However, we cannot rule out an effect of HFS mediated in part through other systems. In the revised introduction section, we will clarify this point and use more careful language about the scope of our stimulation, emphasizing that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. As presented in our discussion section, we interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We draw our interpretation from the findings of previous research work which show that the cerebellum aids in the state estimation of the limb and subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task the amplitude of movements made by our monkeys was small and therefore the response measures we used were too small in the first 50-100 ms for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque-impulse at the peak velocity and maximum deviation of the hand trajectory as response measures. We will acknowledge this point in the discussion section of our revised manuscript. We will also tone down references to feedforward control throughout the text of our revised manuscript as suggested by the reviewer.
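      The three response measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names, the sampling interval, the toy trajectory, and the reading of "torque-impulse at the peak velocity" as the time integral of torque from movement onset to the sample of peak speed are all assumptions made for the example.

```python
from math import hypot

def peak_speed(xs, ys, dt):
    """Return (peak hand speed, index of the segment where it occurs)."""
    speeds = [hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
              for i in range(len(xs) - 1)]
    i_peak = max(range(len(speeds)), key=speeds.__getitem__)
    return speeds[i_peak], i_peak

def impulse_until(torque, i_end, dt):
    """Trapezoidal integral of torque from sample 0 to sample i_end (assumed definition)."""
    return sum(0.5 * (torque[i] + torque[i + 1]) * dt for i in range(i_end))

def max_path_deviation(xs, ys):
    """Largest perpendicular distance of the hand path from the start-to-end line."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = hypot(x1 - x0, y1 - y0)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
               for x, y in zip(xs, ys))

# Hypothetical toy trial: four position samples and a made-up torque trace.
dt = 0.1
xs, ys = [0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 1.0, 0.0]
torque = [0.0, 2.0, 4.0, 6.0]

v_peak, i_peak = peak_speed(xs, ys, dt)
impulse = impulse_until(torque, i_peak, dt)
deviation = max_path_deviation(xs, ys)
```

      Because each measure is tied to an event within the movement (the velocity peak, the largest excursion) rather than to a fixed time window, it stays comparable between fast control trials and slower cerebellar block trials.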

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      Indeed, as reviewer #1 also noted, movements to targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Our intention in using the term “single-joint” was to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      We will revise the figure legend and main-text explanation for Figure 3d. Please note that the data were pooled only after confirming that data from each animal showed a similar trend. Specifically, the correlation coefficients were positive, and they were significant for 3 out of the 4 monkeys. Moreover, following the reviewers’ feedback, we also ran a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
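      The reviewer's concern about pooling and the logic of the partial correlation can be illustrated with a short sketch. It assumes one common way of controlling for animal identity, namely centering both variables within each animal before correlating them, which is equivalent to partialling out a categorical animal factor. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def partial_corr_by_group(x, y, groups):
    """Correlation between x and y after removing between-group offsets."""
    return pearson(center_within_groups(x, groups),
                   center_within_groups(y, groups))

# Hypothetical data: two animals with the same positive within-animal trend
# but different overall offsets.
animal = ["A", "A", "A", "B", "B", "B"]
coupling_torque = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
velocity_drop = [10.0, 20.0, 30.0, 1.0, 2.0, 3.0]

r_pooled = pearson(coupling_torque, velocity_drop)
r_partial = partial_corr_by_group(coupling_torque, velocity_drop, animal)
```

      In this toy case the pooled coefficient is negative even though the trend within each animal is positive, which is the kind of between-animal bias the reviewer warns about; the within-animal version recovers the positive relationship.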

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

      We will provide a breakdown of the success rates as a function of targets. However, one should note that success/failure may depend on several factors beyond impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure such as (i) not entering the central target in time, (ii) moving out too early from the peripheral target, (iii) reaction time longer than permitted, or (iv) premature exit from the central target before permitted.

    1. eLife Assessment

      This valuable short paper is an ingenious use of clinical patient data to address an issue in imaging neuroscience. The authors clarify the role of face-selectivity in human fusiform gyrus by measuring both BOLD fMRI and depth electrode recordings in the same individuals; furthermore, by comparing responses in different brain regions in the two patients, they suggested that the suppression of blood oxygenation is associated with a decrease in local neural activity. While the methods are compelling and provide a rare dataset of potentially general importance, the presentation of the data in its current form is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Measurement of BOLD MR imaging has regularly found regions of the brain that show reliable suppression of BOLD responses during specific experimental testing conditions. These observations are to some degree unexplained, in comparison with more usual association between activation of the BOLD response and excitatory activation of the neurons (most tightly linked to synaptic activity) in the same brain location. This paper finds two patients whose brains were tested with both non-invasive functional MRI and with invasive insertion of electrodes, which allowed the direct recording of neuronal activity. The electrode insertions were made within the fusiform gyrus, which is known to process information about faces, in a clinical search for the sites of intractable epilepsy in each patient. The simple observation is that the electrode location in one patient showed activation of the BOLD response and activation of neuronal firing in response to face stimuli. This is the classical association. The other patient showed an informative and different pattern of responses. In this person, the electrode location showed a suppression of the BOLD response to face stimuli and, most interestingly, an associated suppression of neuronal activity at the electrode site.

      Strengths:

      Whilst these results are not by themselves definitive, they add an important piece of evidence to a long-standing discussion about the origins of the BOLD response. The observation of decreased neuronal activation associated with negative BOLD is interesting because, at various times, exactly the opposite association has been predicted. It has been previously argued that if synaptic mechanisms of neuronal inhibition are responsible for the suppression of neuronal firing, then it would be reasonable

      Weaknesses:

      The chief weakness of the paper is that the results may be unique in a slightly awkward way. The observation of positive BOLD and neuronal activation is made at one brain site in one patient, while the complementary observation of negative BOLD and neuronal suppression actually derives from the other patient. Showing both effects in both patients would make a much stronger paper.

    3. Reviewer #2 (Public review):

      Summary:

      This is a short and straightforward paper describing BOLD fMRI and depth electrode measurements from two regions of the fusiform gyrus that show either higher or lower BOLD responses to faces vs. objects (which I will call face-positive and face-negative regions). In these regions, which were studied separately in two patients undergoing epilepsy surgery, spiking activity increased for faces relative to objects in the face-positive region and decreased for faces relative to objects in the face-negative region. Interestingly, about 30% of neurons in the face-negative region did not respond to objects and decreased their responses below baseline in response to faces (absolute suppression).

      Strengths:

      These patient data are valuable, with many recording sessions and neurons from human face-selective regions, and the methods used for comparing face and object responses in both fMRI and electrode recordings were robust and well-established. The finding of absolute suppression could clarify the nature of face selectivity in human fusiform gyrus since previous fMRI studies of the face-negative region could not distinguish whether face < object responses came from absolute suppression, or just relatively lower but still positive responses to faces vs. objects.

      Weaknesses:

      The authors claim that the results tell us about both 1) face-selectivity in the fusiform gyrus, and 2) the physiological basis of the BOLD signal. However, I would like to see more of the data that supports the first claim, and I am not sure the second claim is supported.

      (1) The authors report that ~30% of neurons showed absolute suppression, but those data are not shown separately from the neurons that only show relative reductions. It is difficult to evaluate the absolute suppression claim from the short assertion in the text alone (lines 105-106), although this is a critical claim in the paper.

      (2) I am not sure how much light the results shed on the physiological basis of the BOLD signal. The authors write that the results reveal "that BOLD decreases can be due to relative, but also absolute, spike suppression in the human brain" (line 120). But I think to make this claim, you would need a region that exclusively had neurons showing absolute suppression, not a region with a mix of neurons, some showing absolute suppression and some showing relative suppression, as here. The responses of both groups of neurons contribute to the measured BOLD signal, so it seems impossible to tell from these data how absolute suppression per se drives the BOLD response.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper the authors conduct two experiments: an fMRI experiment and intracranial recordings of neurons in two patients, P1 and P2. In both experiments, they employ an SSVEP paradigm in which they show images at a fast rate (e.g. 6 Hz), with face images appearing at a slower rate (e.g. 1.2 Hz) and the rest of the images being a variety of object images. In the first patient, they record from neurons over a region in the mid fusiform gyrus that is face-selective, and in the second patient, they record neurons from a more medial region that is not face-selective (it responds more strongly to objects than faces). The results show similar selectivity between the electrophysiology data and the fMRI data, in that the location which shows a higher fMRI response to faces also contains face-selective neurons, and the location which shows a preference for non-faces also contains non-face-preferring neurons.

      Strengths:

      The data is important in that it shows that there is a relationship between category selectivity measured from electrophysiology data and category selectivity measured from fMRI. The data is unique as it contains a lot of single and multiunit recordings (245 units) from the human fusiform gyrus, which, as the authors point out, is a hominoid-specific gyrus.

      Weaknesses:

      My major concerns are two-fold:

      (i) There is a paucity of data; Thus, more information (results and methods) is warranted; and in particular there is no comparison between the fMRI data and the SEEG data.

      (ii) One main claim of the paper is that there is evidence for suppressed responses to faces in the non-face selective region. That is, the reduction in activation to faces in the non-face selective region is interpreted as a suppression in the neural response and consequently the reduction in fMRI signal is interpreted as suppression. However, the SSVEP paradigm has no baseline (it alternates between faces and objects) and therefore it cannot distinguish between lower firing rate to faces vs suppression of response to faces.

      (1) Additional data: the paper has 2 figures: figure 1 which shows the experimental design and figure 2 which presents data, the latter shows one example neuron raster plot from each patient and group average neural data from each patient. In this reader's opinion this is insufficient data to support the conclusions of the paper. The paper will be more impactful if the researchers would report the data more comprehensively.

      (a) There is no direct comparison between the fMRI data and the SEEG data, except for a comparison of the location of the electrodes relative to the statistical parametric map generated from a contrast (Fig 2a,d). It will be helpful to build a model linking between the neural responses to the voxel response in the same location - i.e., estimate from the electrophysiology data the fMRI data (e.g. Logothetis & Wandell, 2004)

      (b) More comprehensive analyses of the SSVEP neural data: It will be helpful to show the results of the frequency analyses of the SSVEP data for all neurons to show that there are significant visual responses and significant face responses. It will be also useful to compare and quantify the magnitude of the face responses compared to the visual responses.

      (c) The neuron shown in E shows cyclical responses tied to the onset of the stimuli, is this the visual response? If so, why is there an increase in the firing rate of the neuron before the face stimulus is shown in time 0? The neuron's data seems different than the average response across neurons; This raises a concern about interpreting the average response across neurons in panel F which seems different than the single neuron responses

      (d) Related to (c) it would be useful to show raster plots of all neurons and quantify if the neural responses within a region are homogeneous or heterogeneous. This would add data relating the single neuron response to the population responses measured from fMRI. See also Nir 2009.

      (e) When reporting group average data (e.g., Fig 2C,F) it is necessary to show standard deviation of the response across neurons.

      (f) Is it possible to estimate the latency of the neural responses to face and object images from the phase data? If so, this will add important information on the timing of neural responses in the human fusiform gyrus to face and object images.

      (g) Related to (e): In total the authors recorded data from 245 units (some single units and some multiunits) and they found that in both the face-selective and non-face-selective regions most of the recorded neurons exhibited face selectivity, which this reader found confusing. They write: "Among all visually responsive neurons, we found a very high proportion of face-selective neurons (p < 0.05) in both activated and deactivated MidFG regions (P1: 98.1%; N = 51/52; P2: 86.6%; N = 110/127)". Is the face selectivity in P1 an increase in response to faces and in P2 a reduction in response to faces, or is it an increase in response to faces in both?

      (1) Additional methods

      (a) It is unclear if the SSVEP analyses of neural responses were done on the spikes or the raw electrical signal. If the former, how is the SSVEP frequency analysis done on discrete data like action potentials?

      (b) It is unclear why the onset time was shifted by 33 ms; one can measure the phase of the response relative to the cycle onset and use that to estimate the delay between the onset of a stimulus and the onset of the response. Adding phase information will be useful.

      (2) Interpretation of suppression:

      The SSVEP paradigm alternates between 2 conditions: faces and objects and has no baseline; In other words, responses to faces are measured relative to the baseline response to objects so that any region that contains neurons that have a lower firing rate to faces than objects is bound to show a lower response in the SSVEP signal. Therefore, because the experiment does not have a true baseline (e.g. blank screen, with no visual stimulation) this experimental design cannot distinguish between lower firing rate to faces vs suppression of response to faces.

      The strongest evidence put forward for suppression is the response of non-visual neurons that was also reduced when patients looked at faces, but since these are non-visual neurons, it is unclear how to interpret the responses to faces.

    5. Author response:

      eLife Assessment

      This valuable short paper is an ingenious use of clinical patient data to address an issue in imaging neuroscience. The authors clarify the role of face-selectivity in human fusiform gyrus by measuring both BOLD fMRI and depth electrode recordings in the same individuals; furthermore, by comparing responses in different brain regions in the two patients, they suggested that the suppression of blood oxygenation is associated with a decrease in local neural activity. While the methods are compelling and provide a rare dataset of potentially general importance, the presentation of the data in its current form is incomplete.

      We thank the Reviewing editor and Senior editor at eLife for their positive assessment of our paper. After reading the reviewers’ comments – to which we reply below - we agree that the presentation of the data could be completed. We provide additional presentation of data in the responses below and we will slightly modify Figure 2 of the paper. However, in keeping the short format of the paper, the revised version will have the same number of figures, which support the claims made in the paper.

      Reviewer #1 (Public review):

      Summary:

      Measurement of BOLD MR imaging has regularly found regions of the brain that show reliable suppression of BOLD responses during specific experimental testing conditions. These observations are to some degree unexplained, in comparison with more usual association between activation of the BOLD response and excitatory activation of the neurons (most tightly linked to synaptic activity) in the same brain location. This paper finds two patients whose brains were tested with both non-invasive functional MRI and with invasive insertion of electrodes, which allowed the direct recording of neuronal activity. The electrode insertions were made within the fusiform gyrus, which is known to process information about faces, in a clinical search for the sites of intractable epilepsy in each patient. The simple observation is that the electrode location in one patient showed activation of the BOLD response and activation of neuronal firing in response to face stimuli. This is the classical association. The other patient showed an informative and different pattern of responses. In this person, the electrode location showed a suppression of the BOLD response to face stimuli and, most interestingly, an associated suppression of neuronal activity at the electrode site.

      Strengths:

      Whilst these results are not by themselves definitive, they add an important piece of evidence to a long-standing discussion about the origins of the BOLD response. The observation of decreased neuronal activation associated with negative BOLD is interesting because, at various times, exactly the opposite association has been predicted. It has been previously argued that if synaptic mechanisms of neuronal inhibition are responsible for the suppression of neuronal firing, then it would be reasonable

      Weaknesses:

      The chief weakness of the paper is that the results may be unique in a slightly awkward way. The observation of positive BOLD and neuronal activation is made at one brain site in one patient, while the complementary observation of negative BOLD and neuronal suppression actually derives from the other patient. Showing both effects in both patients would make a much stronger paper.

      We thank reviewer #1 for their positive evaluation of our paper. Obviously, we agree with the reviewer that the paper would be much stronger if BOTH effects – spike increase and decrease – would be found in BOTH patients in their corresponding fMRI regions (lateral and medial fusiform gyrus) (also in the same hemisphere). Nevertheless, we clearly acknowledge this limitation in the (revised) version of the manuscript (p.8: Material and Methods section).

      In the current paper, one could think that P1 shows only increases to faces, and P2 would show only decreases (irrespective of the region). However, that is not the case since 11% of P1’s face-selective units are decreases (89% are increases) and 4% of P2’s face-selective units are increases. This has now been made clearer in the manuscript (p.5).

      As the reviewer is certainly aware, the number and position of the electrodes are based on strict clinical criteria, and we will probably never encounter a situation with two neighboring (macro-micro hybrid) electrodes, one with microelectrodes ending up in the lateral MidFG, the other in the medial MidFG, in the same patient. If there is no clinical value for the patient, this cannot be done.

      The only thing we can do is to strengthen these results in the future by collecting data on additional patients with an electrode either in the lateral or the medial FG, together with fMRI. But these are the only two patients we have been able to record so far with electrodes falling unambiguously in such contrasted regions and with large (and comparable) measures.

      While we acknowledge that the results may be unique because of the use of 2 contrasted patients only (and this is why the paper is a short report), the data is compelling in these 2 cases, and we are confident that it will be replicated in larger cohorts in the future.

      Reviewer #2 (Public review):

      Summary:

      This is a short and straightforward paper describing BOLD fMRI and depth electrode measurements from two regions of the fusiform gyrus that show either higher or lower BOLD responses to faces vs. objects (which I will call face-positive and face-negative regions). In these regions, which were studied separately in two patients undergoing epilepsy surgery, spiking activity increased for faces relative to objects in the face-positive region and decreased for faces relative to objects in the face-negative region. Interestingly, about 30% of neurons in the face-negative region did not respond to objects and decreased their responses below baseline in response to faces (absolute suppression).

      Strengths:

      These patient data are valuable, with many recording sessions and neurons from human face-selective regions, and the methods used for comparing face and object responses in both fMRI and electrode recordings were robust and well-established. The finding of absolute suppression could clarify the nature of face selectivity in human fusiform gyrus since previous fMRI studies of the face-negative region could not distinguish whether face < object responses came from absolute suppression, or just relatively lower but still positive responses to faces vs. objects.

      Weaknesses:

      The authors claim that the results tell us about both 1) face-selectivity in the fusiform gyrus, and 2) the physiological basis of the BOLD signal. However, I would like to see more of the data that supports the first claim, and I am not sure the second claim is supported.

      (1) The authors report that ~30% of neurons showed absolute suppression, but those data are not shown separately from the neurons that only show relative reductions. It is difficult to evaluate the absolute suppression claim from the short assertion in the text alone (lines 105-106), although this is a critical claim in the paper.

      We thank reviewer #2 for their positive evaluation of our paper. We understand the reviewer’s point, and we partly agree. Where we respectfully disagree is that the finding of absolute suppression is critical for the claim of the paper: finding an identical contrast between the two regions in terms of RELATIVE increase/decrease of face-selective activity in fMRI and spiking activity is already novel and informative. Where we agree with the reviewer is that the absolute suppression could be more documented: it wasn’t, due to space constraints (brief report). We provide below an example of a neuron showing absolute suppression to faces. In the frequency domain, there is only a face-selective response (1.2 Hz and harmonics) but no significant response at 6 Hz (common general visual response). In the time-domain, relative to face onset, the response drops below baseline level. It means that this neuron has baseline (non-periodic) spontaneous spiking activity that is actively suppressed when a face appears.

      Author response image 1.

      (2) I am not sure how much light the results shed on the physiological basis of the BOLD signal. The authors write that the results reveal "that BOLD decreases can be due to relative, but also absolute, spike suppression in the human brain" (line 120). But I think to make this claim, you would need a region that exclusively had neurons showing absolute suppression, not a region with a mix of neurons, some showing absolute suppression and some showing relative suppression, as here. The responses of both groups of neurons contribute to the measured BOLD signal, so it seems impossible to tell from these data how absolute suppression per se drives the BOLD response.

      It is a fact that we find both kinds of responses in the same region. We cannot tell with this technique if neurons showing relative vs. absolute suppression of responses are spatially segregated for instance (e.g., forming two separate sub-regions) or are intermingled. And we cannot tell from our data how absolute suppression per se drives the BOLD response. In our view, this does not diminish the interest and originality of the study, but the statement "that BOLD decreases can be due to relative, but also absolute, spike suppression in the human brain" will be rephrased in the revised manuscript, in the following way: "that BOLD decreases can be due to relative, or absolute (or a combination of both), spike suppression in the human brain".

      Reviewer #3 (Public review):

      In this paper the authors conduct two experiments: an fMRI experiment and intracranial recordings of neurons in two patients, P1 and P2. In both experiments, they employ an SSVEP paradigm in which they show images at a fast rate (e.g. 6 Hz), with face images appearing at a slower rate (e.g. 1.2 Hz) and the rest of the images being a variety of object images. In the first patient, they record from neurons over a region in the mid fusiform gyrus that is face-selective, and in the second patient, they record neurons from a more medial region that is not face-selective (it responds more strongly to objects than faces). The results show similar selectivity between the electrophysiology data and the fMRI data, in that the location which shows a higher fMRI response to faces also contains face-selective neurons, and the location which shows a preference for non-faces also contains non-face-preferring neurons.

      Strengths:

      The data is important in that it shows that there is a relationship between category selectivity measured from electrophysiology data and category selectivity measured from fMRI. The data is unique as it contains a lot of single and multiunit recordings (245 units) from the human fusiform gyrus, which, as the authors point out, is a hominoid-specific gyrus.

      Weaknesses:

      My major concerns are two-fold:

      (i) There is a paucity of data; thus, more information (results and methods) is warranted, and in particular there is no comparison between the fMRI data and the SEEG data.

      We thank reviewer #3 for their positive evaluation of our paper. If the reviewer means paucity of data presentation, we agree and we provide more presentation below, although the methods and results information appear complete to us. The comparison between fMRI and SEEG is there, but can only be indirect (i.e., collected at different times and not related on a trial-by-trial basis for instance). In addition, our manuscript aims at providing a short empirical contribution to further our understanding of the relationship between neural responses and BOLD signal, not to provide a model of neurovascular coupling.

      (ii) One main claim of the paper is that there is evidence for suppressed responses to faces in the non-face selective region. That is, the reduction in activation to faces in the non-face selective region is interpreted as a suppression in the neural response and consequently the reduction in fMRI signal is interpreted as suppression. However, the SSVEP paradigm has no baseline (it alternates between faces and objects) and therefore it cannot distinguish between lower firing rate to faces vs suppression of response to faces.

      We understand the concern of the reviewer, but we respectfully disagree that our paradigm cannot distinguish between lower firing rate to faces vs. suppression of response to faces. Indeed, since the stimuli are presented periodically (6 Hz), we can objectively distinguish stimulus-related activity from spontaneous neuronal firing. The baseline corresponds to spikes that are non-periodic, i.e., unrelated to the (common face and object) stimulation. For a subset of neurons, even this non-periodic baseline activity is suppressed, above and beyond the suppression of the 6 Hz response illustrated in Figure 2. We mention it in the manuscript, but we agree that we do not present illustrations of such a decrease in the time domain for SU, which we did not initially consider necessary (please see below for such presentation).

      (1) Additional data: the paper has 2 figures: figure 1 which shows the experimental design and figure 2 which presents data, the latter shows one example neuron raster plot from each patient and group average neural data from each patient. In this reader's opinion this is insufficient data to support the conclusions of the paper. The paper will be more impactful if the researchers would report the data more comprehensively.

      We respond to more specific requests for additional evidence below, but the reviewer should be aware that this is a short report, which reaches the word limit. In our view, the group average neural data should be sufficient to support the conclusions, and the example neurons are there for illustration. And while we cannot provide the raster plots for a large number of neurons, the anonymized data will be made available upon publication of the final version of the paper.

      (a) There is no direct comparison between the fMRI data and the SEEG data, except for a comparison of the location of the electrodes relative to the statistical parametric map generated from a contrast (Fig 2a,d). It will be helpful to build a model linking between the neural responses to the voxel response in the same location - i.e., estimate from the electrophysiology data the fMRI data (e.g., Logothetis & Wandell, 2004).

      As mentioned above, the comparison between fMRI and SEEG is indirect (i.e., collected at different times and not related on a trial-by-trial basis for instance) and would not allow us to build such a model.

      (b) More comprehensive analyses of the SSVEP neural data: It will be helpful to show the results of the frequency analyses of the SSVEP data for all neurons to show that there are significant visual responses and significant face responses. It will be also useful to compare and quantify the magnitude of the face responses compared to the visual responses.

      The data has been analyzed comprehensively, but we would not be able to show all neurons with such significant visual responses and face-selective responses.

      (c) The neuron shown in E shows cyclical responses tied to the onset of the stimuli, is this the visual response?

      Correct, it’s the visual response at 6 Hz.

      If so, why is there an increase in the firing rate of the neuron before the face stimulus is shown in time 0?

      Because the stimulation is continuous. What is displayed at 0 is the onset of the face stimulus, with each face stimulus being preceded by 4 images of nonface objects.

      The neuron's data seems different than the average response across neurons; This raises a concern about interpreting the average response across neurons in panel F which seems different than the single neuron responses

      The reviewer is correct, and we apologize for the confusion. This is because the average data in panel F has been notch-filtered at 6 Hz (and its harmonics), as indicated in the methods (p.11): ‘a FFT notch filter (filter width = 0.05 Hz) was then applied on the 70 s single or multi-units time-series to remove the general visual response at 6 Hz and two additional harmonics (i.e., 12 and 18 Hz)’.

      Here is the same data without the notch-filter (the 6Hz periodic response is clearly visible):

      Author response image 2.

      For sake of clarity, we prefer presenting the notch-filtered data in the paper, but the revised version will make it clear in the figure caption that the average data has been notch-filtered.

      (d) Related to (c) it would be useful to show raster plots of all neurons and quantify if the neural responses within a region are homogeneous or heterogeneous. This would add data relating the single neuron response to the population responses measured from fMRI. See also Nir 2009.

      We agree with the reviewer that this is interesting, but again we do not think that it is necessary for the point made in the present paper. Responses in these regions appear rather heterogeneous, and we are currently working on a longer paper with additional SEEG data (other patients tested for shorter sessions) to define and quantify the face-selective neurons in the MidFusiform gyrus with this approach (without relating it to the fMRI contrast as reported here).

      (e) When reporting group average data (e.g., Fig 2C,F) it is necessary to show standard deviation of the response across neurons.

      We agree with the reviewer and have modified Figure 2 accordingly in the revised manuscript.

      (f) Is it possible to estimate the latency of the neural responses to face and object images from the phase data? If so, this will add important information on the timing of neural responses in the human fusiform gyrus to face and object images.

      The fast periodic paradigm to measure neural face-selectivity has been used in tens of studies since its original reports:

      - in EEG: Rossion et al., 2015: https://doi.org/10.1167/15.1.18

      - in SEEG: Jonas et al., 2016: https://doi.org/10.1073/pnas.1522033113

      In this paradigm, the face-selective response spreads to several harmonics (1.2 Hz, 2.4 Hz, 3.6 Hz, etc.) (which are summed for quantifying the total face-selective amplitude). This is illustrated below by the averaged single units’ SNR spectra across all recording sessions for both participants.

      Author response image 3.

      There is no unique phase-value, each harmonic being associated with a phase-value, so that the timing cannot be unambiguously extracted from phase values. Instead, the onset latency is computed directly from the time-domain responses, which is more straightforward and reliable than using the phase. Note that the present paper is not about the specific time-courses of the different types of neurons, which would require a more comprehensive report, but which is not necessary to support the point made in the present paper about the SEEG-fMRI sign relationship.

      (g) Related to (e): In total the authors recorded data from 245 units (some single units and some multiunits) and they found that in both the face and nonface selective regions most of the recorded neurons exhibited face-selectivity, which this reader found confusing. They write: “Among all visually responsive neurons, we found a very high proportion of face-selective neurons (p < 0.05) in both activated and deactivated MidFG regions (P1: 98.1%; N = 51/52; P2: 86.6%; N = 110/127)”. Is the face selectivity in P1 an increase in response to faces and in P2 a reduction in response to faces, or is it an increase in response to faces in both?

      Face-selectivity is defined as a DIFFERENTIAL response to faces compared to objects, not necessarily a larger response to faces. So yes, face-selectivity in P1 is an increase in response to faces and P2 a reduction in response to faces.

      (1) Additional methods

      (a) it is unclear if the SSVEP analyses of neural responses were done on the spikes or the raw electrical signal. If the former, how is the SSVEP frequency analysis done on discrete data like action potentials?

      The FFT is applied directly on spike trains using Matlab’s discrete Fourier Transform function. This function is suitable to be applied to spike trains in the same way as to any sampled digital signal (here, the microwires signal was sampled at 30 kHz, see Methods).

      In complementary analyses, we also attempted to apply the FFT on spike trains that had been temporally smoothed by convolving them with a 20ms square window (Le Cam et al., 2023, cited in the paper). This did not change the outcome of the frequency analyses in the frequency range we are interested in.
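
      To illustrate why a Fourier analysis applies to discrete spike data, here is a minimal sketch in Python (the authors used Matlab; this is not their code). It builds a binary spike train and evaluates a single Fourier component at a chosen frequency. The 1 kHz sampling rate, the 10 s duration, the toy spike times, and the helper names `spike_train` and `dft_amplitude` are all assumptions made for the example; the actual recordings were sampled at 30 kHz.

```python
import cmath
import math

def spike_train(duration_s, fs, spike_times_s):
    """Binary spike train: 1 in each sample that contains a spike, else 0."""
    n = int(round(duration_s * fs))
    train = [0] * n
    for t in spike_times_s:
        idx = int(round(t * fs))
        if 0 <= idx < n:
            train[idx] = 1
    return train

def dft_amplitude(signal, fs, freq_hz):
    """Amplitude of one discrete Fourier component, normalized by signal length."""
    n = len(signal)
    acc = 0j
    for k, x in enumerate(signal):
        if x:  # samples equal to zero contribute nothing to the sum
            acc += x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
    return abs(acc) / n

# Toy data (hypothetical): one spike locked to every cycle of a 6 Hz stimulation.
FS = 1000.0        # assumed sampling rate for the sketch, in Hz
DURATION_S = 10.0  # assumed duration, in seconds
toy_spikes = [k / 6.0 + 0.05 for k in range(60)]

train = spike_train(DURATION_S, FS, toy_spikes)
amp_6hz = dft_amplitude(train, FS, 6.0)  # component at the stimulation rate
amp_5hz = dft_amplitude(train, FS, 5.0)  # neighbouring frequency for comparison
```

      In this toy case the spikes are phase-locked to the 6 Hz cycle, so the 6 Hz component is much larger than the component at a neighbouring frequency. The same logic extends to the 1.2 Hz face-selective harmonics.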

      (b) it is unclear why the onset time was shifted by 33ms; one can measure the phase of the response relative to the cycle onset and use that to estimate the delay between the onset of a stimulus and the onset of the response. Adding phase information will be useful.

      The onset time was shifted by 33ms because the stimuli are presented with a sinewave contrast modulation (i.e., at 0ms, the stimulus has 0% contrast). 100% contrast is reached at half a stimulation cycle, which is 83.33ms here, but a response is likely triggered before reaching 100% contrast. To estimate the delay between the start of the sinewave (0% contrast) and the triggering of a neural response, we tested 7 SEEG participants with the same images presented in FPVS sequences either as a sinewave contrast modulation (black line) or as a squarewave (i.e. abrupt) contrast modulation (red line). The 33ms value is based on these LFP data obtained in response to such sinewave stimulation and squarewave stimulation of the same paradigm. This delay corresponds to 4 screen refresh frames (120 Hz refresh rate = 8.33ms per frame) and 35% of the full contrast, as illustrated below (please see also Retter, T. L., & Rossion, B. (2016). Uncovering the neural magnitude and spatio-temporal dynamics of natural image categorization in a fast visual stream. Neuropsychologia, 91, 9–28).
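
      The arithmetic behind these figures can be checked with a short Python sketch. It assumes the sinewave modulation has a raised-cosine form, which is one form consistent with the stated properties (0% contrast at 0ms, 100% at half a cycle) but is not confirmed as the exact waveform used. The function and constant names are hypothetical.

```python
import math

def sinewave_contrast(t_s, stim_freq_hz):
    """Contrast in [0, 1] for an assumed raised-cosine modulation starting at 0."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * stim_freq_hz * t_s))

STIM_FREQ_HZ = 6.0   # base stimulation rate given in the text
REFRESH_HZ = 120.0   # screen refresh rate given in the text
N_FRAMES = 4         # onset shift in refresh frames given in the text

frame_ms = 1000.0 / REFRESH_HZ                   # duration of one refresh frame
shift_ms = N_FRAMES * frame_ms                   # onset shift in milliseconds
half_cycle_ms = 1000.0 / STIM_FREQ_HZ / 2.0      # time at which contrast peaks
contrast_at_shift = sinewave_contrast(shift_ms / 1000.0, STIM_FREQ_HZ)

print(f"frame: {frame_ms:.2f} ms, shift: {shift_ms:.2f} ms, "
      f"half cycle: {half_cycle_ms:.2f} ms, "
      f"contrast at shift: {100 * contrast_at_shift:.1f}%")
```

      Under that assumption, four frames give a shift of about 33.3ms, at which point the modulation has reached roughly 35% of full contrast, in line with the values quoted above.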

      Author response image 4.

      (2) Interpretation of suppression:

      The SSVEP paradigm alternates between 2 conditions: faces and objects and has no baseline; In other words, responses to faces are measured relative to the baseline response to objects so that any region that contains neurons that have a lower firing rate to faces than objects is bound to show a lower response in the SSVEP signal. Therefore, because the experiment does not have a true baseline (e.g. blank screen, with no visual stimulation) this experimental design cannot distinguish between lower firing rate to faces vs suppression of response to faces.

      The strongest evidence put forward for suppression is the response of non-visual neurons that was also reduced when patients looked at faces, but since these are non-visual neurons, it is unclear how to interpret the responses to faces.

      We understand this point, but how does the reviewer know that these are non-visual neurons? Because these neurons are located in the visual cortex, they are likely to be visual neurons that are not responsive to non-face objects. In any case, as the reviewer writes, we think it’s strong evidence for suppression.

      We thank all three reviewers for their positive evaluation of our paper and their constructive comments.

    1. eLife Assessment

      This study shows that strip cropping -- planting different crops in strips on the same field -- enhances the taxonomic diversity of ground beetles relative to corresponding monocultures in multiple experiments with different crops in the Netherlands. While these findings are important for demonstrating the potential beneficial effects of this form of intercropping, the information presented is incomplete with regard to sampling design and data obtained.

    2. Reviewer #1 (Public review):

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      A more basic problem is that the reader learns neither where traps were located, nor how missing traps were treated for analyses, nor how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials), nor why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought out well how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures". The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      (3) The authors appear not to have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life-history-trait-related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the result of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effect of location and/or year appears to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

    5. Author response:

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have made the first provisional responses which are outlined below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: There have been yield data collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification). We will explain the sampling design and data analysis in more detail to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we will present an analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. This will give better insight of the variation in responses among ground beetle taxa.

      (4) Restrict findings to our system: We will nuance our findings further and will focus more strongly on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      We will further work on improving the manuscript based on the reviewers' feedback in the coming weeks, aiming to submit a revised version of the manuscript at the end of February.

      Detailed response to editor and reviewers:

      Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or will cite (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We will consider changing the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details to our analysis were not described in sufficient detail. Especially reviewer 2 pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We will revise the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and samples were occasionally missing, the number of year series that made up a datapoint differed among locations and years. However, for each comparison of a paired monoculture and strip cropping field, we made sure that the number of year series was made equal through rarefaction. That is, the numbers of ground beetle individuals and species are always expressed per 2 to 6 samples. Therefore, we prefer to stick to relative changes, as we are convinced that this gives a fairer representation of our complex dataset.

      We agree with the second point, which both the editor and several reviewers raised. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we will include a GLM analysis of the abundance responses of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) because we could then include all locations and years within the analysis, and in most cases a genus was dominated by a single species (notable exceptions were Amara and Harpalus, which were made up of several species). We will still illustrate these findings in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that the reviewers raised, we decided to nuance our points about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) our findings, which are not as straightforwardly positive as our narrative suggests. We still believe that strip cropping contributes positively to carabid communities, and will carefully check the text to avoid overstatements.

      Reviewer 1:

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification).  

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia et al., 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus more on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverging our focus also towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed this data.

      Reviewer 2:

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers employing strip cropping within the Netherlands, which greatly reduced the number of fields available for our study. Therefore, we worked in the sites that were available and studied as many crops as possible on these sites. Since there was variation in the crops grown in the sites, for some crops we have limited replication. In the revision we will explain this more clearly.

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken; we will present a more detailed description of the sampling design in the methods. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This would indeed mean that there is a small difference between the 3 m and 6 m wide strips regarding distance from another strip, but this was then only a difference of 1.5 to 3 meters from the edge. Based on our own extensive experience with ground beetle communities, such a difference will not have a large impact on the ground beetle findings. The distance from field/plot edges was similar between monocultures and strip cropped fields.

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      This is a limitation that is hard to avoid in comparisons between strip cropping and monoculture systems, because combining a statistically robust design with sufficient replication and field sizes that are representative of farming practice is often not possible. We will acknowledge this limitation in the revised manuscript. To allow a fair comparison based on a sufficient number of replications, we chose to combine data from several years and locations (despite this not being the ideal experimental design). This approach has the drawback that ground beetle communities are difficult to compare. Therefore, we chose to further investigate two years of data from Wageningen, as the factorial design allowed a fair comparison between monocultures and strip cropping. We analyzed three crop combinations during two years, but we still cannot exclude a potential influence of spatial autocorrelation. We acknowledged this limitation in our original submission, and we will clarify this point further in the revision.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      The sample size ranges between 2 and 6 per combination of cropping design, crop, location and year. We believe that this will allow a meaningful analysis. Moreover, our main focus is the comparison between monoculture and strip cropping, and not the comparison between different crops. Even though we show that crop types have different ground beetle communities, we are most interested in the contrast of ground beetle communities in strip cropping and monoculture systems.

      A more basic problem is that the reader does not learn where traps were located, how missing traps were treated in the analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials), or why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We will clarify this further in the revised manuscript. As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree, and we dealt with this issue by using year series instead of individual samples from different rounds. While this approach is not perfect, it allows us to get the best possible impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction. In this way we have analyzed a complex multi-year, multi-crop and multi-location dataset as well as we could.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we will further clarify this point.

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we will nuance our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields, but that the effects on other taxa still need to be explored further.

      Reviewer 3:

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought out well how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle, and we will nuance the text in the revised version to reflect this.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem to occur mostly because crops with different communities are now in the same field, rather than because mixtures of communities related to different crops arise within the strips. We discussed this in the first paragraph of the discussion in the original submission.

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We will explain this better in the methods section. Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge that percentages are limited in how they can be interpreted, they do allow reporting relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. The reason for this is that we did not always have an equal number of samples available for both cropping systems, and this approach allowed us to make a better estimate whenever more samples were available. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction down to 2 samples (where we would just have a better estimate for the monoculture). In other cases, we had 4 samples in both the strip cropped and the monoculture field; here we chose to use rarefaction to 4 samples to get a better estimate altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (it would represent the number of ground beetle individuals or species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either. We chose to prioritize better estimates of the difference between cropping systems over a more readily interpretable unit.
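
      The comparison described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes sample-based rarefaction computed as the mean species count over all subsets of n samples, where n is the smaller of the two sample counts in a pair, and the function names and species labels are hypothetical.

```python
from itertools import combinations
from statistics import mean


def rarefied_richness(samples, n):
    """Mean number of distinct species over all subsets of n samples.

    `samples` is a list of sets, one set of species names per pooled sample.
    (Assumed rarefaction scheme; the source does not specify the exact method.)
    """
    if not 1 <= n <= len(samples):
        raise ValueError("n must be between 1 and the number of samples")
    return mean(len(set().union(*subset)) for subset in combinations(samples, n))


def relative_change(strip_samples, mono_samples):
    """Relative change in rarefied richness, strip cropping vs. monoculture."""
    # Rarefy both systems to the smaller sample count, e.g. 2 vs. 6 -> 2.
    n = min(len(strip_samples), len(mono_samples))
    strip = rarefied_richness(strip_samples, n)
    mono = rarefied_richness(mono_samples, n)
    return (strip - mono) / mono


# Hypothetical data: 2 strip cropping samples and 3 monoculture samples.
strip = [{"A", "B", "C"}, {"B", "C", "D"}]
mono = [{"A", "B"}, {"A", "B"}, {"A"}]
print(relative_change(strip, mono))  # 1.0, i.e. 100% more species
```

      Because both systems are reduced to the same n, the ratio is comparable within a pair, while the absolute richness behind it still depends on whether n was 2 or 6, which is the interpretability trade-off discussed above.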

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, in the revised manuscript we will replace the indicator species analyses with a GLM analysis for the 12 most common genera of ground beetles. This will allow us to go into more depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we will also discuss these common genera in more depth, rather than focusing on rarer species. Furthermore, we will add information on rarity and habitat preference to the table that shows species abundances per location (Table S2).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effects of location and/or year appear to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop-level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion.

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We will streamline the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and will state this research direction more clearly in the introduction of the revised manuscript.

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and have therefore replaced it with a GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we will further explore the species/genera that respond to cropping systems and discuss these findings in more detail.

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no: we did not find more rare species in strip cropping. In the revised manuscript, we will add a column for rarity (according to waarneming.nl) in the table showing abundances of species per location. We found only two rare species, one represented by a single individual and one that was more related to the open habitat created by a failed wheat field. We will discuss this in more depth in the discussion.

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We will nuance the implications of our study in the revised manuscript.

    1. eLife Assessment

      This is a useful and potentially significant set of experiments. The authors found that cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, and both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). While the study is significant, it is currently somewhat incomplete as key control experiments are needed in order to support the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Weaknesses:

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused or conflated, but they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; it can therefore even occur outside of the nervous system. Habituation is a learning process in which the nervous system responds less to a repeated stimulus, despite (at least part of) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine whether habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it is not clear what biological process is being studied.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates). Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).

    3. Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.

      Weaknesses:

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused or conflated, but they are very different: sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus, and therefore it can even occur outside of the nervous system. Habituation is a learning process in which the nervous system responds less to a repeated stimulus even though at least part of the nervous system is still similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine whether habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority though) use a less stringent definition of the word habituation than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here. In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of causing confusion about the exact nature of the process for those considering the stricter definition of the word. In the manuscript under revision, we are changing this terminology to “adaptation”. Also, following suggestions from Reviewer 2, we are strengthening the description of the protocol in the Results section and clarifying why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we will more extensively cover in the Discussion section of the revised manuscript.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and, in addition, remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      As we will clarify in the revised manuscript, we tend to give more trust to the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (like TAX-6 was) as the most convincing hits, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so it was an added value to run both. We proposed the subset of targets present in both datasets as the most solid list of candidates. We will also reinforce our point in the revised discussion that the protein library is likely to contain far fewer false positives.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) mutants prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer’s comment, we are modifying the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, which cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we will clarify in the legend in the revised manuscript). The main ‘quantitative’ nuances to take into consideration here originate from two assumptions of the model (which we are clarifying in the revised manuscript):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 anti-adaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and tips the regulation toward faster adaptation (the role of TAX-6, and ipso facto that of cyclosporin A, becoming negligible in the CMK-1(gf) background).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We are modifying the label to “normal adaptation” and will leave a note in the legend that an apparently normal adaptation phenotype seems to be the “default” situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. Following this helpful suggestion, we will expand the discussion of hypothetical models in the revised manuscript.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be gathered for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and the robust non-adapting phenotype was a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable efforts to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. As a result, the only useful data regarding the CMK-1 place of action are the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.

      Thanks a lot for these positive remarks.

      Weaknesses:

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.

      We understand this point and we have carefully considered (and re-considered) the way to articulate the report. However, we could not present the story much differently, as we would have had no justification to investigate the role of TAX-6 and its interaction with CMK-1 had we not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to trigger wrong expectations among readers. We will extensively revise the abstract to clarify this point. Furthermore, we will reinforce this point in the last paragraph of the introduction.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and will explicitly discuss this aspect in the revised Discussion section, and also make it clear from the abstract onward.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.

      We are not aware of any studies having shown that Calcineurin is a direct target of CaM kinase I. But it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We will complement the text by mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

    1. eLife Assessment

      In this valuable manuscript, authors ablate cerebellar oligodendrocytes during postnatal development and show that synchrony of calcium transients in Purkinje neurons and behaviours are affected even at later stages. While the work is solid, it is incomplete in that the causal relationship between the two has not been sufficiently explored.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents convincing findings that oligodendrocytes play a regulatory role in spontaneous neural activity synchronisation during early postnatal development, with implications for adult brain function. Utilising targeted genetic approaches, the authors demonstrate how oligodendrocyte depletion impacts Purkinje cell activity and behaviours dependent on cerebellar function. Delayed myelination during critical developmental windows is linked to persistent alterations in neural circuit function, underscoring the lasting impact of oligodendrocyte activity.

      Strengths:

      (1) The research leverages the anatomically distinct olivocerebellar circuit, a well-characterized system with known developmental timelines and inputs, strengthening the link between oligodendrocyte function and neural synchronization.

      (2) Functional assessments, supported by behavioral tests, validate the findings of in vivo calcium imaging, enhancing the study's credibility.

      (3) Extending the study to assess the long-term effects of early-life myelination disruptions adds depth to the implications for both circuit function and behavior.

      Weaknesses:

      (1) The study would benefit from a closer analysis of myelination during the periods when synchrony is recorded. Direct correlations between myelination and synchronized activity would substantiate the mechanistic link and clarify if observed behavioral deficits stem from altered myelination timing.

      (2) Although the study focuses on Purkinje cells in the cerebellum, neural synchrony typically involves cross-regional interactions. Expanding the discussion on how localized Purkinje synchrony affects broader behaviors - such as anxiety, motor function, and sociality - would enhance the findings' functional significance.

      (3) The authors discuss the possibility of oligodendrocyte-mediated synapse elimination as a possible mechanism behind their findings, drawing from relevant recent literature on oligodendrocyte precursor cells. However, there are no data presented supporting this assumption. The authors should explain why they think the mechanism behind their observation extends beyond the contribution of myelination or remove this point from the discussion entirely.

      (4) It would be valuable to investigate the secondary effects of oligodendrocyte depletion on other glial cells, particularly astrocytes or microglia, which could influence long-term behavioral outcomes. Identifying whether the lasting effects stem from developmental oligodendrocyte function alone or also involve myelination could deepen the study's insights.

      (5) The authors should explore the use of different methods to disturb myelin production for a longer time, in order to further determine if the observed effects are transient or if they could have longer-lasting effects.

      (6) Throughout the paper, there are concerns about statistical analyses, particularly on the use of the Mann-Whitney test or using fields of view as biological replicates.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors use genetic tools to ablate oligodendrocytes in the cerebellum during postnatal development. They show that the oligodendrocyte numbers return to normal post-weaning. Yet, the loss of oligodendrocytes during development seems to result in decreased synchrony of calcium transients in Purkinje neurons across the cerebellum. Further, there were deficits in social behaviors and motor coordination. Finally, they suppress activity in a subset of climbing fibers to show that it results in similar phenotypes in the calcium signaling and behavioral assays. They conclude that the behavioral deficits in the oligodendrocyte ablation experiments must result from loss of synchrony.

      Strengths:

      Use of genetic tools to induce perturbations in a spatiotemporally specific manner.

      Weaknesses:

      The main weakness in this manuscript is the lack of a cohesive causal connection between the experimental manipulation performed and the phenotypes observed. Though they have taken great care to induce oligodendrocyte loss specifically in the cerebellum and at specific time windows, the subsequent experiments do not address specific questions regarding the effect of this manipulation. Calcium transients in Purkinje neurons are caused to a large extent by climbing fibers, but there is evidence for simple spikes to also underlie the dF/F signatures (Ramirez and Stell, Cell Reports, 2016). Also, it is erroneous to categorize these calcium signals as signatures of "spontaneous activity" of Purkinje neurons as they can have dual origins. Further, the effect of developmental oligodendrocyte ablation on the cerebellum has been previously reported by Mathis et al., Development, 2003. They report very severe effects such as the loss of molecular layer interneurons, stunted Purkinje neuron dendritic arbors, abnormal foliations, etc. In this context, it is hardly surprising that one would observe a reduction of synchrony in Purkinje neurons (perhaps due to loss of synaptic contacts, not only from CFs but also from granule cells). The last experiment with the expression of Kir2.1 in the inferior olive is hardly convincing. In summary, while the authors used a specific tool to probe the role of developmental oligodendrocytes in cerebellar physiology and function, they failed to answer specific questions regarding this role, which they could have done with more fine-grained experimental analysis.

    1. eLife Assessment

      In this valuable study, ectopic expression and knockdown strategies were used to assess the effects of increasing and decreasing cyclic di-AMP on the developmental cycle in Chlamydia. The authors convincingly demonstrate that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. Whilst these results are intriguing, the model currently proposed is over-simplified and likely incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Lee and Ouellette explores the role of cyclic di-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths:

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis, and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Weaknesses:

      The biggest weakness in the document is its reliance, in the interpretation of results, on quantitative data that are not statistically significant. These challenges can be addressed in a revision by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. Overall, this is a very intriguing study with important implications; however, the data are very preliminary, and the model is very rudimentary and not well supported by the data.

      Describing the significance of the findings:

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well.

      Describing the strength of evidence:

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported.

      dacA-ybbR ectopic expression:

      For the dacA-ybbR ectopic expression experiments, they show that hctA is induced early but there is no significant change in omcB gene expression. This is problematic, as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20), hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes, suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring, as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers.

      dacA knockdown and dacA(mut)

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper by Lee and Ouellette explores the role of cyclic di-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths:

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis, and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Weaknesses:

      The biggest weakness in the document is its reliance, in the interpretation of results, on quantitative data that are not statistically significant. These challenges can be addressed in a revision by the authors.

      Thank you for these comments. We have generated new data, which we hope the reviewer will find more compelling. These will be included in a revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. Overall, this is a very intriguing study with important implications; however, the data are very preliminary, and the model is very rudimentary and not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we will do our best to address the reviewer’s concerns in a revised submission.

      Describing the significance of the findings:

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well.

      Describing the strength of evidence:

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported.

      dacA-ybbR ectopic expression:

      For the dacA-ybbR ectopic expression experiments, they show that hctA is induced early but there is no significant change in omcB gene expression. This is problematic, as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20), hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes, suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring, as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validate that late gene transcripts (including sigma28 and sigma54 regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which shows a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to attempt more replicates in an attempt to “achieve” statistical significance.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers.

      dacA knockdown and dacA(mut)

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicates at least that secondary differentiation has been delayed. We will try to clarify this in a revision.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells.

      We agree that it seems that perturbing c-di-AMP production, whether by knockdown or overexpressing the mutant DacA(D164N), has an overall negative impact on chlamydial growth. We have generated new data, which we think will address this. These new data will be included in a revised manuscript.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings.

    1. eLife Assessment

      The manuscript represents a fundamental advance in designing peptide inhibitors targeting Cdc20, a key activator and substrate-recognition subunit of the APC/C ubiquitin ligase. Supported by compelling biophysical and cellular evidence, the study lays a strong foundation for future developments in degron-based therapeutics. The unexpected findings regarding degradation efficiency highlight intriguing questions that merit further investigation. This work will interest researchers focused on peptide drug design targeting complex protein interactions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation, both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, the solubility issue needs more attention and should be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed; it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with a peptide that differs from D2 only at position 2?

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

    3. Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by homing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe? What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

    4. Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of Cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation, both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, the solubility issue needs more attention and should be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed; it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with a peptide that differs from D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility - we will clarify this statement to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section will be clarified. The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by homing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination and degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of Cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

    1. eLife Assessment

      This study presents a valuable finding on the alterations in the autophagic-lysosomal pathway in a Huntington's disease model. The evidence supporting the claims of the authors is solid. However, the observed changes in autophagy are moderate, the images were not fully represented by the quantification results, and some of the short forms used in the text are not clearly stated; these issues hinder further evaluation of the claims. The work will be of interest to neuroscientists working on HD.