
Oct 2025
www.biorxiv.org

Dependence of Contextual Modulation in Macaque V1 on Interlaminar Signal Flow

1. Public_Reviews 15 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 The results by Zhu et al. provide valuable insights into the representation of border ownership in area V1. They used Neuropixels recordings to demonstrate the clustering of border ownership, and compared cross-correlation functions between neurons in different layers to demonstrate that they depend on the type of stimulus. The strength of the evidence is solid but can be improved by performing additional analyses, addressing some concerns (as raised in the previous and current review), and accounting for the differences in classical and non-classical receptive field stimulation conditions.
 
 Summary
2. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Zhu and colleagues used high-density Neuropixels probes to perform laminar recordings in V1 while presenting either small stimuli that stimulated the classical receptive field (CRF) or large stimuli whose border straddled the RF to provide nonclassical RF (nCRF) stimulation. Their main question was to understand the relative contribution of feedforward (FF), feedback (FB), and horizontal circuits to border ownership (Bown), which they addressed by measuring cross-correlation across layers. They found differences in cross-correlation between feedback/horizontal (FH) and input layers during CRF and nCRF stimulation.
 
 Comments on revisions:
 
 In the revision, the authors have added a paragraph in the Discussion to address the question of layer 2/3 neurons leading layer 4 neurons, and have provided answers to the questions in the public review without making substantial changes in the paper. However, there were several other recommendations, and I am not sure why they were not considered. I am adding those again below.
 
 * For CRF stimulation, the zero lag between 4C and 4A/B with layer 5/6 (Figure 3D last two columns on the right) was surprising to me. I just felt that this could be because layer 6 may also be getting FF inputs. Perhaps better not to club layer 5 with 6, as mentioned earlier also.
 
 * Interpreting the nCRF delays, with often negative delays, was very challenging for me. For example, 4C -> 5/6 (third column in Figure 3) has a significantly negative peak (although that does not show up in statistical analysis because it seems to be a signed test to just test if the median was greater than zero, not if the median was different from zero; line 285). What is the interpretation here? Are spikes in 5/6 causing spikes in 4C (which, as mentioned earlier, would require anatomical projections from 5/6 to 4C)? On the other hand, if FB inputs arrive in 5/6 but there are no inputs going to 4C, then why should there even be a significant cross-correlation?
 
 The only explanation I could think of is an alignment of inputs in these two layers such that FH inputs arrive in layer 5/6 just before FF inputs arrive in 4C, each causing a spike in a neuron in each layer, even though the two neurons are not anatomically interconnected. But this would require both a very precise temporal coupling between FF and FH inputs arriving in these areas AND neurons in layer 5/6 that respond very strongly to FH stimulation (I thought that FH inputs are mainly modulatory and not as strong). In any case, it would be good to see some cross-correlation functions that have a negative lag (all examples in Fig 3B have positive or zero lag).
 
 * I think cross-correlation analysis would have been useful if there were data from a feedback area (say V2). In its absence, perhaps latency analysis (by just comparing the PSTH) could have revealed something interesting, given that the hypothesis is about differences in the timings of FH versus FF inputs. Do PSTHs across layers show the type of differences that are being claimed (e.g. in lines 295-297)?
 
 * Line 262-63: "Notably, the rates were nearly identical under the two stimulus conditions" - I would have thought CRF stimulation would produce higher rates. Can the authors explain this?
 
 * Line 174-175: Isn't the proportion of border ownership cells in layer 4C higher than one would expect under the assumption that nCRF effects are mediated by horizontal and feedback connections, which layer 4C does not receive? Can the authors explain?
 
 * Figure 3D: it would also be good to show the heatmaps stacked up in the increasing order of the interelectrode distance of the pairs so that it will be easy to see how the peak lag changes with distance as well.
 
 * It will be good to show the shift in peak lag and CCG asymmetry between CRF and nCRF conditions for the same pairs, using a violin or bar plot with lines connecting each pair in Figure 3.
 
 * Line 594, 603, 628 and 630: What procedure was used to determine the size, location of the CRF, and optimal orientation manually online?
 
 * Line 733-734: Although a reference is cited, please explicitly mention the rationale for keeping the peak lag cutoff at 10 ms.
 
 * It is unclear why a grating was used for the CRF condition, instead of just having the portion of the stimulus within the RF for the nCRF condition, as the comparisons for FHi with FF are with different FF drives in each case.
 
 * Figure 5 - the scatter is enormous; can you please provide the R² values?
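To illustrate the point raised above about the test on the median peak lag (line 285), here is a minimal sketch of an exact sign test in Python. The function name `sign_test` and the example lags are hypothetical and are not taken from the manuscript; the authors' actual test may differ (for example, it may be a signed-rank test). The sketch only shows how a one-sided test for a median greater than zero can miss a clearly negative median that a two-sided test would flag.

```python
from math import comb


def sign_test(values):
    """Exact sign test of the median against zero.

    Returns (p_greater, p_two_sided). Values equal to zero are dropped.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    n_pos = sum(1 for v in nonzero if v > 0)

    def tail(k_from, k_to):
        # Probability that a Binomial(n, 0.5) count lies in [k_from, k_to].
        return sum(comb(n, k) for k in range(k_from, k_to + 1)) / 2 ** n

    p_greater = tail(n_pos, n)  # alternative: median > 0
    p_less = tail(0, n_pos)     # alternative: median < 0
    p_two_sided = min(1.0, 2 * min(p_greater, p_less))
    return p_greater, p_two_sided


# Hypothetical peak lags (ms) for one layer pair, mostly negative.
lags_ms = [-3, -2, -2, -1, -4, -1, -2, 1, -3, -2, 0, -1]
p_greater, p_two_sided = sign_test(lags_ms)
print(f"one-sided (median > 0):  p = {p_greater:.4f}")
print(f"two-sided (median != 0): p = {p_two_sided:.4f}")
```

With these made-up lags the one-sided test is far from significant while the two-sided test is not, which is the pattern the comment above is concerned about.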
 
 Review 1
3. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors present a study of how modulatory activity from outside the classical receptive field (cRF) differs from cRF stimulation. They study neural activity across the different layers of V1 in two anesthetized monkeys using Neuropixels probes. The monkeys are presented with drifting gratings and border-ownership tuning stimuli. They find that border-ownership tuning is organized into columns within V1, which is unexpected and exciting, and that the flow of activity from cell-to-cell (as judged by cross-correlograms between single units) is influenced by the type of visual stimulus: border-ownership tuning stimuli vs. drifting-grating stimuli.
 
 Strengths:
 
 The questions addressed by the study are of high interest, and the use of Neuropixels probes yields extremely high numbers of single-units and cross-correlation histograms (CCHs) which makes the results robust. The study is well-described.
 
 Comments on revisions:
 
 The results are interesting and seem robust. However, several of my main points were not addressed. The authors do not analyze or discuss the problem that the border ownership stimuli do not uniquely isolate feedback from feedforward influences. Here are my remaining points/recommendations:
 
 (1) In my previous review I indicated that the border-ownership signal also provides a strong feedforward drive, a black-white edge, in addition to the border ownership signal. Calling this a "nCRF stimulus" is a misnomer. Please correct this terminology and replace it by something that is appropriate, e.g. changing it into "grating stimulation" (instead of CRF stimulation) and BO-stimulation (instead of nCRF stimulation).
 
 (2) In my previous review I asked whether the initial response to the border ownership stimulus shows the feedforward signature. It is unclear to me why this suggestion did not lead to an analysis of the feedforward response. I repeat the text from my previous review: "The authors state that they did not look at cross-correlations during the initial response, but if they do, do they see the feedforward-dominated pattern? The jitter CCH analysis might suffice in correcting for the response transient." Can the authors address this point?
 
 (3) In my previous review I asked the authors to show the average time course of the response elicited by preferred and nonpreferred border ownership stimuli across all significant neurons. It remains unclear why this plot was not provided.
 
 Review 2
4. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The paper by Zhu et al is on an important topic in visual neuroscience, the emergence in the visual cortex of signals about figure and ground. This topic also goes by the name border ownership. The paper utilizes modern recording techniques very skillfully to extend what is known about border ownership. It offers new evidence about the prevalence of border ownership signals across different cortical layers in V1 cortex. Also, it uses pairwise cross correlation to study signal flow under different conditions of visual stimulation that include the border ownership paradigm.
 
 Strengths: The paper's strengths result from its use of multi-electrode probes to study border ownership in many neurons simultaneously across the cortical layers in V1. It also provides useful new data about the dynamics of interaction of signals from the non-classical receptive field (NCRF) and the classical receptive field (CRF).
 
 Weaknesses:
 
 The paper's weakness is that it does not challenge consensus beliefs about mechanisms. Also, the paper combines data about border ownership with data about the NCRF without making it clear how they are similar or different.
 
 Critique:
 
 The border ownership data on V1 offered in the paper replicate experimental results obtained by Zhou and von der Heydt (2000) and confirm the earlier results. The incremental addition is that the authors found border ownership in all cortical layers of V1, extending Zhou and von der Heydt's results that were only about layer 2/3 in V2 cortex. This is an interesting new result using the same stimuli but new measurement techniques.
 
 The cross-correlation results show that the pattern of the cross correlogram (CCG) is influenced by the visual pattern being presented. However, in the initially submitted ms. the results were not analyzed mechanistically, and the interpretation was unclear. For instance, the authors show in Figure 3 (and in Figure S2) that the peak of the CCG can indicate layer 2/3 excites layer 4C when the visual stimulus is the border ownership test pattern, a large square 8 deg on a side. More than one reviewer asked, "how can layer 2/3 excite layer 4C?" In the revised ms. the authors added a paragraph to the Discussion to respond to the reviewers about this point. The authors could provide an even better response to the reviewers by emphasizing that, consistently, layer 5/6 neurons lead neurons in layer 4, both for the CRF pattern and even more so when the NCRF patterns are used.
 
 The problems in understanding the CCG data are indirectly caused by the lack of a critical analysis of what is happening in the responses that reveal the border ownership signals, as in Fig. 2. Let's put it bluntly--are border ownership signals excitatory or inhibitory? As the authors pointed out in their rebuttal, Zhang and von der Heydt (2010, JNS) did experiments to answer this question, but I do not agree with the authors' rebuttal letter about what Zhang and von der Heydt (2010) reported. If you examine Zhang and von der Heydt's Figure 6, you see that the major effect of stimulating border ownership neurons is suppression from the non-preferred side. That result is consistent with many papers on the NCRF (many cited by the authors) that indicate that it is mostly suppressive. That experimental fact about border ownership should be mentioned in the present paper.
 
 What I should have pointed out in the first round, but didn't understand then, is that there is a disconnect between the border ownership laminar analysis (Figure 2) and the laminar correlations with CCGs (Figures 3-5), because the CCGs are not limited to border ownership neurons (or at least we are not told they were limited to them). So the CCG results are not mostly about border ownership--they are about the difference between signal flow in responses to small drifting Gabor patterns vs big flashed squares. Since only 21% of all recorded neurons were border ownership neurons, it is likely that most of the CCG statistics are based on neurons that do not show border ownership. Nevertheless, Figures 3 and 4 are very useful for the study of signal flow in the NCRF. It wasn't clear to me what those figures are about, and I think the authors could make it clearer. And I wonder if it might be possible to make a stronger link with border ownership by restricting the CCG analysis to pairs of neurons in which one neuron is a border ownership neuron. Are there enough data?
 
 My critique of the CCG analysis applies to Figure 5 also. That figure shows a weak correlation of CCG asymmetry with Border Ownership Index. Perhaps a stronger correlation might be present if the population were restricted to the much smaller population of neuron pairs that had at least one border ownership neuron.
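As a rough sketch of how the strength of the Figure 5 relationship could be reported, both for all pairs and for the restricted subset suggested above, the snippet below computes a Pearson correlation and its square. Everything here is hypothetical: the function name `pearson_r`, the variable names, and the toy values are placeholders, and the manuscript's actual index definitions and fitting procedure are not reproduced. For a simple linear fit with one predictor, R² equals the squared Pearson correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-pair values (placeholders, not data from the paper).
ccg_asymmetry = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.25]
bo_modulation = [0.30, 0.10, 0.35, 0.25, 0.20, 0.15, 0.05, 0.40]
has_bo_neuron = [True, False, True, False, True, False, False, True]

r2_all = pearson_r(ccg_asymmetry, bo_modulation) ** 2

# Restrict to pairs that contain at least one border ownership neuron.
subset_x = [a for a, keep in zip(ccg_asymmetry, has_bo_neuron) if keep]
subset_y = [b for b, keep in zip(bo_modulation, has_bo_neuron) if keep]
r2_subset = pearson_r(subset_x, subset_y) ** 2

print(f"R^2, all pairs:       {r2_all:.3f}")
print(f"R^2, BO-neuron pairs: {r2_subset:.3f}")
```

Reporting both values side by side would show directly whether the restriction strengthens the relationship, subject to the smaller sample size of the subset.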
 
 Review 3
5. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Zhu and colleagues used high-density Neuropixels probes to perform laminar recordings in V1 while presenting either small stimuli that stimulated the classical receptive field (CRF) or large stimuli whose border straddled the RF to provide nonclassical RF (nCRF) stimulation. Their main question was to understand the relative contribution of feedforward (FF), feedback (FB), and horizontal circuits to border ownership (Bown), which they addressed by measuring cross-correlation across layers. They found differences in cross-correlation between feedback/horizontal (FH) and input layers during CRF and nCRF stimulation.
 
 Although the data looks high quality and analyses look mostly fine, I had a lot of difficulty understanding the logic in many places. Examples of my concerns are written below.
 
 (1) What is the main question? The authors refer to nCRF stimulation emerging from either feedback from higher areas or horizontal connections from within the same area (e.g. lines 136 to 138 and again lines 223-232). I initially thought that the study would aim to distinguish between the two. However, the way the authors have clubbed the layers in 3D, the main question seems to be whether Bown is FF or FH (i.e., feedback and horizontal are clubbed). Is this correct? If so, I don't see the logic, since I can't imagine Bown to be purely FF. Thus, just showing differences between CRF stimulation (which is mainly expected to be FF) and nCRF stimulation is not surprising to me.
 
 We thank the reviewer for their thoughtful comments. As explained in the discussion, we grouped cortical layers to reduce uncertainty in precisely assigning laminar boundaries and to increase statistical power. Consequently, this limits our ability to distinguish the relative contributions of feedback inputs, primarily targeting layers 1 and 6, and horizontal connections, mainly within layers 2/3 and 5. Nevertheless, previous findings, especially regarding the rapid emergence of Bown signals, suggest that feedback is more biologically plausible than horizontal-based mechanisms.
 
 Importantly, the emergence of Bown signals in the primate brain should not be taken for granted. Direct physiological evidence that distinguishes feedforward from feedback/horizontal mechanisms has been lacking. While we agree it is unlikely that Bown is mediated solely by feedforward processing, we felt it was necessary to test this empirically, particularly using high-resolution laminar recordings.
 
 As discussed, feedforward models of Bown have been proposed (e.g., Super, Romeo, and Keil, 2010; Sakai and Nishimura, 2006). These could, in theory, be supported by more general nCRF modulations arising through early feedforward inhibition, such as that observed in the retinogeniculate pathway (e.g., Webb, Tinsley, Vincent and Derrington, 2005; Blitz and Regehr, 2005; Alitto and Usrey, 2008). However, most Bown models rely heavily on response latency, yet very few studies have recorded across layers or areas simultaneously to address this directly. Notably, recent findings in area V4 show that Bown signals emerge earlier in deep layers than in granular (input) layers, suggesting a non-feedforward origin (Franken and Reynolds, 2021).
 
 Furthermore, although previous studies have shown that the nCRF can modulate firing rates and the timing of neuronal firing across layers, our findings go beyond these effects. We provide clear evidence that nCRF modulation also alters precise spike timing relationships and interlaminar coordination, and that the magnitude of nCRF modulation depends on these interlaminar interactions. This supports the idea that Bown, or more general nCRF modulation, involves more than local rate changes, reflecting layer-specific network dynamics consistent with feedback or lateral integration.
 
 (2) Choice of layers for cross-correlation analysis: In the Introduction, and also in Figure 3C, it is mentioned that FF inputs arrive in 4C and 6, while FB/Horizontal inputs arrive at "superficial" and "deep", which I take as layer 2/3 and 5. So it is not clear to me why (i) layer 4A/B is chosen for analysis for Figure 3D (I would have thought layer 6 should have been chosen instead) and (ii) why Layers 5 and 6 are clubbed.
 
 We thank the reviewer for raising this important point. The confusion likely stems from our use of the terms “superficial” and “deep” layers when describing the targets of feedback/horizontal inputs. To clarify, by “superficial” and “deep,” we specifically refer to layers 1–3 and layers 5–6, respectively, as illustrated in Figure 3C. Feedback and horizontal inputs largely avoid the whole of layer 4, including both 4C and 4A/B.
 
 We also emphasize that the classification of layers as feedforward or feedback/horizontal recipients is relative rather than absolute. For example, although layer 6 receives both feedforward and feedback/horizontal inputs, it contains a higher proportion of feedback/horizontal inputs compared to layers 4C and 4A/B.
 
 We had addressed this rationale in the Discussion, but recognize it may not have been sufficiently emphasized. We have revised the main text accordingly to clarify this point for readers in the final manuscript version.
 
 (3) Addressing the main question using cross-correlation analysis: I think the nice peaks observed in Figure 3B for some pairs show how spiking in one neuron affects the spiking in another one, with the delay in cross-correlation function arising from the conduction delay. This is shown nicely during CRF stimulation in Figure 3D between 4C -> 2/3, for example. However, the delay (positive or negative) is constrained by anatomical connectivity. For example, unless there are projections from 2/3 back to 4C which causes firing in a 2/3 layer neuron to cause a spike in a layer 4 neuron, we cannot expect to get a negative delay no matter what kind of stimulation (CRF versus nCRF) is used.
 
 We thank the reviewer for the insightful comment. The observation that neurons within FHi laminar compartments (layers 2/3, 5/6) can lead those in layer 4 (4C, 4A/B) during nCRF stimulation may indeed seem unexpected. However, several anatomical pathways could mediate the propagation of Bown signals from FHi compartments to layer 4. We have revised the Discussion section in the final version of the manuscript to address this point explicitly.
 
 In macaque V1, projections from layers 2/3 to 4A/B have been documented (Blasdel et al., 1985; Callaway and Wiser, 1996), and neurons in 4A/B often extend apical dendrites into layers 2/3 (Lund, 1988; Yoshioka et al., 1994). Although direct projections from layers 2/3 to 4C are generally sparse (Callaway, 1998), a subset of neurons in the lower part of layer 3 can give off collateral axons to 4C (Lund and Yoshioka, 1991). Additionally, some 4C neurons extend dendrites into 4B, enabling potential dendritic integration of inputs from more superficial layers (Somogyi and Cowey, 1981; Mates and Lund, 1983; Yabuta and Callaway, 1998). Sparse connections from 2/3 to layer 4 have also been reported in cat V1 (Binzegger, Douglas and Martin, 2004). Moreover, layers 2/3 may influence 4C neurons disynaptically, without requiring dense monosynaptic connections.
 
 Importantly, while CCGs can suggest possible circuit arrangements, functional connectivity may arise through mechanisms not fully captured by traditional anatomical tracing. Indeed, the apparent discrepancy between anatomical and functional data is not uncommon. For example, although 4B is known to receive anatomical input primarily from 4Cα, but not 4Cβ, photostimulation experiments have shown that 4B neurons can also be functionally driven by 4Cβ (Sawatari and Callaway, 1996). Our observation of functional inputs from layers 2/3 to layer 4 is also consistent with prior findings in rodent V1, where CCG analysis (e.g., Figure 7 in Senzai, Fernandez-Ruiz and Buzsaki, 2019) or photostimulation (Xu et al., 2016) revealed similar pathways.
 
 Layers 5/6 provide dense projections to layers 4A/B (Lund, 1988; Callaway, 1998). In particular, layer 6 pyramidal neurons, especially the subset classified as Type 1 cells, project substantially to layer 4C (Wiser and Callaway, 1996; Fitzpatrick et al., 1985).
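To make the lag sign convention in the exchange above concrete, here is a minimal sketch of a cross-correlogram and its peak lag computed from two lists of spike times. The function names, the 1 ms bin width, the ±10 ms window, and the toy spike trains are all assumptions for illustration; this is not the authors' pipeline, which also involves jitter correction and significance criteria that are not reproduced here. In this sketch a positive peak lag means the target neuron tends to fire after the reference neuron.

```python
def cross_correlogram(ref_spikes_ms, target_spikes_ms, max_lag_ms=10, bin_ms=1):
    """Count target spikes at each lag relative to every reference spike.

    Lag = target time - reference time, so a positive lag means the
    target neuron fired after the reference neuron. The all-pairs loop
    is quadratic and only meant for small illustrative inputs.
    """
    n_bins = int(2 * max_lag_ms / bin_ms) + 1
    counts = [0] * n_bins
    for t_ref in ref_spikes_ms:
        for t_tgt in target_spikes_ms:
            lag = t_tgt - t_ref
            if -max_lag_ms <= lag <= max_lag_ms:
                counts[int(round((lag + max_lag_ms) / bin_ms))] += 1
    lags = [-max_lag_ms + i * bin_ms for i in range(n_bins)]
    return lags, counts


def peak_lag(lags, counts):
    """Lag of the (first) largest bin in the correlogram."""
    return lags[counts.index(max(counts))]


# Toy spike trains (ms): the second neuron fires 2 ms after the first.
neuron_a = list(range(0, 1000, 50))
neuron_b = [t + 2 for t in neuron_a]

lags, counts = cross_correlogram(neuron_a, neuron_b)
print("A as reference, B as target:", peak_lag(lags, counts), "ms")

lags_rev, counts_rev = cross_correlogram(neuron_b, neuron_a)
print("B as reference, A as target:", peak_lag(lags_rev, counts_rev), "ms")
```

Swapping reference and target flips the sign of the peak lag, which is why the direction assigned to each layer pair has to be stated explicitly when interpreting positive and negative delays.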
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors present a study of how modulatory activity from outside the classical receptive field (cRF) differs from cRF stimulation. They study neural activity across the different layers of V1 in two anesthetized monkeys using Neuropixels probes. The monkeys are presented with drifting gratings and border-ownership tuning stimuli. They find that border-ownership tuning is organized into columns within V1, which is unexpected and exciting, and that the flow of activity from cell-to-cell (as judged by cross-correlograms between single units) is influenced by the type of visual stimulus: border-ownership tuning stimuli vs. drifting-grating stimuli.
 
 Strengths:
 
 The questions addressed by the study are of high interest, and the use of Neuropixels probes yields extremely high numbers of single-units and cross-correlation histograms (CCHs) which makes the results robust. The study is well-described.
 
 Weaknesses:
 
 The weaknesses of the study are (a) the use of anesthetized animals, which raises questions about the nature of the modulatory signal being measured and the underlying logic of why a change in visual stimulus would produce a reversal in information flow through the cortical microcircuit and (b) the choice of visual stimuli, which do not uniquely isolate feedforward from feedback influences.
 
 (1) The modulation latency seems quite short in Figure 2C. Have the authors measured the latency of the effect in the manuscript and how it compares to the onset of the visually driven response? It would be surprising if the latency was much shorter than 70ms given previous measurements of BO and figure-ground modulation latency in V2 and V1. On the same note, it might be revealing to make laminar profiles of the modulation (i.e. preferred - non-preferred border orientation) as it develops over time. Does the modulation start in feedback recipient layers?
 
 (2) Can the authors show the average time course of the response elicited by preferred and nonpreferred border ownership stimuli across all significant neurons?
 
 We thank the reviewer for the insightful comment—this is indeed an important and often overlooked point. As noted in the Discussion, Bown modulation differs from other forms of figure-ground modulation (e.g., Lamme et al., 1998) in that it can emerge very rapidly in early visual cortex—within ~10–35 ms after response onset (Zhou et al., 2000; Sugihara et al., 2011). This rapid emergence has been interpreted as evidence for the involvement of fast feedback inputs, which can propagate up to ten times faster than horizontal connections (Girard et al., 2001). Moreover, interlaminar interactions via monosynaptic or disynaptic connections can occur on very short timescales (a few milliseconds), further complicating efforts to disentangle feedback influences based solely on latency.
 
 Thus, while the early onset of modulation in our data may appear surprising, it is consistent with prior Bown findings, and likely reflects a combination of fast feedback and rapid interlaminar processing. This makes it challenging to use conventional latency measurements to resolve laminar differences in Bown modulation. Latency comparisons are well known to be susceptible to confounds such as variability in response onset, luminance, contrast, stimulus size, and other sensory parameters.
 
 Although we did not explicitly quantify the latency of Bown modulation in this manuscript, our cross-correlation analysis provides a more sensitive and temporally resolved measure of interlaminar information flow. We therefore focused on this approach rather than laminar modulation profiles, as it more directly addresses our primary research question.
 
 (3) The logic of assuming that cRF stimulation should produce the opposite signal flow to border-ownership tuning stimuli is worth discussing. I suspect the key difference between stimuli is that they used drifting gratings as the cRF stimulus; the movement of the stimulus continually refreshes the retinal image, leading to continuous feedforward dominance of the signals in V1. Had they used a static grating, the spiking during the sustained portion of the response might also show more influence of feedback/horizontal connections. Do the initial spikes fired in response to the border-ownership tuning stimuli show the feedforward pattern of responses? The authors state that they did not look at cross-correlations during the initial response, but if they do, do they see the feedforward-dominated pattern? The jitter CCH analysis might suffice in correcting for the response transient.
 
 We thank the reviewer for the insightful comment. As noted in the final Results section, our CRF and nCRF stimulation paradigms differ in respects beyond the presence or absence of nonclassical modulation, including stimulus properties within the CRF.
 
 We agree with the reviewer’s speculation that drifting gratings may continually refresh the retinal image, promoting sustained feedforward dominance in V1, whereas static gratings might allow greater influence from feedback/horizontal inputs during the sustained response. Likewise, the initial response to the Bown stimulus could be dominated by feedforward activity before feedback/horizontal influences arrive.
 
 This contrast was a central motivation for our experimental design: we deliberately used two stimulus conditions — drifting gratings to emphasize feedforward processing, and Bown stimuli, which are known to engage feedback modulation — to test whether these two conditions yield different patterns of interlaminar information flow. Our results confirm that they do. While we did not separately analyze the very initial spike period, our focus is on interlaminar information flow during the sustained response, which serves as the primary measure of feedback/horizontal engagement in this study.
 
 Finally, beyond this direct comparison, we show in Figure 5 that under nCRF stimulation alone, the direction and strength of interlaminar information flow correlate with the magnitude of Bown modulation, further supporting the idea that our cross-correlation approach reveals functionally meaningful differences in cortical processing.
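The jitter correction mentioned in point (3) can be sketched as follows. This is an illustrative interval-jitter procedure under assumed parameters (a 25 ms jitter window, 1 ms resolution, integer spike times, and hypothetical function names); the manuscript's exact jitter method and window are not given in this exchange. Each spike is re-drawn uniformly within its own jitter window, which keeps slow rate changes such as an onset transient but destroys fine timing. Subtracting the mean jittered CCG from the raw CCG then leaves only structure faster than the window.

```python
import random


def ccg_counts(ref, tgt, max_lag=10):
    """Raw CCG with 1 ms bins; lag = target time - reference time (ms)."""
    counts = [0] * (2 * max_lag + 1)
    for a in ref:
        for b in tgt:
            lag = b - a
            if -max_lag <= lag <= max_lag:
                counts[lag + max_lag] += 1
    return counts


def interval_jitter(spikes, window, rng):
    """Re-draw each integer spike time uniformly inside its own window."""
    return sorted((t // window) * window + rng.randrange(window) for t in spikes)


def jitter_corrected_ccg(ref, tgt, max_lag=10, window=25, n_surrogates=200, seed=0):
    """Raw CCG minus the mean CCG of jittered surrogate spike trains."""
    rng = random.Random(seed)
    raw = ccg_counts(ref, tgt, max_lag)
    mean_jittered = [0.0] * len(raw)
    for _ in range(n_surrogates):
        surrogate = ccg_counts(interval_jitter(ref, window, rng),
                               interval_jitter(tgt, window, rng), max_lag)
        for i, c in enumerate(surrogate):
            mean_jittered[i] += c / n_surrogates
    return [r - m for r, m in zip(raw, mean_jittered)]


# Toy spike trains (integer ms): the target fires 3 ms after the reference.
ref_spikes = list(range(5, 2000, 40))
tgt_spikes = [t + 3 for t in ref_spikes]

corrected = jitter_corrected_ccg(ref_spikes, tgt_spikes)
peak_lag_ms = corrected.index(max(corrected)) - 10
print("peak lag after jitter correction:", peak_lag_ms, "ms")
```

The same function could in principle be applied to spikes from the initial response window only, which is the analysis the reviewer asks about; whether enough spikes are available there is an empirical question.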
 
 (4) The term "nCRF stimulation" is not appropriate because the CRF is stimulated by the light/dark edge.
 
 We thank the reviewer for the comment. As noted in the Introduction, nCRF effects described in the literature invariably involve stimulation both inside and outside the CRF. Our use of the term “nCRF stimulation” refers to this experimental paradigm, rather than suggesting that the CRF itself is unstimulated. We hope this clarifies our use of the term.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The paper by Zhu et al is on an important topic in visual neuroscience, the emergence in the visual cortex of signals about figures and ground. This topic also goes by the name border ownership. The paper utilizes modern recording techniques very skillfully to extend what is known about border ownership. It offers new evidence about the prevalence of border ownership signals across different cortical layers in V1 cortex. Also, it uses pairwise cross-correlation to study signal flow under different conditions of visual stimulation that include the border ownership paradigm.
 
 Strengths:
 
 The paper's strengths are its use of multi-electrode probes to study border ownership in many neurons simultaneously across the cortical layers in V1, and its innovation of using cross-correlation between cortical neurons -- when they are viewing border-ownership patterns or instead are viewing grating patterns restricted to the classical receptive field (CRF).
 
 Weaknesses:
 
 The paper's weaknesses are its largely incremental approach to the study of border ownership and the lack of a critical analysis of the cross-correlation data. The paper as it is now does not advance our understanding of border ownership; it mainly confirms prior work, and it does not challenge or revise consensus beliefs about mechanisms. However, it is possible that, in the rich dataset the authors have obtained, they do possess data that could be added to the paper to make it much stronger.
 
 Critique:
 
 The border ownership data on V1 offered in the paper replicates experimental results obtained by Zhou and von der Heydt (2000) and confirms the earlier results using the same analysis methods as Zhou. The incremental addition is that the authors found border ownership in all cortical layers extending Zhou's results that were only about layer 2/3.
 
 The cross-correlation results show that the pattern of the cross-correlogram (CCG) is influenced by the visual pattern being presented. However, the results are not analyzed mechanistically, and the interpretation is unclear. For instance, the authors show in Figure 3 (and in Figure S2) that the peak of the CCG can indicate layer 2/3 excites layer 4C when the visual stimulus is the border ownership test pattern, a large square 8 deg on a side. But how can layer 2/3 excite layer 4C? The authors do not raise or offer an answer to this question. Similar questions arise when considering the CCG of layer 4A/B with layer 2/3. What is the proposed pathway for layer 2/3 to excite 4A/B? Other similar questions arise for all the interlaminar CCG data that are presented. What known functional connections would account for the measured CCGs?
 
 We thank the reviewer for raising this important point. As noted in our response to a previous comment, several anatomical pathways could mediate apparent functional inputs from layers 2/3 to 4C and 4A/B. In macaque V1, projections from layers 2/3 to 4A/B have been documented (Blasdel et al., 1985; Callaway and Wiser, 1996), and neurons in 4A/B often extend apical dendrites into layers 2/3 (Lund, 1988; Yoshioka et al., 1994). Although direct projections from layers 2/3 to 4C are generally sparse (Callaway, 1998), a subset of lower layer 3 neurons can give off collateral axons to 4C (Lund and Yoshioka, 1991). Some 4C neurons also extend dendrites into 4B, potentially allowing dendritic integration of inputs from more superficial layers (Somogyi and Cowey, 1981; Mates and Lund, 1983; Yabuta and Callaway, 1998). Sparse connections from 2/3 to layer 4 have also been reported in cat V1 (Binzegger et al., 2004).
 
 Moreover, layers 2/3 may influence 4C neurons disynaptically, without requiring dense monosynaptic connections. While CCGs suggest possible circuit arrangements, functional connectivity may arise through mechanisms not fully captured by anatomical tracing, and apparent discrepancies between anatomical and functional data are not uncommon. For example, although 4B is known to receive anatomical input primarily from 4Cα, 4B neurons can also be functionally driven by 4Cβ using photostimulation (Sawatari and Callaway, 1996). Our observation of functional inputs from layers 2/3 to layer 4 is also consistent with prior findings in rodent V1, where CCG analysis (e.g., Figure 7 in Senzai, Fernandez-Ruiz and Buzsaki, 2019) or photostimulation (Xu et al., 2016) revealed similar pathways.
 
 Layers 5/6 also provide dense projections to layers 4A/B (Lund, 1988; Callaway, 1998). In particular, layer 6 pyramidal neurons, especially the subset classified as Type 1 cells, project substantially to layer 4C (Wiser and Callaway, 1996; Fitzpatrick et al., 1985).
 
 We have revised the Discussion section to explicitly address these points and clarify the potential anatomical and functional pathways underlying the measured interlaminar CCGs, highlighting how inputs from layers 2/3 and 5/6 to layer 4 can be mediated via both direct and indirect connections.
 
 The problems in understanding the CCG data are indirectly caused by the lack of a critical analysis of what is happening in the responses that reveal the border ownership signals, as in Figure 2. Let's put it bluntly - are border ownership signals excitatory or inhibitory? The reason I raise this question is that the present authors insightfully place border ownership as examples of the action of the non-classical receptive field (nCRF) of cortical cells. Most previous work on the nCRF (many papers cited by the authors) reveal the nCRF to be inhibitory or suppressive. In order to know whether nCRF signals are excitatory or inhibitory, one needs a baseline response from the CRF, so that when you introduce nCRF signals you can tell whether the change with respect to the CRF is up or down. As far as I know, prior work on border ownership has not addressed this question, and the present paper doesn't either. This is where the rich dataset that the present authors possess might be used to establish a fundamental property of border ownership.
 
 Then we must go back to consider what knowing the sign of the border ownership signal would mean for interpreting the CCG data. If the border ownership signals from extrastriate feedback or, alternatively, from horizontal intrinsic connections, are excitatory, they might provide a shared excitatory input to pairs of cells that would show up in the CCG as a peak at 0 delay. However, if the border ownership signals are inhibitory, they might work by exciting only inhibitory neurons in V1. This could have complicated consequences for the CCG. The interpretation of the CCG data in the present version of the manuscript is unclear (see above). Perhaps a clearer interpretation could be developed once the authors know better what the border ownership signals are.
 
 We thank the reviewer for raising this fundamental and thought-provoking question. As noted, Bown signals arise from nCRF, which has often been associated with suppressive effects. However, Zhang and von der Heydt (2010) provided important insight into this issue by systematically varying the placement of figure fragments outside the CRF while keeping an edge centered within the CRF. They found that contextual fragments on the preferred side of Bown produce facilitation, while those on the non-preferred side produce suppression. Thus, the nCRF contribution to Bown reflects both excitatory and inhibitory modulation, depending on the spatial configuration of the figure.
 
 These effects were well explained by their model in which feedback from grouping cells in higher areas selectively enhances or suppresses V1/V2 neuron responses, depending on their Bown preference. In this framework, the Bown signal itself is not inherently excitatory or inhibitory; rather, it results from the net effect of feedback, which can be either facilitative or suppressive. Importantly, it is the input that is modulated — not that the receiving neurons are necessarily inhibitory themselves.
 
 In the current study, our analysis focused on CCGs showing excessive coincident spiking, i.e., positive peaks, which are typically interpreted as evidence for shared excitatory input or excitatory connections. Due to the limited number of connections, we did not analyze inhibitory interactions, such as anti-correlations or delayed suppression in the CCGs, which would be expected if the reference neuron were inhibitory. Therefore, the CCGs we report here likely reflect the excitatory component of the Bown signal, and possibly its upstream drive via feedback. While a full separation of excitatory and inhibitory components remains an important goal for future work, our data suggest that Bown modulation is at least partially mediated through excitatory feedback input.
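 To make the notion of a positive CCG peak and its lag concrete, the following is a minimal sketch of a raw cross-correlogram between two spike trains. It is an illustration only, not the analysis pipeline used in the study: the function names, the 1 ms bin width, the 10 ms window, and the spike times are all assumptions, and no jitter correction or firing-rate normalization is applied.

```python
from collections import Counter


def cross_correlogram(ref_spikes, target_spikes, max_lag_ms=10, bin_ms=1):
    """Count target spikes at each lag relative to every reference spike.

    Spike times are in milliseconds. A positive lag means the target
    neuron fired after the reference neuron.
    """
    n_bins = max_lag_ms // bin_ms
    counts = Counter()
    for t_ref in ref_spikes:
        for t_tgt in target_spikes:
            lag_bin = round((t_tgt - t_ref) / bin_ms)
            if -n_bins <= lag_bin <= n_bins:
                counts[lag_bin] += 1
    bins = range(-n_bins, n_bins + 1)
    lags = [b * bin_ms for b in bins]
    return lags, [counts[b] for b in bins]


def peak_lag(lags, ccg):
    """Return the lag (ms) of the bin with the highest coincidence count."""
    best = max(range(len(ccg)), key=ccg.__getitem__)
    return lags[best]


# Hypothetical spike trains: the target fires 2 ms after each reference spike.
ref = [10, 50, 90, 130, 170]
tgt = [t + 2 for t in ref]
lags, ccg = cross_correlogram(ref, tgt)
print("peak lag (ms):", peak_lag(lags, ccg))
```

 In this toy case the peak sits at a positive lag, the pattern usually read as the reference neuron leading the target. A peak at zero lag would instead be the signature expected from shared input.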
 
 My critique of the CCG analysis applies to Figure 5 also. I cannot comprehend the point of showing a very weak correlation of CCG asymmetry with Border Ownership Index, especially when what CCG asymmetry means is unclear mechanistically. Figure 5 does not make the paper stronger in my opinion.
 
 We thank the reviewer for this comment. As described in the Results section for Figure 5, the observation that interlaminar information flow correlates with Bown modulation is important because it demonstrates that these flow patterns are specifically related to the magnitude of Bown signals, independent of the comparisons between CRF and nCRF stimulation.
 
 In Figure 3, the authors show two CCGs that involve 4C--4C pairs. It would be nice to know more about such pairs. If there are any 6--6 pairs, what they look like also would be interesting. The authors also in Figure 3 show CCG's of two 4C--4A/B pairs and it would be quite interesting to know how such CCGs behave when CRF and nCRF stimuli are compared. In other words, the authors have shown us they have many data but have chosen not to analyze them further or to explain why they chose not to analyze them. It might help the paper if the authors would present all the CCG types they have. This suggestion would be helpful when the authors know more about the sign of border ownership signals, as discussed at length above.
 
 We thank the reviewer for the insightful comment. The rationale for selecting specific laminar pairs is described in the Results section after Figure 3C and further discussed in the Discussion. In brief, we focused on CCGs computed from pairs in which one neuron resided in laminar compartments receiving feedback/horizontal inputs (layers 2/3 and 5/6) and the other within compartments relatively devoid of these inputs (layers 4C and 4A/B).
 
 To mitigate uncertainty in defining exact laminar boundaries and to maximize statistical power, we combined some anatomical layers into distinct laminar compartments. This approach allowed us to compare the relative spike timing between neuronal pairs during CRF and nCRF stimulation. If feedback/horizontal inputs contribute more during nCRF than CRF stimulation, we expect this to be reflected in the lead-lag relationships of the CCGs. While other pairs (e.g., 5/6–5/6 or 4C–4A/B) could in principle be analyzed, the hypothesized patterns for these pairs are less clear, and thus they were not the focus of our study. Nonetheless, these additional pairs represent interesting directions for future work.
 
URL

biorxiv.org/content/10.1101/2024.04.18.590176v3
www.biorxiv.org www.biorxiv.org

Raw signal segmentation for estimating RNA modification from Nanopore direct RNA sequencing data

5
1. Public_Reviews 15 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This study presents SegPore, a valuable new method for processing direct RNA nanopore sequencing data, which improves the segmentation of raw signals into individual bases and boosts the accuracy of modified base detection. The evidence presented to benchmark SegPore is solid, and the authors provide a fully documented implementation of the method. SegPore will be of particular interest to researchers studying RNA modifications.
 
 Summary
2. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo (RNA002) as well as f5c and Uncalled4 (RNA004), and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.
 
 Strengths:
 
 SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.
 
 Weaknesses:
 
 The authors show that SegPore reduces noise compared to other methods; however, the improvement in accuracy appears to be relatively small for the task of identifying m6A. To run SegPore, the GPU version is essential, which could limit the application of this method in practice.
 
 Review 1
3. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The work seeks to improve detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level. As such, the title, abstract and introduction stating the improvement of just the 'segmentation' do not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved. The work itself shows minor improvements in m6Anet when replacing Nanopolish's eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. However, these assignments were improved well enough to enable m6A calling from them directly, both at site-level and at read-level.
 
 A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced assignment of (almost) all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as datapoints could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.
 
 For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools. Additionally, the GMM used for calling the m6A modifications provides a useful, simple and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.
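 As an illustration of why a Gaussian mixture gives an explainable call, the sketch below fits a two-component, one-dimensional mixture with plain EM and then reports the posterior probability that a signal value belongs to the higher-mean component. This is a generic textbook EM, not SegPore's implementation: the function names, the simulated current levels (100 and 110 with standard deviation 2), and the choice to treat the higher-mean component as "modified" are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def responsibilities(x, mu, sigma, weight):
    """Posterior probability of each of the two components given x."""
    p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
    total = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
    return [p[0] / total, p[1] / total]


def fit_two_component_gmm(xs, n_iter=50):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    # Initialise the two means at the extremes of the data.
    mu = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) / 4.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: soft assignment of every point to both components.
        resp = [responsibilities(x, mu, sigma, weight) for x in xs]
        # M-step: re-estimate mean, std and mixing weight per component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
            weight[k] = nk / len(xs)
    return mu, sigma, weight


# Simulated (hypothetical) signal means for one k-mer across reads.
random.seed(0)
signal = [random.gauss(100.0, 2.0) for _ in range(200)]   # "unmodified"
signal += [random.gauss(110.0, 2.0) for _ in range(100)]  # "modified"

mu, sigma, weight = fit_two_component_gmm(signal)
p_mod = responsibilities(109.0, mu, sigma, weight)[1]
print("component means:", [round(m, 1) for m in mu])
print("P(modified | x = 109):", round(p_mod, 3))
```

 The call for a single read is then just a threshold on this posterior, and the fitted means and standard deviations can be inspected directly to see why a value was assigned to one state or the other.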
 
 Weaknesses:
 
 The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status on a base/kmer, is likely limited, because during downstream modification calling the noisy distributions are often 'good enough'. For example, Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon. Should a use-case come up where this downstream optimisation is not an option, SegPore might provide the necessary improvements in raw data alignment.
 
 Appraisal:
 
 The authors have shown their method's ability to identify noise in the raw signal and remove its values from the segmentation and alignment, reducing its influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish's eventalign it seems to have a rather limited, but improved, effect on m6Anet results. For modification calling at the single-read level, this work does appear to improve upon CHEUI.
 
 Impact:
 
 With the current developments for Nanopore-based modification calling largely focusing on Artificial Intelligence, Neural Networks and the like, improvements made in interpretable approaches provide an important alternative that enables deeper understanding of the data rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in the context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing, for example, as presented, as a possible replacement for Nanopolish's eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and for further tool development for modification detection beyond m6A calling.
 
 Review 2
4. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Nucleotide modifications are important regulators of biological function, however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study, however many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.
 
 Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.
 
 Strengths:
 
 This method is well described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.
 
 Weaknesses:
 
 However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications, however, SegPore has not been benchmarked against these models.
 
 The manuscript would be strengthened by benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU.
 
 A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.
 
 Review 3
5. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 We thank all the reviewers for their constructive comments. We have carefully considered your feedback and revised the manuscript accordingly. The major concern raised was the applicability of SegPore to the RNA004 dataset. To address this, we compared SegPore with f5c and Uncalled4 on RNA004, and found that SegPore demonstrated improved performance, as shown in Table 2 of the revised manuscript.
 
 Following the reviewers’ recommendations, we updated Figures 3 and 4. Additionally, we added one table and three supplementary figures to the revised manuscript:
 
 · Table 2: Segmentation benchmark on RNA004 data
 
 · Supplementary Figure S4: RNA translocation hypothesis illustrated on RNA004 data
 
 · Supplementary Figure S5: Illustration of Nanopolish raw signal segmentation with eventalign results
 
 · Supplementary Figure S6: Running time of SegPore on datasets of varying sizes
 
 Below, we provide a point-by-point response to your comments.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore-direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo, and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.
 
 Strengths:
 
 SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single-read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.
 
 Weaknesses:
 
 In addition to Nanopolish and Tombo, f5c and Uncalled4 can also be used for segmentation, however, the comparison to these methods is not shown.
 
 The method was only applied to data from the RNA002 direct RNA-Sequencing version, which is no longer available; it currently remains unclear whether the method still works on RNA004.
 
 Thank you for your comments.
 
 To clarify the background, there are two kits for Nanopore direct RNA sequencing: RNA002 (the older version) and RNA004 (the newer version). Oxford Nanopore Technologies (ONT) introduced the RNA004 kit in early 2024 and has since discontinued RNA002. Consequently, most public datasets are based on RNA002, with relatively few available for RNA004 (as of 30 June 2025).
 
 Nanopolish and Tombo were developed for raw signal segmentation and alignment using RNA002 data, whereas f5c and Uncalled4 are the only two tools supporting RNA004 data. Since the development of SegPore began in January 2022, we initially focused on RNA002 due to its data availability. Accordingly, our original comparisons were made against Nanopolish and Tombo using RNA002 data.
 
 We have now updated SegPore to support RNA004 and compared its performance against f5c and Uncalled4 on three public RNA004 datasets.
 
 As shown in Table 2 of the revised manuscript, SegPore outperforms both f5c and Uncalled4 in raw signal segmentation. Moreover, the jiggling translocation hypothesis underlying SegPore is further supported, as shown in Supplementary Figure S4.
 
 The overall improvement in accuracy appears to be relatively small.
 
 Thank you for the comment.
 
 We understand that the improvements shown in Tables 1 and 2 may appear modest at first glance due to the small differences in the reported standard deviation (std) values. However, even small absolute changes in std can correspond to substantial relative reductions in noise, especially when the total variance is low.
 
 To better quantify the improvement, we assume that approximately 20% of the std for Nanopolish, Tombo, f5c, and Uncalled4 arises from noise. Using this assumption, we calculate the relative noise reduction rate of SegPore as follows:
 
 Noise reduction rate = (baseline std − SegPore std) / (0.2 × baseline std)
 
 Based on this formula, the average noise reduction rates across all datasets are:
 
 - SegPore vs Nanopolish: 49.52%
 
 - SegPore vs Tombo: 167.80%
 
 - SegPore vs f5c: 9.44%
 
 - SegPore vs Uncalled4: 136.70%
 
 These results demonstrate that SegPore can reduce the noise level by at least 9% given a noise level of 20%, which we consider a meaningful improvement for downstream tasks, such as base modification detection and signal interpretation. The high noise reduction rates observed in Tombo and Uncalled4 (over 100%) suggest that their actual noise proportion may be higher than our 20% assumption.
 
 We acknowledge that this 20% noise level assumption is an approximation. Our intention is to illustrate that SegPore provides measurable improvements in relative terms, even when absolute differences appear small.
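 The noise reduction rate defined above can be written as a small function. The sketch below is illustrative only: the function names are hypothetical, and the standard deviation values are made up rather than taken from Tables 1 or 2. The second helper shows why a rate above 100% signals that the assumed noise fraction is too low: the rate exceeds 1 exactly when the relative drop in std is larger than the assumed noise share.

```python
def noise_reduction_rate(baseline_std, segpore_std, noise_fraction=0.2):
    """(baseline std - SegPore std) / (noise_fraction * baseline std)."""
    return (baseline_std - segpore_std) / (noise_fraction * baseline_std)


def implied_min_noise_fraction(baseline_std, segpore_std):
    """Smallest noise fraction for which the rate does not exceed 100%.

    This equals the relative reduction in std, because the removed std
    cannot be larger than the noise share of the baseline std.
    """
    return (baseline_std - segpore_std) / baseline_std


# Hypothetical std values for illustration, not from the manuscript.
modest = noise_reduction_rate(baseline_std=2.0, segpore_std=1.8)
large = noise_reduction_rate(baseline_std=2.0, segpore_std=1.4)
print(f"modest case: {modest:.0%}")
print(f"large case:  {large:.0%}")
print(f"implied noise fraction in large case: "
      f"{implied_min_noise_fraction(2.0, 1.4):.0%}")
```

 With these made-up inputs, a 10% drop in std corresponds to a rate of about 50% under the 20% assumption, while a 30% drop gives a rate above 100%, which is only consistent with a noise fraction of at least 30%.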
 
 The run time and resources that are required to run SegPore are not shown, however, it appears that the GPU version is essential, which could limit the application of this method in practice.
 
 Thank you for your comment.
 
 Detailed instructions for running SegPore are provided on GitHub (https://github.com/guangzhaocs/SegPore). Regarding computational resources, SegPore currently requires one CPU core and one Nvidia GPU to perform the segmentation task efficiently.
 
 We present SegPore’s runtime for typical datasets in Supplementary Figure S6 in the revised manuscript. For a typical 1 GB fast5 file, the segmentation takes approximately 9.4 hours using a single NVIDIA DGX‑1 V100 GPU and one CPU core.
 
 Currently, GPU acceleration is essential to achieve practical runtimes with SegPore. We acknowledge that this requirement may limit accessibility in some environments. To address this, we are actively working on a full C++ implementation of SegPore that will support CPU-only execution. While development is ongoing, we aim to release this version in a future update.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The work seeks to improve the detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.
 
 As such, the title, abstract, and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.
 
 The work itself shows minor improvements in m6Anet when replacing Nanopolish eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. However, these assignments were improved well enough to enable m6A calling from them directly, both at site-level and at read-level.
 
 Strengths:
 
 A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced the assignment of all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as data points could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.
 
 For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools. Additionally, the GMM used for calling the m6A modifications provides a useful, simple, and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.
 
 Weaknesses:
 
 The work seems limited in applicability largely due to the focus on the R9's 5mer models. The R9 flow cells are phased out and not available to buy anymore. Instead, the R10 flow cells with larger kmer models are the new standard, and the applicability of this tool on such data is not shown. We may expect similar behaviour from the raw sequencing data where the noise and transition states are still helpful, but the increased kmer size introduces a large amount of extra computing required to process data and without knowledge of how SegPore scales, it is difficult to tell how useful it will really be. The discussion suggests possible accuracy improvements moving to 7mers or 9mers, but no reason why this was not attempted.
 
 Thank you for pointing out this important limitation. Please refer to our response to Point 1 of Reviewer 1 for SegPore’s performance on RNA004 data. Notably, the jiggling behavior is also observed in RNA004 data, and SegPore achieves better performance than both f5c and Uncalled4.
 
 The increased k-mer size in RNA004 affects only the training phase of SegPore (refer to Supplementary Note 1, Figure 5 for details on the training and testing phases). Once the baseline means and standard deviations for each k-mer are established, applying SegPore to RNA004 data proceeds similarly to RNA002. This is because each k-mer in the reference sequence has, at most, two states (modified and unmodified). While the larger k-mer size increases the size of the parameter table, it does not increase the computational complexity during segmentation. Although estimating the initial k-mer parameter table requires significant time and effort on our part, it does not affect the runtime for end users applying SegPore to RNA004 data.
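 The scaling argument can be sketched as follows: the number of entries in a k-mer parameter table grows as 4^k, but looking up the parameters for any one k-mer remains a single dictionary access. This is an illustration under stated assumptions, not SegPore's data structure: the function names, the ACGU alphabet, and the placeholder (mean, std) values are hypothetical.

```python
from itertools import product


def kmer_table_size(k, alphabet="ACGU"):
    """Number of distinct k-mers over the given alphabet."""
    return len(alphabet) ** k


def build_placeholder_table(k, alphabet="ACGU"):
    """Map every k-mer to a placeholder (mean, std) pair.

    Real tables hold estimated current levels; (0.0, 1.0) is a stand-in.
    """
    return {"".join(p): (0.0, 1.0) for p in product(alphabet, repeat=k)}


for k in (5, 7, 9):
    print(f"k={k}: {kmer_table_size(k)} entries")

table = build_placeholder_table(5)
mean, std = table["GGACU"]  # one hash lookup, independent of table size
```

 Under these assumptions the table grows from 1,024 entries for 5-mers to 262,144 for 9-mers, so the cost of a larger k falls on estimating and storing the table rather than on each lookup during segmentation.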
 
 Extending SegPore from 5-mers to 7-mers or 9-mers for RNA002 data would require substantial effort to retrain the model and generate sufficient training data. Additionally, such an extension would make SegPore’s output incompatible with widely used upstream and downstream tools such as Nanopolish and m6Anet, complicating integration and comparison. For these reasons, we leave this extension for future work.
 
 The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status on a base/kmer, is likely limited as during actual modification calling the noisy distributions are usually 'good enough', and not skewed significantly in one direction to really affect the results too terribly.
 
 Thank you for your comment. While current state-of-the-art (SOTA) methods perform well on benchmark datasets, there remains significant room for improvement. Most SOTA evaluations are based on limited datasets, primarily covering DRACH motifs in human and mouse transcriptomes. However, m6A modifications can also occur in non-DRACH motifs, where current models may underperform. Additionally, other RNA modifications—such as pseudouridine, inosine, and m5C—are less studied, and their detection may benefit from improved signal modeling.
 
 We would also like to emphasize that raw signal segmentation and RNA modification detection are distinct tasks. SegPore focuses on the former, providing a cleaner, more interpretable signal that can serve as a foundation for downstream tasks. Improved segmentation may facilitate the development of more accurate RNA modification detection algorithms by the community.
 
 Scientific progress often builds incrementally through targeted improvements to foundational components. We believe that enhancing signal segmentation, as SegPore does, contributes meaningfully to the broader field—the full impact will become clearer as the tool is adopted into more complex workflows.
 
 Furthermore, looking at alternative approaches where this kind of segmentation could be applied, Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon.
 
 We thank the reviewer for this insightful comment.
 
 To clarify, Nanopolish provides three independent commands: polya, eventalign, and call-methylation.
 
 - The polya command identifies the adapter, poly(A) tail, and transcript region in the raw signal.
 
 - The eventalign command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference.
 
 - The call-methylation command detects methylated bases from DNA sequencing data.
 
 The eventalign command corresponds to “the main segmentation+alignment for a first alignment,” while call-methylation corresponds to “a form of targeted local realignment/HMM test for modification calling,” as mentioned in the reviewer’s comment. SegPore’s segmentation is similar in purpose to Nanopolish’s eventalign, while its RNA modification estimation component is similar in concept to Nanopolish’s call-methylation.
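 To illustrate what "assigning a signal segment to individual k-mers" yields in practice, the sketch below groups event means by reference position and k-mer and computes the spread of the values assigned to each, which is the kind of per-k-mer standard deviation a segmentation benchmark can compare. The input is a made-up excerpt in a tab-separated layout modelled on eventalign output; the subset of columns, the values, and the function name are assumptions, and this is not the exact metric computed for Table 1.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical excerpt; values are invented for illustration.
EVENTALIGN_TSV = (
    "contig\tposition\treference_kmer\tread_index\tevent_level_mean\n"
    "tx1\t100\tGGACU\t0\t123.0\n"
    "tx1\t100\tGGACU\t1\t125.0\n"
    "tx1\t100\tGGACU\t2\t127.0\n"
    "tx1\t101\tGACUA\t0\t98.0\n"
    "tx1\t101\tGACUA\t1\t102.0\n"
)


def per_kmer_std(tsv_text):
    """Population std of event means for each (contig, position, k-mer)."""
    levels = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["contig"], int(row["position"]), row["reference_kmer"])
        levels[key].append(float(row["event_level_mean"]))
    # A spread is only defined where at least two reads cover the site.
    return {k: statistics.pstdev(v) for k, v in levels.items() if len(v) > 1}


spread = per_kmer_std(EVENTALIGN_TSV)
for (contig, pos, kmer), sd in sorted(spread.items()):
    print(contig, pos, kmer, round(sd, 3))
```

 A tighter spread at the same site after segmentation suggests that fewer noisy or misassigned samples ended up in that k-mer's segment.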
 
 We agree the general idea may appear similar, but the implementations are entirely different. Importantly, Nanopolish’s call-methylation is designed for DNA sequencing data, and its models are not trained to recognize RNA modifications. This means they address distinct research questions and cannot be directly compared on the same RNA modification estimation task. However, it is valid to compare them on the segmentation task, where SegPore exhibits better performance (Table 1).
 
 We infer the reviewer may suggest that because m6Anet is a deep neural network capable of learning from noisy input, the benefit of more accurate segmentation (such as that provided by SegPore) might be limited. This concern may arise from the limited improvement of SegPore+m6Anet over Nanopolish+m6Anet in bulk analysis (Figure 3). Several factors may contribute to this observation:
 
 (i) For reads aligned to the same gene in the in vivo data, alignment may be inaccurate due to pseudogenes or transcript isoforms.
 
 (ii) The in vivo benchmark data are inherently more complex than in vitro datasets and may contain additional modifications (e.g., m5C, m7G), which can confound m6A calling by altering the signal baselines of k-mers.
 
 (iii) m6Anet is trained on events produced by Nanopolish and may not be optimal for SegPore-derived events.
 
 (iv) The benchmark dataset lacks a modification-free (IVT) control sample, making it difficult to establish a true baseline for each k-mer.
 
 In the IVT data (Figure 4), SegPore shows a clear improvement in single-molecule m6A identification, with a 3–4% gain in both ROC-AUC and PR-AUC. This demonstrates SegPore’s practical benefit for applications requiring higher sensitivity at the molecule level.
 
 As noted earlier, SegPore’s contribution lies in denoising and improving the accuracy of raw signal segmentation, which is a foundational step in many downstream analyses. While it may not yet lead to a dramatic improvement in all applications, it already provides valuable insights into the sequencing process (e.g., cleaner signal profiles in Figure 4) and enables measurable gains in modification detection at the single-read level. We believe SegPore lays the groundwork for developing more accurate and generalizable RNA modification detection tools beyond m6A.
 
 We have also added the following sentence in the discussion to highlight SegPore’s limited performance in bulk analysis:
 
 “The limited improvement of SegPore combined with m6Anet over Nanopolish+m6Anet in bulk in vivo analysis (Figure 3) may be explained by several factors: potential alignment inaccuracies due to pseudogenes or transcript isoforms, the complexity of in vivo datasets containing additional RNA modifications (e.g., m5C, m7G) affecting signal baselines, and the fact that m6Anet is specifically trained on events produced by Nanopolish rather than SegPore. Additionally, the lack of a modification-free control (in vitro transcribed) sample in the benchmark dataset makes it difficult to establish true baselines for each k-mer. Despite these limitations, SegPore demonstrates clear improvement in single-molecule m6A identification in IVT data (Figure 4), suggesting it is particularly well suited for in vitro transcription data analysis.”
 
 Finally, in the segmentation/alignment comparison to Nanopolish, the latter was not fitted(/trained) on the same data but appears to use the pre-trained model it comes with. For the sake of comparing segmentation/alignment quality directly, fitting Nanopolish on the same data used for SegPore could remove the influences of using different training datasets and focus on differences stemming from the algorithm itself.
 
 In the segmentation benchmark (Table 1), SegPore uses the fixed 5-mer parameter table provided by ONT. The hyperparameters of the HHMM are also fixed and not estimated from the raw signal data being segmented. Only in the m6A modification task does SegPore re-estimate the baselines for the modified and unmodified states of k-mers. Therefore, the comparison with Nanopolish is fair, as both tools rely on pre-defined models during segmentation.
 
 Appraisal:
 
 The authors have shown their method's ability to identify noise in the raw signal and remove the affected values from the segmentation and alignment, reducing its influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish eventalign, it seems to have a limited but positive effect on m6Anet results. For modification calling at the single-read level, this work does appear to improve upon CHEUI.
 
 Impact:
 
 With the current developments for Nanopore-based modification largely focusing on Artificial Intelligence, Neural Networks, and the like, improvements made in interpretable approaches provide an important alternative that enables a deeper understanding of the data rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in the context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing, for example, as presented, as a possible replacement for Nanopolish eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and for further tool development for modification detection beyond m6A calling.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Nucleotide modifications are important regulators of biological function; however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study; however, many nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.
 
 Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.
 
 Strengths:
 
 This method is well-described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.
 
 Weaknesses:
 
 However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications; however, SegPore has not been benchmarked against these models.
 
 I recommend re-submission of the manuscript with benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU. A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.
 
 Thank you for highlighting this important limitation. While Dorado, the new ONT basecaller, is publicly available and supports modification-aware basecalling, suitable public datasets for benchmarking m5C, inosine, m6A, and PseU detection on RNA004 are currently lacking. Dorado’s modification-aware models are trained on ONT’s internal data, which is not publicly released. Therefore, it is not currently feasible to evaluate or directly compare SegPore’s performance against Dorado for m5C, inosine, m6A, and PseU detection.
 
 We would also like to emphasize that SegPore’s main contribution lies in raw signal segmentation, which is an upstream task in the RNA modification detection pipeline. To assess its performance in this context, we benchmarked SegPore against f5c and Uncalled4 on public RNA004 datasets for segmentation quality. Please refer to our response to Point 1 of Reviewer 1 for details.
 
 Our results show that the characteristic “jiggling” behavior is also observed in RNA004 data (Supplementary Figure S4), and SegPore achieves better segmentation performance than both f5c and Uncalled4 (Table 2).
 
 Recommendations for the authors:
 
 Reviewing Editor:
 
 Please note that we also received the following comments on the submission, which we encourage you to take into account:
 
 took a look at the work and from what I saw it only mentions/uses RNA002 chemistry, which is deprecated, effectively making this software unusable by anyone any more, as RNA002 is not commercially available. While the results seem promising, the authors need to show that it would work for RNA004. Notably, there is an alternative software for resquiggling for RNA004 (not Tombo or Nanopolish, but f5c, the GPU-accelerated version of Nanopolish), which does support RNA004. Therefore, they need to show that SegPore works for RNA004, because otherwise it is pointless to see that this method works better than others if it does not support current sequencing chemistries and only works for deprecated chemistries, and people will keep using f5c because it's the only one that currently works for RNA004. Alternatively, if there were biological insights gained from the method, one could justify not implementing it in RNA004, but in this case, RNA002 has been deprecated since March 2024, and the paper is purely methodological.
 
 Thank you for the comment. We agree that support for current sequencing chemistries is essential for practical utility. While SegPore was initially developed and benchmarked on RNA002 due to the availability of public data, we have now extended SegPore to support RNA004 chemistry.
 
 To address this concern, we performed a benchmark comparison using public RNA004 datasets against tools specifically designed for RNA004, including f5c and Uncalled4. Please refer to our response to Point 1 of Reviewer 1 for details. The results show that SegPore consistently outperforms f5c and Uncalled4 in segmentation accuracy on RNA004 data.
 
 Reviewer #2 (Recommendations for the authors):
 
 Various statements are made throughout the text that require further explanation, which might actually be defined in more detail elsewhere sometimes but are simply hard to find in the current form.
 
 (1) Page 2, “In this technique, five nucleotides (5mers) reside in the nanopore at a time, and each 5mer generates a characteristic current signal based on its unique sequence and chemical properties (16).”
 
 5mer? Still on R9 or just ignoring longer range influences, relevant? It is indeed a R9.4 model from ONT.
 
 Thank you for the observation. We apologize for the confusion and have clarified the relevant paragraph to indicate that the method is developed for RNA002 data by default. Specifically, we have added the following sentence:
 
 “Two versions of the direct RNA sequencing (DRS) kits are available: RNA002 and RNA004. Unless otherwise specified, this study focuses on RNA002 data.”
 
 (2) Page 3, “Employ models like Hidden Markov Models (HMM) to segment the signal, but they are prone to noise and inaccuracies.”
 
 That's the alignment/calling part, not the segmentation?
 
 Thank you for the comment. We apologize for the confusion. To clarify the distinction between segmentation and alignment, we added a new paragraph before the one in question to explain the general workflow of Nanopore DRS data analysis and to clearly define the task of segmentation. The added text reads:
 
 “The general workflow of Nanopore direct RNA sequencing (DRS) data analysis is as follows. First, the raw electrical signal from a read is basecalled using tools such as Guppy or Dorado, which produce the nucleotide sequence of the RNA molecule. However, these basecalled sequences do not include the precise start and end positions of each ribonucleotide (or k-mer) in the signal. Because basecalling errors are common, the sequences are typically mapped to a reference genome or transcriptome using minimap2 to recover the correct reference sequence. Next, tools such as Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish. Based on this alignment, Nanopolish extracts various features—such as the start and end positions, mean, and standard deviation of the signal segment corresponding to a k-mer. This signal segment or its derived features is referred to as an "event" in Nanopolish.”
 
 We also revised the following paragraph describing SegPore to more clearly contrast its approach:
 
 “In SegPore, we first segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM), where each fragment corresponds to a sub-state of a k-mer. Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence, SegPore aligns the mean values of these small fragments to the reference. After alignment, we concatenate all fragments that map to the same k-mer into a larger segment, analogous to the "eventalign" output in Nanopolish. For RNA modification estimation, we use only the mean signal value of each reconstructed event.”
 
 We hope this revision clarifies the difference between segmentation and alignment in the context of our method and resolves the reviewer’s concern.
 
 (3) Page 4, Figure 1, “These segments are then aligned with the 5mer list of the reference sequence fragment using a full/partial alignment algorithm, based on a 5mer parameter table. For example, 𝐴𝑗 denotes the base "A" at the j-th position on the reference.”
 
 I think I do understand the meaning, but I do not understand the relevance of the Aj bit in the last sentence. What is it used for?
 
 When aligning the segments (output from Step 2) to the reference sequence in Step 3, it is possible for multiple segments to align to the same k-mer. This can occur particularly when the reference contains consecutive identical bases, such as multiple adenines (A). For example, as shown in Fig. 1A, Step 3, the first two segments (μ₁ and μ₂) are aligned to the first 'A' in the reference sequence, while the third segment is aligned to the second 'A'. In this case, the reference sequence is AACTGGTTTC...GTC, which contains exactly two consecutive 'A's at the start. This notation helps to disambiguate segment alignment in regions with repeated bases.
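
 The role of the position subscript can be illustrated with a small sketch. It is not taken from SegPore: the function names are hypothetical, the segment means are made-up values, and the assignment of segments to positions simply mirrors the Fig. 1A example described above.

```python
from collections import defaultdict


def label_positions(reference):
    """Attach a 1-based position to every base, e.g. 'AAC' -> ['A1', 'A2', 'C3']."""
    return [f"{base}{j}" for j, base in enumerate(reference, start=1)]


def group_segments(segment_means, aligned_positions):
    """Collect the segment means assigned to each 1-based reference position."""
    groups = defaultdict(list)
    for mean, pos in zip(segment_means, aligned_positions):
        groups[pos].append(mean)
    return dict(groups)


reference = "AACTG"  # illustrative prefix of the reference shown in Fig. 1A
labels = label_positions(reference)

# Hypothetical alignment: segments 1 and 2 -> first 'A', segment 3 -> second 'A'.
segment_means = [108.2, 109.1, 101.7]  # made-up values in pA
aligned_positions = [1, 1, 2]
groups = group_segments(segment_means, aligned_positions)

for pos, means in groups.items():
    print(labels[pos - 1], means)
```

 Without the position index, the two leading 'A's would share one label and the three segments could not be told apart by base alone.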
 
 Additionally, this figure and its subscript include mapping with Guppy and Minimap2 but do not mention Nanopolish at all, while that seems an equally important step in the preprocessing (pg5). As such it is difficult to understand the role Nanopolish exactly plays. It's also not mentioned explicitly in the SegPore Workflow on pg15, perhaps it's part of step 1 there?
 
 We thank the reviewer for pointing this out. We apologize for the confusion. As mentioned in the public response to point 3 of Reviewer 2, SegPore uses Nanopolish to identify the poly(A) tail and transcript regions from the raw signal. SegPore then performs segmentation and alignment on the transcript portion only. This step is indeed part of Step 1 in the preprocessing workflow, as described in Supplementary Note 1, Section 3.
 
 To clarify this in the main text, we have updated the preprocessing paragraph on page 6 to explicitly describe the role of Nanopolish:
 
 “We begin by performing basecalling on the input fast5 file using Guppy, which converts the raw signal data into ribonucleotide sequences. Next, we align the basecalled sequences to the reference genome using Minimap2, generating a mapping between the reads and the reference sequences. Nanopolish provides two independent commands: "polya" and "eventalign". The "polya" command identifies the adapter, poly(A) tail, and transcript region in the raw signal, which we refer to as the poly(A) detection results. The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read. The "eventalign" command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference. It also computes summary statistics (e.g., mean, standard deviation) from the signal segment for each k-mer. Each k-mer together with its corresponding signal features is termed an event. These event features are then passed into downstream tools such as m6Anet and CHEUI for RNA modification detection. For full transcriptome analysis (Figure 3), we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events for each read by using the first and last events as start and end points. For in vitro transcription (IVT) data with a known reference sequence (Figure 4), we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish’s poly(A) detection results.”
 
 Additionally, we revised the legend of Figure 1A to explicitly include Nanopolish in step 1 as follows:
 
 “The raw current signal fragments are paired with the corresponding reference RNA sequence fragments using Nanopolish.”
 
 (4) Page 5, “The output of Step 3 is the "eventalign," which is analogous to the output generated by the Nanopolish "eventalign" command.”
 
 Naming the function of Nanopolish, the output file, and later on (pg9) the alignment of the newly introduced methods the exact same "eventalign" is very confusing.
 
 Thank you for the helpful comment. We acknowledge the potential confusion caused by using the term “eventalign” in multiple contexts. To improve clarity, we now consistently use the term “events” to refer to the output of both Nanopolish and SegPore, rather than using "eventalign" as a noun. We also added the following sentence to Step 3 (page 6) to clearly define what an “event” refers to in our manuscript:
 
 “An "event" refers to a segment of the raw signal that is aligned to a specific k-mer on a read, along with its associated features such as start and end positions, mean current, standard deviation, and other relevant statistics.”
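
 As a rough illustration of this definition, an event can be pictured as a small record built from a slice of the raw signal. The sketch below is an assumption-laden example, not the SegPore or Nanopolish data model: the field names, the half-open index convention, and the use of the population standard deviation are all choices made here for illustration.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Event:
    kmer: str     # reference k-mer the segment is aligned to
    start: int    # index of the first signal sample (inclusive)
    end: int      # index one past the last signal sample (exclusive)
    mean: float   # mean current of the segment
    stdv: float   # population standard deviation of the segment


def make_event(signal, start, end, kmer):
    """Build an event from signal[start:end] aligned to the given k-mer."""
    segment = signal[start:end]
    if not segment:
        raise ValueError("an event needs at least one signal sample")
    return Event(kmer, start, end, fmean(segment), pstdev(segment))


signal = [100.0, 102.0, 98.0, 120.0, 122.0]  # made-up current values in pA
event = make_event(signal, 0, 3, "AACTG")
print(event)
```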
 
 We have revised the text throughout the manuscript accordingly to reduce ambiguity and ensure consistent terminology.
 
 (5) Page 5, “Once aligned, we use Nanopolish's eventalign to obtain paired raw current signal segments and the corresponding fragments of the reference sequence, providing a precise association between the raw signals and the nucleotide sequence.”
 
 I thought the new method's HHMM was supposed to output an 'eventalign' formatted file. As this is not clearly mentioned elsewhere, is this a mistake in writing? Is this workflow dependent on Nanopolish 'eventalign' function and output or not?
 
 We apologize for the confusion. To clarify, SegPore is not dependent on Nanopolish’s eventalign function for generating the final segmentation results. As described in our response to your comment point 2 and elaborated in the revised text on page 4, SegPore uses its own HHMM-based segmentation model to divide the raw signal into small fragments, each corresponding to a sub-state of a k-mer. These fragments are then aligned to the reference sequence based on their mean current values.
 
 As explained in the revised manuscript:
 
 “In SegPore, we first segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM), where each fragment corresponds to a sub-state of a k-mer. Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence, SegPore aligns the mean values of these small fragments to the reference. After alignment, we concatenate all fragments that map to the same k-mer into a larger segment, analogous to the "eventalign" output in Nanopolish. For RNA modification estimation, we use only the mean signal value of each reconstructed event.”
 
 To avoid ambiguity, we have also revised the sentence on page 5 to more clearly distinguish the roles of Nanopolish and SegPore in the workflow. The updated sentence now reads:
 
 “Nanopolish provides two independent commands: "polya" and "eventalign". The "polya" command identifies the adapter, poly(A) tail, and transcript region in the raw signal, which we refer to as the poly(A) detection results. The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read. The "eventalign" command aligns the raw signal to a reference sequence, assigning a signal segment to individual k-mers in the reference. It also computes summary statistics (e.g., mean, standard deviation) from the signal segment for each k-mer. Each k-mer together with its corresponding signal features is termed an event. These event features are then passed into downstream tools such as m6Anet and CHEUI for RNA modification detection. For full transcriptome analysis (Figure 3), we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events for each read by using the first and last events as start and end points. For in vitro transcription (IVT) data with a known reference sequence (Figure 4), we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish’s poly(A) detection results.”
 
 (6) Page 5, “Since the polyA tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the polyA tail are consistent across all reads.”
 
 Perhaps I misread this statement: I interpret it as using the PolyA tail to do the normalization, rather than using the rest of the signal to do the normalization, and that results in consistent PolyA tails across all reads.
 
 If it's the latter, this should be clarified, and a little detail on how the normalization is done should be added, but if my first interpretation is correct:
 
 I'm not sure if its standard deviation is consistent across reads. The (true) value spread in this section of a read should be fairly limited compared to the rest of the signal in the read, so the noise would influence the scale quite quickly, and such noise might be introduced by pores wearing down and other technical influences. Is this really better than using the non-PolyA tail part of the read's signal, using Median Absolute Deviation to scale for a first alignment round, then re-fitting the signal scaling using Theil-Sen on the resulting alignments (assigned read signal vs reference expected signal), as Tombo/Nanopolish (can) do?
 
 Additionally, this kind of normalization should have been part of the Nanopolish eventalign already, can this not be re-used? If it's done differently it may result in different distributions than the ONT kmer table obtained for the next step.
 
 Thank you for this detailed and thoughtful comment. We apologize for the confusion. The poly(A) tail–based normalization is indeed explained in Supplementary Note 1, Section 3, but we agree that the motivation needed to be clarified in the main text.
 
 We have now added the following sentence in the revised manuscript (before the original statement on page 5) to provide clearer context:
 
 “Due to inherent variability between nanopores in the sequencing device, the baseline levels and standard deviations of k-mer signals can differ across reads, even for the same transcript. To standardize the signal for downstream analyses, we extract the raw current signal segments corresponding to the poly(A) tail of each read. Since the poly(A) tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads. This step is crucial for reducing…..”
 
 We chose to use the poly(A) tail for normalization because it is sequence-invariant—i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.
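
 The idea of poly(A)-based standardization can be sketched as a linear rescaling of each read so that its poly(A) segment has a fixed mean and standard deviation. This is a minimal sketch under stated assumptions, not SegPore's implementation (which is described in Supplementary Note 1, Section 3): the function name, the half-open poly(A) coordinates, and the target values of 100.0 and 2.0 are placeholders chosen for illustration.

```python
from statistics import fmean, pstdev


def normalize_by_polya(signal, polya_start, polya_end, target_mean, target_std):
    """Rescale a whole read so its poly(A) segment has the target mean and std."""
    polya = signal[polya_start:polya_end]
    if len(polya) < 2:
        raise ValueError("poly(A) segment is too short to estimate a scale")
    mu = fmean(polya)
    sd = pstdev(polya)
    if sd == 0:
        raise ValueError("poly(A) segment has zero variance")
    scale = target_std / sd
    # The same shift and scale are applied to every sample of the read.
    return [(x - mu) * scale + target_mean for x in signal]


# Made-up read: the first four samples play the role of the poly(A) tail.
raw = [90.0, 94.0, 92.0, 96.0, 110.0, 70.0]
normalized = normalize_by_polya(raw, 0, 4, target_mean=100.0, target_std=2.0)
print(normalized)
```

 Because every read is mapped onto the same poly(A) mean and standard deviation, differences between pores are absorbed by the per-read shift and scale rather than by the transcript portion of the signal.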
 
 In our newly added RNA004 benchmark experiment, we used the default normalization provided by f5c, which does not include poly(A) tail normalization. Despite this, SegPore was still able to mask out noise and outperform both f5c and Uncalled4, demonstrating that our segmentation method is robust to different normalization strategies.
 
 (7) Page 7, “The initialization of the 5mer parameter table is a critical step in SegPore's workflow. By leveraging ONT's established kmer models, we ensure that the initial estimates for unmodified 5mers are grounded in empirical data.”
 
 It looks like the method uses Nanopolish for a first alignment, then improves the segmentation matching the reference sequence/expected 5mer values. I thought the Nanopolish model/tables are based on the same data, or similarly obtained. If they are different, then why the switch of kmer model? Now the original alignment may have been based on other values, and thus the alignment may seem off with the expected kmer values of this table.
 
 Thank you for this insightful question. To clarify, SegPore uses Nanopolish only to identify the poly(A) tail and transcript regions from the raw signal. In the bulk in vivo data analysis, we use Nanopolish’s first event as the start and the last event as the end to extract the aligned raw signal chunk and its corresponding reference sequence. Since SegPore relies on Nanopolish solely to delineate the transcript region for each read, it independently aligns the raw signals to the reference sequence without refining or adjusting Nanopolish’s segmentation results.
 
 While SegPore's 5-mer parameter table is initially seeded using ONT’s published unmodified k-mer models, we acknowledge that empirical signal values may deviate from these reference models due to run-specific technical variation and the presence of RNA modifications. For this reason, SegPore includes a parameter re-estimation step to refine the mean and standard deviation values of each k-mer based on the current dataset.
 
 The re-estimation process consists of two layers. In the outer layer, we select a set of 5mers that exhibit both modified and unmodified states based on the GMM results (Section 6 of Supplementary Note 1), while the remaining 5mers are assumed to have only unmodified states. In the inner layer, we align the raw signals to the reference sequences using the 5mer parameter table estimated in the outer layer (Section 5 of Supplementary Note 1). Based on the alignment results, we update the 5mer parameter table in the outer layer. This two-layer process is generally repeated for 3 to 5 iterations until the 5mer parameter table converges. This re-estimation ensures that:
 
 (1) The adjusted 5mer signal baselines remain close to the ONT reference (for consistency);
 
 (2) The alignment score between the observed signal and the reference sequence is optimized (as detailed in Equation 11, Section 5 of Supplementary Note 1);
 
 (3) Only 5mers that show a clear difference between the modified and unmodified components in the GMM are considered subject to modification.
 
 By doing so, SegPore achieves more accurate signal alignment independent of Nanopolish’s models, and the alignment is directly tuned to the data under analysis.
 
 (8) Page 9, “The output of the alignment algorithm is an eventalign, which pairs the base blocks with the 5mers from the reference sequence for each read (Fig. 1C).”
 
 “Modification prediction
 
 After obtaining the eventalign results, we estimate the modification state of each motif using the 5mer parameter table.”
 
 This wording seems to have been introduced on page 5 but (also there) reads a bit confusingly as the name of the output format, file, and function are now named the exact same "eventalign". I assume the obtained eventalign results now refer to the output of your HHMM, and not the original Nanopolish eventalign results, based on context only, but I'd rather have a clear naming that enables more differentiation.
 
 We apologize for the confusion. We have revised the sentence as follows for clarity:
 
 “A detailed description of both alignment algorithms is provided in Supplementary Note 1. The output of the alignment algorithm is an alignment that pairs the base blocks with the 5mers from the reference sequence for each read (Fig. 1C). Base blocks aligned to the same 5-mer are concatenated into a single raw signal segment (referred to as an “event”), from which various features—such as start and end positions, mean current, and standard deviation—are extracted. Detailed derivation of the mean and standard deviation is provided in Section 5.3 in Supplementary Note 1. In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis or the segmentation task. ”
 
 (9) Page 9, “Since a single 5mer can be aligned with multiple base blocks, we merge all aligned base blocks by calculating a weighted mean. This weighted mean represents the single base block mean aligned with the given 5mer, allowing us to estimate the modification state for each site of a read.”
 
 I assume the weights depend on the length of the segment but I don't think it is explicitly stated while it should be.
 
 Thank you for the helpful observation. To improve clarity, we have moved this explanation to the last paragraph of the previous section (see response to point 8), where we describe the segmentation process in more detail.
 
 Additionally, a complete explanation of how the weighted mean is computed is provided in Section 5.3 of Supplementary Note 1. It is derived from signal points that are assigned to a given 5mer.
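
 One reading consistent with the statement that the weighted mean is derived from the signal points assigned to a 5mer is a length-weighted mean, in which each base block's mean is weighted by its number of samples. This is an assumption made for illustration; the actual derivation is the one in Section 5.3 of Supplementary Note 1. The function name and the values below are hypothetical.

```python
from statistics import fmean


def merge_base_blocks(blocks):
    """Length-weighted mean of base blocks given as (block_mean, n_samples) pairs."""
    total = sum(n for _, n in blocks)
    if total == 0:
        raise ValueError("no signal points assigned to this 5mer")
    return sum(mean * n for mean, n in blocks) / total


# Two made-up base blocks aligned to the same 5mer.
block_a = [100.0, 102.0]
block_b = [104.0, 106.0, 108.0, 110.0]

blocks = [(fmean(block_a), len(block_a)), (fmean(block_b), len(block_b))]
merged_mean = merge_base_blocks(blocks)

# With length weights, the result equals the mean over all pooled signal points.
pooled_mean = fmean(block_a + block_b)
print(merged_mean, pooled_mean)
```

 Under this weighting, longer base blocks contribute proportionally more, so the merged value matches what would be obtained by averaging the concatenated segment directly.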
 
 (10) Page 10, “Afterward, we manually adjust the 5mer parameter table using heuristics to ensure that the modified 5mer distribution is significantly distinct from the unmodified distribution.”
 
 Using what heuristics? If this is explained in the supplementary notes then please refer to the exact section.
 
 Thank you for pointing this out. The heuristics used to manually adjust the 5mer parameter table are indeed explained in detail in Section 7 of Supplementary Note 1.
 
 To clarify this in the manuscript, we have revised the sentence as follows:
 
 “Afterward, we manually adjust the 5mer parameter table using heuristics to ensure that the modified 5mer distribution is significantly distinct from the unmodified distribution (see details in Section 7 of Supplementary Note 1).”
 
 (11) Page 10, “Once the table is fixed, it is used for RNA modification estimation in the test data without further updates.”
 
 By what tool/algorithm? Perhaps it is your own implementation, but with the next section going into segmentation benchmarking and using Nanopolish before this seems undefined.
 
 Thank you for pointing this out. We use our own implementation. See Algorithm 3 in Section 6 of Supplementary Note 1.
 
 We have revised the sentence for clarity:
 
 “Once a stabilized 5mer parameter table is estimated from the training data, it is used for RNA modification estimation in the test data without further updates. A more detailed description of the GMM re-estimation process is provided in Section 6 of Supplementary Note 1.”
 
 (12) Page 11, “A 5mer was considered significantly modified if its read coverage exceeded 1,500 and the distance between the means of the two Gaussian components in the GMM was greater than 5.”
 
 Considering the scaling done before also not being very detailed in what range to expect, this cutoff doesn't provide any useful information. Is this a pA value?
 
 Thank you for the observation. Yes, the value refers to the current difference measured in picoamperes (pA). To clarify this, we have revised the sentence in the manuscript to include the unit explicitly:
 
 “A 5mer was considered significantly modified if its read coverage exceeded 1,500 and the distance between the means of the two Gaussian components in the GMM was greater than 5 picoamperes (pA).”
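
 The criterion in the revised sentence can be written as a simple predicate. The sketch below assumes the two GMM component means have already been fitted elsewhere; the function and parameter names are hypothetical, and only the two thresholds (1,500 reads and 5 pA) come from the text.

```python
def is_significantly_modified(read_coverage, component_means,
                              min_coverage=1500, min_distance_pa=5.0):
    """Apply the coverage and mean-separation thresholds to one 5mer."""
    mean_a, mean_b = component_means
    # Both comparisons are strict: "exceeded 1,500" and "greater than 5 pA".
    return read_coverage > min_coverage and abs(mean_a - mean_b) > min_distance_pa


# Made-up examples: (coverage, fitted component means in pA).
print(is_significantly_modified(2000, (123.4, 116.9)))  # separation of 6.5 pA
print(is_significantly_modified(2000, (123.4, 120.0)))  # separation of 3.4 pA
print(is_significantly_modified(1500, (123.4, 116.9)))  # coverage not exceeded
```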
 
 (13) Page 13, “The raw current signals, as shown in Figure 1B.”
 
 Wrong figure? Figure 2B seems logical.
 
 Thank you for catching this. You are correct—the reference should be to Figure 2B, not Figure 1B. We have corrected this in the revised manuscript.
 
 (14) Page 14, Figure 2A, these figures supposedly support the jiggle hypothesis but the examples seem to match only half the explanation. Any of these jiggles seem to be followed shortly by another in the opposite direction, and the amplitude seems to match better within each such pair than the next or previous segments. Perhaps there is a better explanation still, and this behaviour can be modelled as such instead.
 
 Thank you for your comment. We acknowledge that the observed signal patterns may appear ambiguous and could potentially suggest alternative explanations. However, as shown in Figure 2A, the red dots tend to align closely with the baseline of the previous state, while the blue dots align more closely with the baseline of the next state. We interpret this as evidence for the "jiggling" hypothesis, where a k-mer temporarily oscillates between adjacent states during translocation.
 
 That said, we agree that more sophisticated models could be explored to better capture this behavior, and we welcome suggestions or references to alternative models. We will consider this direction in future work.
 
 (15) Page 15, “This occurs because subtle transitions within a base block may be mistaken for transitions between blocks, leading to inflated transition counts.”
 
 Is it really a "subtle transition" if it happens within a base block? It seems this is not a transition and thus shouldn't be named as such.
 
 Thank you for pointing this out. We agree that the term “subtle transition” may be misleading in this context. We revised the sentence to clarify the potential underlying cause of the inflated transition counts:
 
 “This may be due to a base block actually corresponding to a sub-state of a single 5mer, rather than each base block corresponding to a full 5mer, leading to inflated transition counts. To address this issue, SegPore’s alignment algorithm was refined to merge multiple base blocks (which may represent sub-states of the same 5mer) into a single 5mer, thereby facilitating further analysis.”
 
 (16) Page 15, “The SegPore "eventalign" output is similar to Nanopolish's "eventalign" command.”
 
 To the output of that command, I presume, not to the command itself.
 
 Thank you for pointing out the ambiguity. We have revised the sentence for clarity:
 
 “The final outputs of SegPore are the events and modification state predictions. SegPore’s events are similar to the outputs of Nanopolish’s "eventalign" command, in that they pair raw current signal segments with the corresponding RNA reference 5-mers. Each 5-mer is associated with various features — such as start and end positions, mean current, and standard deviation — derived from the paired signal segment.”
 
 (17) Page 15, “For selected 5mers, SegPore also provides the modification rate for each site and the modification state of that site on individual reads.”
 
 What selection? Just all kmers with a possible modified base or a more specific subset?
 
 We revised the sentence to clarify the selection criteria:
 
 “For selected 5mers that exhibit both a clearly unmodified and a clearly modified signal component, SegPore reports the modification rate at each site, as well as the modification state of that site on individual reads.”
 
 (18) Page 16, “A key component of SegPore is the 5mer parameter table, which specifies the mean and standard deviation for each 5mer in both modified and unmodified states (Figure 2A).”
 
 Wrong figure?
 
 Thank you for pointing this out. You are correct—it should be Figure 1A, not Figure 2A. We intended to visually illustrate the structure of the 5mer parameter table in Figure 1A, and we have corrected this reference in the revised manuscript.
 
 (19) Page 16, Table 1, I can't quite tell but I assume this is based on all kmers in the table, not just a m6A modified subset. A short added statement to make this clearer would help.
 
 Yes, you are right: it is averaged over all 5mers. We have revised the sentence for clarity as follows:
 
 “As shown in Table 1, SegPore consistently achieved the best performance averaged on all 5mers across all datasets…”
 
 (20) Page 16, “Since the peaks (representing modified and unmodified states) are separable for only a subset of 5mers, SegPore can provide modification parameters for these specific 5mers. For other 5mers, modification state predictions are unavailable.”
 
 Can this be improved using some heuristics rather than the 'distance of 5' cutoff as described before? How small or big is this subset, compared to how many there should be to cover all cases?
 
 We agree that more sophisticated strategies could potentially improve performance. In this study, we adopted a relatively conservative approach to minimize false positives by using a heuristic cutoff of 5 picoamperes. This value was selected empirically and we did not explore alternative cutoffs. Future work could investigate more refined or data-driven thresholding strategies.
 
 (21) Page 16, “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the polyA tail to ensure a fair comparison.”
 
 I don't know what or how something is "standardized" here.
 
 ‘Standardized’ refers to the poly(A) tail–based signal normalization described in our response to point 6. We applied this normalization to Tombo’s output to ensure a fair comparison across methods. Without this standardization, Tombo’s performance was notably worse. We revised the sentence as follows:
 
 “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the poly(A) tail to ensure a fair comparison (See preprocessing section in Materials and Methods).”
 
 (22) Page 16, “To benchmark segmentation performance, we used two key metrics: (1) the log-likelihood of the segment mean, which measures how closely the segment matches ONT's 5mer parameter table (used as ground truth), and (2) the standard deviation (std) of the segment, where a lower std indicates reduced noise and better segmentation quality. If the raw signal segment aligns correctly with the corresponding 5mer, its mean should closely match ONT's reference, yielding a high log-likelihood. A lower std of the segment reflects less noise and better performance overall.”
 
 Here the segmentation part becomes a bit odd:
 
 A: Low std can be/is achieved by dropping any noisy bits, making segments really small (partly what happens here with the transition segments). This may be 'true' here, in the sense that the transition is not really part of the segment, but the comparison table is a bit meaningless as the other tools forcibly assign all data to kmers, instead of ignoring parts as transition states. In other words, it is a benchmark that is easy to cheat by assigning more data to noise/transition states.
 
 B: The values shown are influenced by the alignment made between the read and expected reference signal. Especially Tombo tends to forcibly assign data to whatever looks the most similar nearby rather than providing the correct alignment. So the "benchmark of the segmentation performance" is more of an "overall benchmark of the raw signal alignment". Which is still a good, useful thing, but the text seems to suggest something else.
 
 Thank you for raising these important concerns regarding the segmentation benchmarking.
 
 Regarding point A, the base blocks aligned to the same 5mer are concatenated into a single segment, including the short transition blocks between them. These transition blocks are typically very short (4~10 signal points, average 6 points), while a typical 5mer segment contains around 20~60 signal points. To assess whether SegPore’s performance is inflated by excluding transition segments, we conducted an additional comparison: we removed 6 boundary signal points (3 from the start and 3 from the end) from each 5mer segment in Nanopolish and Tombo’s results to reduce potential noise. The new comparison table is shown in the following:
 
 SegPore consistently demonstrates superior performance. Its key contribution lies in its ability to recognize structured noise in the raw signal and to derive more accurate mean and standard deviation values that more faithfully represent the true state of the k-mer in the pore. The improved mean estimates are evidenced by the clearly separated peaks of modified and unmodified 5mers in Figures 3A and 4B, while the improved standard deviation is reflected in the segmentation benchmark experiments.
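
As an illustration only (this is not code from the manuscript), the two benchmark metrics and the boundary-trimming control described above could be sketched in Python as follows. The function names, the use of a Gaussian density for the log-likelihood, the population standard deviation, and the reference standard deviation of 2.0 pA are all assumptions made for the example; only the 3-points-per-side trimming and the 123.83 pA reference value for GGACT are taken from this response.

```python
import math
from statistics import fmean, pstdev


def gaussian_log_likelihood(x, mu, sigma):
    """Log density of x under a normal distribution N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def trim_boundaries(segment, n_each_side=3):
    """Drop n_each_side points from both ends of a segment.

    Segments too short to trim are returned unchanged (an assumption).
    """
    if len(segment) <= 2 * n_each_side:
        return list(segment)
    return list(segment[n_each_side:len(segment) - n_each_side])


def segment_metrics(segment, ref_mean, ref_std):
    """Return (log-likelihood of the segment mean, std of the segment)."""
    seg_mean = fmean(segment)
    return gaussian_log_likelihood(seg_mean, ref_mean, ref_std), pstdev(segment)


# Hypothetical 26-point segment with noisy boundary points at both ends.
segment = [130.0, 128.0, 126.0] + [124.0] * 20 + [121.0, 119.0, 117.0]
raw_ll, raw_std = segment_metrics(segment, ref_mean=123.83, ref_std=2.0)
trim_ll, trim_std = segment_metrics(trim_boundaries(segment), ref_mean=123.83, ref_std=2.0)
print(raw_std > trim_std)
```

In this toy segment the trimmed version has a lower standard deviation, which is the effect the control comparison is meant to grant to Nanopolish and Tombo.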
 
 Regarding point B, we apologize for the confusion. We have added a new paragraph to the introduction to clarify that the segmentation task indeed includes the alignment step.
 
 “The general workflow of Nanopore direct RNA sequencing (DRS) data analysis is as follows. First, the raw electrical signal from a read is basecalled using tools such as Guppy or Dorado, which produce the nucleotide sequence of the RNA molecule. However, these basecalled sequences do not include the precise start and end positions of each ribonucleotide (or k-mer) in the signal. Because basecalling errors are common, the sequences are typically mapped to a reference genome or transcriptome using minimap2 to recover the correct reference sequence. Next, tools such as Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish. Based on this alignment, Nanopolish extracts various features—such as the start and end positions, mean, and standard deviation of the signal segment corresponding to a k-mer. This signal segment or its derived features is referred to as an "event" in Nanopolish. The resulting events serve as input for downstream RNA modification detection tools such as m6Anet and CHEUI.”
 
 (23) Page 17 “Given the comparable methods and input data requirements, we benchmarked SegPore against several baseline tools, including Tombo, MINES (26), Nanom6A (27), m6Anet, Epinano (28), and CHEUI (29).”
 
 It seems m6Anet is actually Nanopolish+m6Anet in Figure 3C, this needs a minor clarification here.
 
 m6Anet uses Nanopolish’s estimated events as input by default.
 
 (24) Page 18, Figure 3, A and B are figures without any indication of what is on the axis and from the text I believe the position next to each other on the x-axis rather than overlapping is meaningless, while their spread is relevant, as we're looking at the distribution of raw values for this 5mer. The figure as is is rather confusing.
 
 Thanks for pointing out the confusion. We have added concrete values to the axes in Figures 3A and 3B and revised the figure legend as follows in the manuscript:
 
 “(A) Histogram of the estimated mean from current signals mapped to an example m6A-modified genomic location (chr10:128548315, GGACT) across all reads in the training data, comparing Nanopolish (left) and SegPore (right). The x-axis represents current in picoamperes (pA).
 
 (B) Histogram of the estimated mean from current signals mapped to the GGACT motif at all annotated m6A-modified genomic locations in the training data, again comparing Nanopolish (left) and SegPore (right). The x-axis represents current in picoamperes (pA).”
 
 (25) Page 18 “SegPore's results show a more pronounced bimodal distribution in the raw signal segment mean, indicating clearer separation of modified and unmodified signals.”
 
 Without knowing the correct values around the target kmer (like Figure 4B), just the more defined bimodal distribution could also indicate the (wrongful) assignment of neighbouring kmer values to this kmer instead, hence this statement lacks some needed support, this is just one interpretation of the possible reasons.
 
 Thank you for the comment. We have added concrete values to Figures 3A and 3B to support this point. Both peaks fall within a reasonable range: the unmodified peak (125 pA) is approximately 1.17 pA away from its reference value of 123.83 pA, and the modified peak (118 pA) is around 7 pA away from the unmodified peak. This shift is consistent with expected signal changes due to RNA modifications (usually less than 10 pA), and the magnitude of the difference suggests that the observed bimodality is more likely caused by true modification events rather than misalignment.
 
 (26) Page 18 “Furthermore, when pooling all reads mapped to m6A-modified locations at the GGACT motif, SegPore showed prominent peaks (Fig. 3B), suggesting reduced noise and improved modification detection.”
 
 I don't think the prominent peaks directly suggest improved detection, this statement is a tad overreaching.
 
 We revised the sentence to the following:
 
 “SegPore exhibited more distinct peaks (Fig. 3B), indicating reduced noise and potentially enabling more reliable modification detection”.
 
 (27) Page18 “(2) direct m6A predictions from SegPore's Gaussian Mixture Model (GMM), which is limited to the six selected 5mers.”
 
 The 'six selected' refers to what exactly? Also, 'why' this is limited to them is also unclear as it is, and it probably would become clearer if it is clearly defined what this refers to.
 
 This is explained on page 16 of the original manuscript, in the description of SegPore’s workflow, as follows:
 
 “A key component of SegPore is the 5mer parameter table, which specifies the mean and standard deviation for each 5mer in both modified and unmodified states (Fig. 1A). Since the peaks (representing modified and unmodified states) are separable for only a subset of 5mers, SegPore can provide modification parameters for these specific 5mers. For other 5mers, modification state predictions are unavailable.”
 
 We select a small set of 5mers that show clear peaks (modified and unmodified 5mers) in the GMM in the m6A site-level data analysis. These 5mers are provided in Supplementary Fig. S2C, as explained in the section “m6A site level benchmark” in the Materials and Methods (page 12 in the original manuscript).
 
 “…transcript locations into genomic coordinates. It is important to note that the 5mer parameter table was not re-estimated for the test data. Instead, modification states for each read were directly estimated using the fixed 5mer parameter table. Due to the differences between human (Supplementary Fig. S2A) and mouse (Supplementary Fig. S2B), only six 5mers were found to have m6A annotations in the test data’s ground truth (Supplementary Fig. S2C). For a genomic location to be identified as a true m6A modification site, it had to correspond to one of these six common 5mers and have a read coverage of greater than 20. SegPore derived the ROC and PR curves for benchmarking based on the modification rate at each genomic location….”
 
 We have updated the sentence as follows to increase clarity:
 
 “which is limited to the six selected 5mers that exhibit clearly separable modified and unmodified components in the GMM (see Materials and Methods for details).”
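
For illustration, the site-level filtering quoted above (a site must carry one of the selected 5mers and have read coverage greater than 20, and is then scored by its modification rate) could be sketched in Python as below. This is a minimal sketch under stated assumptions, not SegPore's implementation: the function name, the input layout, and the site `chrX:1` are hypothetical, and only GGACT is used as an allowed 5mer because the full list of six is given in Supplementary Fig. S2C rather than here.

```python
from collections import defaultdict


def site_modification_rates(read_calls, site_5mer, allowed_5mers, min_coverage=20):
    """Compute the per-site modification rate from read-level calls.

    read_calls: iterable of (site, is_modified) pairs, one per read covering a site.
    site_5mer: dict mapping each site to the 5mer at that site.
    Returns {site: rate} for sites whose 5mer is allowed and whose
    coverage is strictly greater than min_coverage.
    """
    covered = defaultdict(int)
    modified = defaultdict(int)
    for site, is_modified in read_calls:
        covered[site] += 1
        modified[site] += int(is_modified)
    return {
        site: modified[site] / n
        for site, n in covered.items()
        if n > min_coverage and site_5mer.get(site) in allowed_5mers
    }


# Hypothetical read-level calls: 30 reads at one site (9 modified), 5 at another.
calls = [("chr10:128548315", i < 9) for i in range(30)] + [("chrX:1", True)] * 5
kmers = {"chr10:128548315": "GGACT", "chrX:1": "GGACT"}
rates = site_modification_rates(calls, kmers, allowed_5mers={"GGACT"})
print(rates)
```

Only the first site is reported, because the second has too few reads to pass the coverage filter.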
 
 (28) Page 19, Figure 4C, the blue 'Unmapped' needs further explanation. If this means the segmentation+alignment resulted in simply not assigning any segment to a kmer, this would indicate issues in the resulting mapping between raw data and kmers as the data that probably belonged to this kmer is likely mapped to a neighbouring kmer, possibly introducing a bimodal distribution there.
 
 This is due to a deletion event in the full alignment algorithm. See page 8 of Supplementary Note 1:
 
 During the traceback step of the dynamic programming matrix, not every 5mer in the reference sequence is assigned a corresponding raw signal fragment—particularly when the signal’s mean deviates substantially from the expected mean of that 5mer. In such cases, the algorithm considers the segment to be generated by an unknown 5mer, and the corresponding reference 5mer is marked as unmapped.
 
 (29) Page 19, “For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third-best performance compared with deep leaning methods m6Anet and CHEUI (Fig. 3D).”
 
 How was this selection of motifs made, are these related to the six 5mers in the middle of Supplementary Figure S2? Are these the same six as on page 18? This is not clear to me.
 
 It is the same, see the response to point 27.
 
 (30) Page 21 “Biclustering reveals that modifications at the 6th, 7th, and 8th genomic locations are specific to certain clusters of reads (clusters 4, 5, and 6), while the first five genomic locations show similar modification patterns across all reads.”
 
 This reads rather confusingly. Both the '6th, 7th, and 8th genomic locations' and 'clusters 4,5,6' should be referred to in clearer terms. Either mark them in the figure as such or name them in the text by something that directly matches the text in the figure.
 
 We have added labels to the clusters and genomic locations in Figure 4C, and revised the sentence as follows:
 
 “Biclustering reveals that modifications at g6 are specific to cluster C4, g7 to cluster C5, and g8 to cluster C6, while the first five genomic locations (g1 to g5) show similar modification patterns across all reads.”
 
 (31) Page 21, “We developed a segmentation algorithm that leverages the jiggling property in the physical process of DRS, resulting in cleaner current signals for m6A identification at both the site and single-molecule levels.”
 
 Leverages, or just 'takes into account'?
 
 We designed our HHMM specifically based on the jiggling hypothesis, so we believe that using the term “leverage” is appropriate.
 
 (32) Page 21, “Our results show that m6Anet achieves superior performance, driven by SegPore's enhanced segmentation.”
 
 Superior in what way? It barely improves over Nanopolish in Figure 3C and is outperformed by other methods in Figure 3D. The segmentation may have improved but this statement says something is 'superior' driven by that 'enhanced segmentation', so that cannot refer to the segmentation itself.
 
 We have revised the sentence as follows:
 
 “Our results demonstrate that SegPore’s segmentation enables clear differentiation between m6A-modified and unmodified adenosines.”
 
 (33) Page 21, “In SegPore, we assume a drastic change between two consecutive 5mers, which may hold for 5mers with large difference in their current baselines but may not hold for those with small difference.”
 
 The implications of this assumption don't seem highlighted enough in the work itself and may be cause for falsely discovering bi-modal distributions. What happens if such a 5mer isn't properly split, is there no recovery algorithm later on to resolve these cases?
 
 We agree that there is a risk of misalignment, which can result in a falsely observed bimodal distribution. This is a known and largely unavoidable issue across all methods, including deep neural network–based methods. For example, many of these models rely on a CTC (Connectionist Temporal Classification) layer, which implicitly performs alignment and may also suffer from similar issues.
 
 Misalignment is more likely when the current baselines of neighboring k-mers are close. In such cases, the model may struggle to confidently distinguish between adjacent k-mers, increasing the chance that signals from neighboring k-mers are incorrectly assigned. Accurate baseline estimation for each k-mer is therefore critical—when baselines are accurate, the correct alignment typically corresponds to the maximum likelihood.
 
 We have added the following sentence to the discussion to acknowledge this limitation:
 
 “As with other RNA modification estimation methods, SegPore can be affected by misalignment errors, particularly when the baseline signals of adjacent k-mers are similar. These cases may lead to spurious bimodal signal distributions and require careful interpretation.”
 
 (34) Page 21, “Currently, SegPore models only the modification state of the central nucleotide within the 5mer. However, modifications at other positions may also affect the signal, as shown in Figure 4B. Therefore, introducing multiple states to the 5mer could help to improve the performance of the model.”
 
 The meaning of this statement is unclear to me. Is SegPore unable to combine the information of overlapping kmers around a possibly modified base (central nucleotide), or is this referring to having multiple possible modifications in a single kmer (multiple states)?
 
 We mean there can be modifications at multiple positions of a single 5mer, e.g. C m5C m6A m7G T. We have revised the sentence to:
 
 “Therefore, introducing multiple states for a 5mer to account for modifications at multiple positions within the same 5mer could help to improve the performance of the model.”
 
 (35) Page 22, “This causes a problem when apply DNN-based methods to new dataset without short read sequencing-based ground truth. Human could not confidently judge if a predicted m6A modification is a real m6A modification.”
 
 Grammatical errors in both these sentences. For the 'Human could not' part, is this referring to a single person's attempt or more extensively tested?
 
 Thanks for the comment. We have revised the sentence as follows:
 
 “This poses a challenge when applying DNN-based methods to new datasets without short-read sequencing-based ground truth. In such cases, it is difficult for researchers to confidently determine whether a predicted m6A modification is genuine (see Supplementary Figure S5).”
 
 (36) Page 22, “…which is easier for human to interpret if a predicted m6A site is real.”
 
 "a" human, but also this probably meant to say 'whether' instead of 'if', or 'makes it easier'.
 
 Thanks for the advice. We have revised the sentence as follows:
 
 “One can generally observe a clear difference in the intensity levels between 5mers with an m6A and those with a normal adenosine, which makes it easier for a researcher to interpret whether a predicted m6A site is genuine.”
 
 (37) Page 22, “…and noise reduction through its GMM-based approach…”
 
 Is the GMM providing noise reduction or segmentation?
 
 Yes, we agree that it is not relevant. We have removed the sentence in the revised manuscript as follows:
 
 “Although SegPore provides clear interpretability and noise reduction through its GMM-based approach, there is potential to explore DNN-based models that can directly leverage SegPore's segmentation results.”
 
 (38) Page 23, “SegPore effectively reduces noise in the raw signal, leading to improved m6A identification at both site and single-molecule levels…”
 
 Without further explanation in what sense this is meant, 'reduces noise' seems to overreach the abilities, and looks more like 'masking out'.
 
 Following the reviewer’s suggestion, we have changed it to ‘masks out’ in the revised manuscript.
 
 “SegPore effectively masks out noise in the raw signal, leading to improved m6A identification at both site and single-molecule levels.”
 
 Reviewer #3 (Recommendations for the authors):
 
 I recommend the publication of this manuscript, provided that the following comments (and the comments above) are addressed.
 
 In general, the authors state that SegPore represents an improvement on existing software. These statements are largely unquantified, which erodes their credibility. I have specified several of these in the Minor comments section.
 
 Page 5, Preprocessing: The authors comment that the poly(A) tail provides a stable reference that is crucial for the normalisation of all reads. How would this step handle reads that have variable poly(A) tail lengths? Or have interrupted poly(A) tails (e.g. in the case of mRNA vaccines that employ a linker sequence)?
 
 We apologize for the confusion. The poly(A) tail–based normalization is explained in Supplementary Note 1, Section 3.
 
 As shown in Author response image 1 below, the poly(A) tail produces a characteristic signal pattern—a relatively flat, squiggly horizontal line. Due to variability between nanopores, raw current signals often exhibit baseline shifts and scaling of standard deviations. This means that the signal may be shifted up or down along the y-axis and stretched or compressed in scale.
 
 Author response image 1.
 
 The normalization remains robust with variable poly(A) tail lengths, as long as the poly(A) region is sufficiently long. The linker sequence will be assigned to the adapter part rather than the poly(A) part.
 
 To improve clarity in the revised manuscript, we have added the following explanation:
 
 “Due to inherent variability between nanopores in the sequencing device, the baseline levels and standard deviations of k-mer signals can differ across reads, even for the same transcript. To standardize the signal for downstream analyses, we extract the raw current signal segments corresponding to the poly(A) tail of each read. Since the poly(A) tail provides a stable reference, we normalize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads. This step is crucial for reducing…..”
 
 We chose to use the poly(A) tail for normalization because it is sequence-invariant—i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.
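
To make the shift-and-scale idea concrete, the following Python snippet is a minimal sketch of poly(A)-based normalization, assuming the poly(A) region of each read has already been located. The function name, the index arguments, and the target mean and standard deviation (110 pA and 2 pA) are hypothetical values chosen for illustration; the actual procedure is the one described in Supplementary Note 1, Section 3.

```python
from statistics import fmean, pstdev


def normalize_by_polya(signal, polya_start, polya_end, target_mean, target_std):
    """Linearly rescale a read so its poly(A) region has the target mean and std.

    signal: raw current values for the whole read.
    polya_start, polya_end: slice bounds of the poly(A) region (assumed known).
    """
    polya = signal[polya_start:polya_end]
    if len(polya) < 2:
        raise ValueError("poly(A) region is too short to estimate a scale")
    mu = fmean(polya)
    sd = pstdev(polya)
    if sd == 0:
        raise ValueError("poly(A) region has zero variance")
    scale = target_std / sd
    # The same shift and scale are applied to every point in the read.
    return [(x - mu) * scale + target_mean for x in signal]


# Hypothetical read: the first five points are the poly(A) tail.
raw = [100.0, 102.0, 98.0, 101.0, 99.0, 140.0, 150.0, 130.0]
normalized = normalize_by_polya(raw, 0, 5, target_mean=110.0, target_std=2.0)
```

Because the transformation is linear and estimated only from the poly(A) region, reads with different k-mer compositions in the transcript body are not forced to share the same overall variance.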
 
 Page 7, 5mer parameter table: r9.4_180mv_70bps_5mer_RNA is an older kmer model (>2 years). How does your method perform with the newer RNA kmer models that do permit the detection of multiple ribonucleotide modifications? Addressing this comment is crucial because it is feasible that SegPore will underperform in comparison to the newer RNA base caller models (requiring the use of RNA004 datasets).
 
 Thank you for highlighting this important point. For RNA004, we have updated SegPore to ensure compatibility with the latest kit. In our revised manuscript, we demonstrate that the translocation-based segmentation hypothesis remains valid for RNA004, as supported by new analyses presented in Supplementary Figure S4.
 
 Additionally, we performed a new benchmark against f5c and Uncalled4 on RNA004 data in the revised manuscript (Table 2), where SegPore exhibits better performance than f5c and Uncalled4.
 
 We agree that benchmarking against the latest Dorado models—specifically rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0, which include built-in modification detection capabilities—would provide valuable context for evaluating the utility of SegPore. However, generating a comprehensive k-mer parameter table for RNA004 requires a large, well-characterized dataset. At present, such data are limited in the public domain. Additionally, Dorado is developed by ONT and its internal training data have not been released, making direct comparisons difficult.
 
 Our current focus is on improving raw signal segmentation quality, an upstream task critical to many downstream analyses, including RNA modification detection. Future work may include benchmarking SegPore against models like Dorado once appropriate data become available.
 
 The Methods and Results sections contain redundant information - please streamline the information in these sections and reduce the redundancy. For example, the benchmarking section may be better situated in the Results section.
 
 Following your advice, we have removed redundant texts about the Segmentation benchmark from Materials and Methods in the revised manuscript.
 
 Minor comments
 
 (1) Introduction
 
 Page 3: "By incorporating these dynamics into its segmentation algorithm...". Please provide an example of how motor protein dynamics can impact RNA translocation. In particular, please elaborate on why motor protein dynamics would impact the translocation of modified ribonucleotides differently to canonical ribonucleotides. This is provided in the results, but please also include details in the Introduction.
 
 Following your advice, we added one sentence to the revised manuscript explaining how the motor protein affects the translocation of the DNA/RNA molecule.
 
 “This observation is also supported by previous reports, in which the helicase (the motor protein) translocates the DNA strand through the nanopore in a back-and-forth manner. Depending on ATP or ADP binding, the motor protein may translocate the DNA/RNA forward or backward by 0.5-1 nucleotides.”
 
 As far as we understand, this translocation mechanism is not specific to modified or unmodified nucleotides. For further details, we refer the reviewer to the original studies cited.
 
 Page 3: "This lack of interpretability can be problematic when applying these methods to new datasets, as researchers may struggle to trust the predictions without a clear understanding of how the results were generated." Please provide details and citations as to why researchers would struggle to trust the predictions of m6Anet. Is it due to a lack of understanding of how the method works, or an empirically demonstrated lack of reliability?
 
 Thank you for pointing this out. The lack of interpretability in deep learning models such as m6Anet stems primarily from their “black-box” nature—they provide binary predictions (modified or unmodified) without offering clear reasoning or evidence for each call.
 
 When we examined the corresponding raw signals, we found it difficult to visually distinguish whether a signal segment originated from a modified or unmodified ribonucleotide. The difference is often too subtle to be judged reliably by a human observer. This is illustrated in the newly added Supplementary Figure S5, which shows Nanopolish-aligned raw signals for the central 5mer GGACT in Figure 4B, displayed both uncolored and colored by modification state (according to the ground truth).
 
 Although deep neural networks can learn subtle, high-dimensional patterns in the signal that may not be readily interpretable, this opacity makes it difficult for researchers to trust the predictions—especially in new datasets where no ground truth is available. The issue is not necessarily an empirically demonstrated lack of reliability, but rather a lack of transparency and interpretability.
 
 We have updated the manuscript accordingly and included Supplementary Figure S5 to illustrate the difficulty in interpreting signal differences between modified and unmodified states.
 
 Page 3: "Instead of relying on complex, opaque features...". Please provide evidence that the research community finds the figures generated by m6Anet to be difficult to interpret, or delete the sections relating to its perceived lack of usability.
 
 See the figure provided in the response to the previous point. We added a reference to this figure in the revised manuscript.
 
 “Instead of relying on complex, opaque features (see Supplementary Figure S5), SegPore leverages baseline current levels to distinguish between…..”
 
 (2) Materials and Methods
 
 Page 5, Preprocessing: "We begin by performing basecalling on the input fast5 file using Guppy, which converts the raw signal data into base sequences.". Please change "base" to ribonucleotide.
 
 Revised as requested.
 
 Page 5 and throughout, please refer to poly(A) tail, rather than polyA tail throughout.
 
 Revised as requested.
 
 Page 5, Signal segmentation via hierarchical Hidden Markov model: "...providing more precise estimates of the mean and variance for each base block, which are crucial for downstream analyses such as RNA modification prediction." Please specify which method your HHMM method improves upon.
 
 Thank you for the suggestion. Since this section does not include a direct comparison, we revised the sentence to avoid unsupported claims. The updated sentence now reads:
 
 "...providing more precise estimates of the mean and variance for each base block, which are crucial for downstream analyses such as RNA modification prediction."
 
 Page 10, GMM for 5mer parameter table re-estimation: "Typically, the process is repeated three to five times until the 5mer parameter table stabilizes." How is the stabilisation of the 5mer parameter table quantified? What is a reasonable cut-off that would demonstrate adequate stabilisation of the 5mer parameter table?
 
 Thank you for the comment. We assess the stabilization of the 5mer parameter table by monitoring the change in baseline values across iterations. If the absolute change in baseline values for all 5mers is less than 1e-5 between two consecutive iterations, we consider the estimation to have stabilized.
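
The stopping rule described above can be sketched in Python as follows. This is an illustrative sketch rather than SegPore's code: the table is assumed to be a dictionary from 5mer to baseline mean, and `reestimate` is a hypothetical callable standing in for one round of segmentation, alignment, and GMM re-estimation.

```python
def table_has_stabilized(prev_table, curr_table, tol=1e-5):
    """True if every 5mer baseline changed by less than tol between iterations."""
    if prev_table.keys() != curr_table.keys():
        raise ValueError("tables must cover the same 5mers")
    return all(abs(curr_table[k] - prev_table[k]) < tol for k in curr_table)


def iterate_until_stable(table, reestimate, max_iter=50, tol=1e-5):
    """Repeat re-estimation until the table stabilizes or max_iter is reached.

    Returns the final table and the number of iterations performed.
    """
    for n_iter in range(1, max_iter + 1):
        new_table = reestimate(table)
        if table_has_stabilized(table, new_table, tol):
            return new_table, n_iter
        table = new_table
    return table, max_iter


# Toy stand-in for re-estimation: each round moves halfway to a fixed target.
def toy_reestimate(table, target=123.83):
    return {k: v + 0.5 * (target - v) for k, v in table.items()}


final_table, n_iter = iterate_until_stable({"GGACT": 120.0}, toy_reestimate)
```

The toy update converges more slowly than the three to five rounds reported for real data; it is only meant to exercise the stopping criterion.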
 
 Page 11, M6A site level benchmark: why were these datasets selected? Specifically, why compare human and mouse ribonuclotide modification profiles? Please provide a justification and a brief description of the experiments that these data were derived from, and why they are appropriate for benchmarking SegPore.
 
 Thank you for the comment. The datasets used were taken from a previous benchmark study on m6A estimation using RNA002 data (https://doi.org/10.1038/s41467-023-37596-5). These datasets include human and mouse transcriptomes and have been widely used to evaluate the performance of RNA modification detection tools. We selected them because (i) they are based on RNA002 chemistry, which matches the primary focus of our study, and (ii) they provide a well-characterized and consistent benchmark for assessing m6A detection performance. Therefore, we believe they are appropriate for validating SegPore.
 
 (3) Results
 
 Page 13, RNA translocation hypothesis: "The raw current signals, as shown in Fig. 1B...". Please check/correct figure reference - Figure 1B does not show raw current signals.
 
 Thank you for pointing this out. The correct reference should be Figure 2B. We have updated the figure citation accordingly in the revised manuscript.
 
 Page 19, m6A identification at the site level: "For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third best performance compared with deep learning methods m6Anet and CHEUI (Fig. 3D)." SegPore performs third best of all deep learning methods. Do the authors recommend its use in conjunction with m6Anet for m6A detection? Please clarify in the text.
 
 This sentence aims to convey that SegPore alone can already achieve good performance. If interpretability is the primary goal, we recommend using SegPore on its own. However, if the objective is to identify more potential m6A sites, we suggest using the combined approach of SegPore and m6Anet. That said, we have chosen not to make explicit recommendations in the main text to avoid oversimplifying the decision or potentially misleading readers.
 
 Page 19, m6A identification at the single molecule level: "one transcribed with m6A and the other with normal adenosine". I assume that this should be adenine? Please replace adenosine with adenine throughout.
 
 Thank you for pointing this out. We have revised the sentence to use "adenine" where appropriate. In other instances, we retain "adenosine" when referring specifically to adenine bound to a ribose sugar, which we believe is suitable in those contexts.
 
 Page 19, m6A identification at the single molecule level: "We used 60% of the data for training and 40% for testing". How many reads were used for training and how many for testing? Please comment on why these are appropriate sizes for training and testing datasets.
 
 In total, there are 1.9 million reads, with 1.14 million used for training and 0.76 million for testing (60% and 40%, respectively). We chose this split to ensure that the training set is sufficiently large to reliably estimate model parameters, while the test set remains substantial enough to robustly evaluate model performance. Although the ratio was selected somewhat arbitrarily, it balances the need for effective training with rigorous validation.
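 As a quick check of the numbers quoted above, and as a sketch of how such a split could be made reproducible, consider the snippet below. The shuffling strategy, the seed, and the `split_reads` helper are assumptions for illustration; the response does not describe how reads were assigned to each set.

```python
import random


def split_reads(read_ids, train_percent=60, seed=0):
    """Shuffle read IDs reproducibly and split them into train/test lists."""
    ids = list(read_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split can be repeated
    n_train = len(ids) * train_percent // 100  # integer arithmetic, no float rounding
    return ids[:n_train], ids[n_train:]


# Read counts implied by a 60/40 split of 1.9 million reads.
total_reads = 1_900_000
n_train = total_reads * 60 // 100
n_test = total_reads - n_train
print(n_train, n_test)

# Tiny demonstration with hypothetical read IDs.
train_ids, test_ids = split_reads(f"read_{i}" for i in range(10))
print(len(train_ids), len(test_ids))
```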
 
 (4) Discussion
 
 Page 21: "We believe that the de-noised current signals will be beneficial for other downstream tasks." Which tasks? Please list an example.
 
 We have revised the text for clarity as follows:
 
 “We believe that the de-noised current signals will be beneficial for other downstream tasks, such as the estimation of m5C, pseudouridine, and other RNA modifications.”
 
 Page 22: "One can generally observe a clear difference in the intensity levels between 5mers with a m6A and normal adenosine, which is easier for human to interpret if a predicted m6A site is real." This statement is vague and requires qualification. Please reference a study that demonstrates the human ability to interpret two similar graphs, and demonstrate how it relates to the differences observed in your data.
 
 We apologize for the confusion. We have revised the sentence as follows:
 
 “One can generally observe a clear difference in the intensity levels between 5mers with an m6A and those with a normal adenosine, which makes it easier for a researcher to interpret whether a predicted m6A site is genuine.”
 
 We believe that Figures 3A, 3B, and 4B effectively illustrate this concept.
 
 Page 23: How long does SegPore take for its analyses compared to other similar tools? How long would it take to analyse a typical dataset?
 
 We have added run-time statistics for datasets of varying sizes in the revised manuscript (see Supplementary Figure S6). This figure illustrates SegPore’s performance across different data volumes to help estimate typical processing times.
 
 (5) Figures
 
 Figure 4C. Please number the hierarchical clusters and genomic locations in this figure. They are referenced in the text.
 
 Following your suggestion, we have labeled the hierarchical clusters and genomic locations in Figure 4C in the revised manuscript.
 
 In addition, we revised the corresponding sentence in the main text as follows: “Biclustering reveals that modifications at g6 are specific to cluster C4, g7 to cluster C5, and g8 to cluster C6, while the first five genomic locations (g1 to g5) show similar modification patterns across all reads.”
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2024.01.11.575207v4
www.biorxiv.org

Evolution of a fuzzy ribonucleoprotein complex in viral assembly

5
1. Public_Reviews 15 Oct 2025
 
 in eLife
 
 eLife assessment
 
 This is a valuable study that combines a wide range of approaches to provide a biophysical and evolutionary mechanism that could explain why some particular mutations in the SARS-CoV-2 protein N arose during the COVID-19 pandemic. The evidence is solid and relies on multiple experimental approaches. However, some of the results were dependent on extremely high protein concentrations, which may affect certain conclusions.
 
 Summary
2. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors attempted to clarify the impact of N protein mutations on ribonucleoprotein (RNP) assembly and stability using analytical ultracentrifugation (AUC) and mass photometry (MP). These complementary approaches provide a more comprehensive understanding of the underlying processes. Both SV-AUC and MP results consistently showed enhanced RNP assembly and stability due to N protein mutations.
 
 The overall research design appears well planned, and the experiments were carefully executed.
 
 Strengths:
 
 SV-AUC, performed at higher concentrations (3 µM), captured the hydrodynamic properties of bulk assembled complexes, while MP provided crucial information on dissociation rates and complex lifetimes at nanomolar concentrations. Together, the methods offered detailed insights into association states and dissociation kinetics across a broad concentration range. This represents a thorough application of solution physicochemistry.
 
 Weaknesses:
 
 Unlike AUC, MP observes only a part of the solution. In MP, bound molecules are accumulated on the glass surface (not dissociated), thus the concentration in solution should change as time develops. How does such concentration change impact the result shown here?
 
 Review 1
3. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this manuscript, the authors apply a variety of biophysical and computational techniques to characterize the effects of mutations in the SARS-CoV-2 N protein on the formation of ribonucleoprotein particles (RNPs). They find convergent evolution in multiple repeated independent mutations strengthening binding interfaces, compensating for other mutations that reduce RNP stability but which enhance viral replication.
 
 Strengths:
 
 The authors assay the effects of a variety of mutations found in SARS-CoV-2 variants of concern using a variety of approaches, including biophysical characterization of assembly properties of RNPs, combined with computational prediction of the effects of mutations on molecular structures and interactions. The findings of the paper contribute to our increasing understanding of the principles driving viral self-assembly, and increase the foundation for potential future design of therapeutics such as assembly inhibitors.
 
 Weaknesses:
 
 For the most part, the paper is well-written, the data presented support the claims made, and the arguments are easy to follow. However, I believe that parts of the presentation could be substantially improved. I found portions of the text to be overly long and verbose and likely could be substantially edited; the use of acronyms and initialisms is pervasive, making parts of the exposition laborious to follow; and portions of the figures are too small and difficult to read/understand.
 
 Review 2
4. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This manuscript investigates how mutations in the SARS-CoV-2 nucleocapsid protein (N) alter ribonucleoprotein (RNP) assembly, stability, and viral fitness. The authors focus on mutations such as P13L, G214C, and G215C, combining biophysical assays (SV-AUC, mass photometry, CD spectroscopy, EM), VLP formation, and reverse genetics. They propose that SARS-CoV-2 exploits "fuzzy complex" principles, where distributed weak interfaces in disordered regions allow both stability and plasticity, with measurable consequences for viral replication.
 
 Strengths:
 
 (1) The paper demonstrates a comprehensive integration of structural biophysics, peptide/protein assays, VLP systems, and reverse genetics.
 
 (2) Identification of both de novo (P13L) and stabilizing (G214C/G215C) interfaces provides a mechanistic insight into RNP formation.
 
 (3) Strong application of the "fuzzy complex" framework to viral assembly, showing how weak/disordered interactions support evolvability, is a significant conceptual advance in viral capsid assembly.
 
 (4) Overall, the study provides a mechanistic context for mutations that have arisen in major SARS-CoV-2 variants (Omicron, Delta, Lambda) and a mechanistic basis for how mutations influence phenotype via altered biomolecular interactions.
 
 Weaknesses:
 
 (1) The arrangement of N dimers around LRS helices is presented in Figure 1C, but the text concedes that "the arrangement sketched in Figure 1C is not unique" (lines 144-146) and that AF3 modeling attempts yielded "only inconsistent results" (line 149). The authors should therefore present the models more cautiously as hypotheses instead. Additional alternative arrangements should be included in the Supplementary Information, so the readers do not over-interpret a single schematic model.
 
 (2) Negative-stained EM fibrils (Figure 2A) and CD spectra (Figure 2B) are presented to argue that P13L promotes β-sheet self-association. However, the claim could benefit from more orthogonal validation of β-sheet self-association. Additional confirmation via FTIR spectra or ThT fluorescence could be used to further distinguish structured β-sheets from amorphous aggregation.
 
 (3) In the main text, the authors alternate between emphasizing non-covalent effects ("a major effect of the cysteines already arises in reduced conditions without any covalent bonds," line 576) and highlighting "oxidized tetrameric N-proteins of N:G214C and N:G215C can be incorporated into RNPs". Therefore, the biological relevance of disulfide redox chemistry in viral assembly in vivo remains unclear. Discussing cellular redox plausibility and whether the authors' oxidizing conditions are meant as a mechanistic stress test rather than physiological mimicry could improve the interpretation of these results.
 
 The paper could benefit if the authors provide a summary figure or table contrasting reduced vs. oxidized conditions for G214C/G215C mutants (self-association, oligomerization state, RNP stability). Explicitly discuss whether disulfides are likely to form in infected cells.
 
 (4) VLP assays (Figure 7) show little enhancement for P13L or G215C alone, whereas Figure 8 shows that P13L provides clear fitness advantages. This discrepancy is acknowledged but not reconciled with any mechanistic or systematic rationale. The authors should consider emphasizing the limitations of VLP assays and the sources of the discrepancy with respect to Figure 8.
 
 (5) Figures 5 and 6 are dense, and the several overlays make it hard to read. The authors should consider picking the most extreme results to make a point in the main Figure 5 and move the other overlays to the Supplementary. Additionally, annotating MP peaks directly with "2×, 4×, 6× subunits" can help non-experts.
 
 (6) The paper has several names and shorthand notations for the mutants, making it hard to keep up. The authors could include a table that contains mutation keys, with each shorthand (Ancestral, Nο/No, Nλ, etc.) mapped onto exact N mutations (P13L, Δ31-33, R203K/G204R, G214C/G215C, etc.). They could then use the same glyphs (Latin vs Greek) consistently in text and figure labels.
 
 (7) The EM fibrils (Figure 2A) and CD spectra (Figure 2B) were collected at mM peptide concentrations. These are far above physiological levels and may encourage non-specific aggregation. Similarly, the authors mention "ultra-weak binding energies that require mM concentrations to significantly populate oligomers". On the other hand, the experiments with full-length protein were performed at concentrations closer to biologically relevant concentrations in the micromolar range. While I appreciate the need to work at high concentrations to detect weak interactions, this raises questions about physiological relevance. Specifically:
 
 a) Could some of the fibril/β-sheet features attributed to P13L (Figure 2A-C) reflect non-specific aggregation at high concentrations rather than bona fide self-association motifs that could play out in biologically relevant scenarios?
 
 b) How do the authors justify extrapolating from the mM-range peptide behaviors to the crowded but far lower effective concentrations in cells?
 
 The authors should consider adding a dedicated section (either in Methods or Discussion) justifying the use of high concentrations, with estimation of local concentrations in RNPs and how they compare to the in vitro ranges used here. For concentration-dependent phenomena discussed here, it is vital to ensure that the findings are not artefacts of non-physiological peptide aggregation.
 
 Review 3
5. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Author response:
 
 We thank the Reviewers and Editors for their time and insightful comments. We are encouraged by their positive assessment and we look forward to addressing the points raised. Areas of primary concern include (1) the use of high concentrations in peptide experiments; (2) improvement of the presentation and discussion of the results; and (3) clarification of the impact of surface adsorption on the mass photometry analyses.
 
 Regarding (1), we will better explain why some experiments with isolated disordered N-terminal extension were necessarily carried out at high concentrations, in order to demonstrate the potential for these peptides to weakly self-associate. While much lower nucleocapsid protein concentrations are present in the cytosol on average, and are used in our ribonucleoprotein assembly experiments, there are two important physiologically relevant cases where high local concentrations do occur: First, high effective concentrations of tethered disordered N-terminal extensions exist locally in the volume sampled by individual ribonucleoprotein complexes, and, second, high nucleocapsid concentrations are prevalent in its macromolecular condensates. Thus, weak interactions of N-terminal extensions can play a critical role strengthening fuzzy ribonucleoprotein complexes and also altering condensate properties, both of which were confirmed in our experiments. Nonetheless, we do not expect the observed fibrillar state of the concentrated isolated N-terminal peptide to be physiologically relevant, since physiologically they will always remain tethered to the full-length protein impeding fibrillar superstructures.
 
 (2) We are grateful for the Reviewers’ suggestions to enhance the clarity and accessibility of our findings and to streamline the presentation. We intend to tighten up the text and improve figures throughout, and add discussion points, as proposed.
 
 (3) We plan to add an analysis of the extent to which irreversible surface adsorption decreases solute concentration in mass photometry, and discuss why this has negligible impact on the conclusions drawn under our experimental conditions.
 
 In summary, we agree these points all provide opportunities to strengthen the manuscript further and we are glad to revise our manuscript accordingly.
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2025.04.26.650775v2
arxiv.org

Microbiomes Through The Looking Glass

5
1. Public_Reviews 15 Oct 2025
 
 in eLife (unscoped)
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Recommendations for the Authors:
 
 Reviewer #1:
 
 We think that this manuscript makes an important contribution that will be of interest in the areas of statistical physics, (microbiota) ecology, and (biological) data science. The evidence for the results is solid and the work improves the state-of-the-art in terms of methods. We have a few concerns that, in our opinion, the authors should address.
 
 Major concerns:
 
 (1) While the paper could be of interest for the broad audience of e-Life, the way it is written is accessible mainly to physicists. We encourage the authors to take the broad audience into account by i) explaining better the essence of what is being done at each step, ii) highlighting the relevance of the method compared to other methods, iii) discussing the ecological implications of the results.
 
 Examples on how to approach i) include: Modify or expand Figure 1 so that non-familiar readers can understand the summary of the work (e.g. with cartoons representing communities, diseased states and bacterial interactions and their relationship with the inference method); in each section, summarize at the beginning the purpose of what is going to be addressed in this section, and summarize at the end what the section has achieved; in Figure 2, replace symbols by their meaning as much as possible, and do the same for Figure 1, at the very least in the figure caption.
 
 Example on how to approach ii): Since the authors aim to establish a bridge between disordered systems and microbiome ecology, it could be useful to expand a bit the introduction on disordered systems for biologists/biophysicists. This could be done with an additional text box, which could also highlight the advantages of this approach in comparison to other techniques (e.g. model-free approaches can also classify healthy and diseased states).
 
 Example on how to approach iii): The authors could discuss with more depth the ecological implications of their results. For example, do they have a hypothesis on why demographic and neutral effects could dominate in healthy patients?
 
 We thank the reviewer for the observations. Following the suggestion, in the revised version each section now opens by outlining its goal and closes by summarizing what it has achieved. We also updated Figure 1 and Figure 2.
 
 (i) For Figure 1, we expanded and hopefully made clearer how we conceptualize the problem, use the data, and establish our method. In Figure 2, we enriched the y labels of each panel with the name associated with the order parameter.
 
 (ii) We thank the reviewer for helping us improve the readability of the introductory part, thus providing more insights into disordered systems techniques for a broader audience. We have added a few explanations at the end of page 2 to explain the advantages of such methodology compared to other strategies and models.
 
 (iii) We thank the reviewer for raising the need for a more in-depth ecological discussion of our results. A simple way to understand why neutral effects may dominate in healthy patients is the following. Neutrality implies that species differences are mainly shaped by stochastic processes such as demographic noise, with species treated as different realizations of the same underlying stochastic ecological dynamics. In our analysis, we observe that healthy individuals tend to exhibit highly similar microbial communities, suggesting that the compositional variability among their microbiomes is compatible, at least in part, with the fluctuations expected from demographic stochasticity alone. In contrast, patients with the disease display significantly more heterogeneous microbial compositions. The diversity and structure of their gut communities cannot be satisfactorily explained by neutral demographic fluctuations alone.
 
 This discrepancy implies that additional deterministic forces—such as altered ecological interactions—are driving the divergence observed in dysbiotic states. In diseased individuals, the breakdown of such interactions leads to a structurally distinct regime that may correspond to a phase of marginal stability, as indicated by our theoretical modeling. This shift marks a transition from a community governed by neutrality and demographic noise to one dominated by non-neutral ecological forces (as depicted in Figure 4). We added these comments in the discussion section of the revised manuscript.
 
 (2) Taking into account the broader audience, we invite the authors to edit the abstract, as it seems to jump from one ecological concept to another without explicitly communicating what is the link between these concepts. From the first two sentences, the motivation seems to be species diversity, but no mention of diversity comes after the second sentence. There is no proper introduction/definition of what macroecological states are. After that, the authors switch to healthy and unhealthy states, without previously introducing any link between gut microbiota states and the host’s health (which perhaps could be good in the first or second sentence, although other framings can be as valid). After that, interactions appear in the text and are related to instability, but the reader might not know whether this is surprising or if healthy/unhealthy states are generally related to stability.
 
 We pointed out a few examples, but the authors could extend their revision on i), ii) and iii) beyond such specific comments. In our opinion, this would really benefit the paper.
 
 In response to the reviewer’s concern about conceptual clarity and structure, we substantially revised the abstract to improve its accessibility and logical flow. In the revised abstract, we now clearly link species diversity to microbiome structure and function from the outset, addressing initial confusion. We provide a concise definition of “macroecological states,” framing them as reproducible statistical patterns reflecting community-level properties. Additionally, the revised version explicitly connects gut microbiome states to host health earlier, resolving the previous abrupt shift in focus. Finally, we conclude by highlighting how disordered systems theory advances our understanding of microbiome stability and functioning, reinforcing the novelty and broader significance of our approach. Overall, the revised abstract better serves a broad interdisciplinary audience, including readers unfamiliar with the technicalities of disordered systems or microbial ecology, while preserving the scientific depth and accuracy of our work.
 
 (3) The connection with consumer-resource (CR) models is quite unusual. In Equation (12), why do the authors assume that the consumption term does not depend on R? This should be addressed, since this term is usually dependent on R in microbial ecology models.
 
 In case this is helpful, it is known that the symmetric Lotka-Volterra model emerges from time-scale separation in the MacArthur model, where resources reproduce logistically and are consumed by other species (e.g., plants eaten by herbivores). Consumer-resource models form a broad category, while the MacArthur model is a specific case featuring logistic resource growth. For microbes, a more meaningful justification of the generalized Lotka-Volterra (GLV) model from a consumer-resource perspective involves the consumer-resource dynamics in a chemostat, where time-scale separation is assumed and higher-order interactions are neglected. See, for example: a) The classic paper by MacArthur: R. MacArthur. Species packing and competitive equilibrium for many species. Theoretical Population Biology, 1(1):1-11, 1970. b) Recent works on time-scale separation in chemostat consumer-resource models: Anna Posfai et al., PRL, 2017 Sireci et al., PNAS, 2023 Akshit Goyal et al., PRX-Life, 2025
 
 We thank the reviewer for the observation. We apologize for the typo that appeared in the main text, which we have now corrected. The consumer-resource model we had in mind is the classical case proposed by MacArthur, where resources are self-regulated according to a logistic growth mechanism, which leads to the generalized Lotka-Volterra model we employ in our work.
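 For readers unfamiliar with the reduction mentioned here, the standard time-scale-separation argument can be sketched as follows. The symbols (abundances N_i, resources R_a, consumption rates c_{ia}, resource weights w_a, mortalities m_i, resource growth rates r_a and carrying capacities K_a) are illustrative choices and not necessarily the manuscript's notation; its Equation (12) may be written differently.

```latex
% MacArthur consumer-resource model with logistically growing resources
\begin{align}
  \dot N_i &= N_i \Big( \sum_a w_a c_{ia} R_a - m_i \Big), \\
  \dot R_a &= R_a \Big[ r_a \Big( 1 - \frac{R_a}{K_a} \Big) - \sum_j c_{ja} N_j \Big].
\end{align}
% Fast resources: set \dot R_a = 0 with R_a > 0
\begin{equation}
  R_a^{*} = K_a \Big( 1 - \frac{1}{r_a} \sum_j c_{ja} N_j \Big).
\end{equation}
% Substituting R_a^* into the consumer equation gives a Lotka-Volterra form
\begin{equation}
  \dot N_i = N_i \Big( g_i - \sum_j A_{ij} N_j \Big), \qquad
  g_i = \sum_a w_a c_{ia} K_a - m_i, \qquad
  A_{ij} = \sum_a \frac{w_a K_a}{r_a}\, c_{ia} c_{ja} = A_{ji}.
\end{equation}
```

 The interaction matrix obtained this way is symmetric by construction, which is the property the reviewer refers to when noting that the symmetric Lotka-Volterra model emerges from the MacArthur model.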
 
 Minor concerns:
 
 (1) The title has a nice pun for statistical physicists, but we wonder if it can be a bit confusing for the broader audience of e-Life. Although we leave this to the authors’ decision, we’d recommend considering changing the title, making it more explicit in communicating the main contribution/result of the work.
 
 Following the reviewer’s suggestion, we have introduced an explanatory subtitle: “Linking Species Interactions to Dysbiosis through a Disordered Lotka-Volterra Framework”.
 
 (2) Review the references - some preprints might have already been published: Pasqualini J. 2023, Sireci 2022, Wu 2021.
 
 We thank the reviewer for drawing our attention to this inaccuracy. We updated the references to the Pasqualini and Sireci papers. To our knowledge, Wu’s paper has appeared as an arXiv preprint only.
 
 (3) Species do not generally exhibit identical carrying capacities (see Grilli, Nat. Commun., 2020); some taxa are generally more abundant than others. The authors could discuss whether the model, with the inferred parameters, can accurately reproduce the distribution of species’ mean abundances.
 
 We thank the reviewer for this insightful comment. As discussed in the revised manuscript (lines 294–299), our current model does not accurately reproduce the empirical species abundance distribution (SAD). This limitation stems from the assumption of constant carrying capacities across species. Empirical observations (e.g., Grilli et al., Nat. Commun., 2020 [1]) show heterogeneous mean abundances, often following power-law or log-normal distributions, whereas our model yields SADs devoid of fat tails, which diverge from empirical data.
 
 This simplification is implemented to maintain the analytical tractability of the disordered generalized Lotka-Volterra (dGLV) framework, a common approach also found in prior works such as Bunin (2017) and Barbier et al. (2018) [2, 3]. Introducing heterogeneity in carrying capacities, such as drawing them from a log-normal distribution, or switching to multiplicative (rather than demographic) noise, could indeed produce SADs that better align with empirical data. Nevertheless, implementing these changes would significantly complicate the analytical treatment.
 
 We acknowledge these directions as promising avenues for future research. They could help enhance the empirical realism of the model and its capacity to capture observed macroecological patterns while posing new theoretical challenges for disordered systems analysis.
 
 (4) A substantial number of cited works (Grilli, Nat. Commun., 2020; Zaoli & Grilli, Science Advances, 2021; Sireci et al., PNAS, 2023; Po-Yi Ho et al., eLife, 2022) suggest that environmental fluctuations play a crucial role in shaping microbiome composition and dynamics. Is the authors’ analysis consistent with this perspective? Do they expect their conclusions to remain robust if environmental fluctuations are introduced?
 
 We thank the reviewer for stressing this point. The introduction of environmental fluctuations in the model formally violates detailed balance, thereby preventing the definition of an energy function. To date, no study has integrated random interactions together with both demographic and environmental noise within a unified analytical framework. This is certainly a highly promising direction that some of the authors are already exploring. However, given the inherently out-of-equilibrium nature of the system and the absence of a free energy, we would need to adopt a Dynamical Mean-Field Theory formalism and eventually analyze the corresponding stationary equations to be solved self-consistently. We added, however, a brief note in the Discussion section.
 
 (5) The term “order parameters” may not be intuitive for a biological audience. In any case, the authors should explicitly define each order parameter when first introduced.
 
 We thank the reviewer for the comment. We now name each order parameter where it first appears, along with a brief explanation of its meaning that should be accessible to an audience with a biological background.
 
 (6) Line 242: Should ψU be ψD?
 
 We thank the reviewer for the observation. We corrected the typo.
 
 (7) Given that the authors are discussing healthy and diseased states and to avoid confusion, the authors could perhaps use another word for ’pathological’ when they refer to dynamical regimes (e.g., in Appendix 2: ’letting the system enter the pathological regime of unbounded growth’).
 
 We thank the reviewer for the helpful comment. As suggested, we used the term “unphysical” instead of “pathological” where needed.
 
 Reviewer #2:
 
 (1) A technical point that I could not understand is how the authors deal with compositional data. One reason for my confusion is that the order parameters h and q0 are fixed in the data to 1/S and 1/S², and thus I do not see how they can be informative. Same for carrying capacity, why is it not 1 if considering relative abundance?
 
 We thank the reviewer for raising this point. We acknowledge that the treatment of compositional data and the interpretation of order parameters h and q0 were not sufficiently clarified in the manuscript. Additionally, there was an imprecision in the text regarding the interpretation of these parameters.
 
 As defined in revised Eq. (4) of the manuscript, h and q0 are to be averaged over the entire dataset, summing across samples α. Specifically, h = ⟨1/Sα⟩ and q0 = ⟨1/Sα²⟩, where Sα is the number of species present in sample α and ⟨·⟩ is the average over samples. These parameters are therefore informative, as they encapsulate sample-level ecological diversity, and their variation reflects biological differences between healthy and diseased states. For instance, Pasqualini et al., 2024 [4] reported significant differences in these metrics between health conditions, thereby supporting their ecological relevance.
 
 Regarding carrying capacities, we clarify that although we work with relative abundance data (i.e., compositional data), we do not fix the carrying capacity K to 1. Instead, we set K to the maximum value of xi (relative abundance) within each sample, to preserve compatibility with empirical data and allow for coexistence. While this remains a modeling assumption, it ensures better ecological realism within the constraints of the disordered GLV framework.
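 To make the preceding two paragraphs concrete, the sketch below computes dataset-level order parameters and per-sample carrying capacities from relative-abundance data. It rests on assumptions that should be checked against the manuscript's Eq. (4): h is taken as the sample average of the per-sample mean relative abundance over present species (which equals 1/Sα for compositional data), and q0 as the sample average of its square. The function name and the toy data are hypothetical.

```python
def order_parameters(samples):
    """Compute (h, q0, K) from a list of samples.

    Each sample is a list of relative abundances that sum to 1; zeros mark
    absent species. h and q0 are averaged over samples (assumed definitions),
    and K holds one carrying capacity per sample.
    """
    h_vals, q0_vals, k_vals = [], [], []
    for abundances in samples:
        present = [x for x in abundances if x > 0]
        s_alpha = len(present)               # species present in this sample
        mean_abund = sum(present) / s_alpha  # equals 1/S_alpha when sum(present) == 1
        h_vals.append(mean_abund)
        q0_vals.append(mean_abund ** 2)
        k_vals.append(max(present))          # K set to the largest relative abundance
    n = len(samples)
    return sum(h_vals) / n, sum(q0_vals) / n, k_vals


# Toy data: two samples with 2 and 4 species present, respectively.
toy_samples = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
h, q0, K = order_parameters(toy_samples)
print(h, q0, K)
```

 Because Sα differs between samples, the averaged h and q0 vary from one dataset to another even though each individual sample is normalized, which is the point the response makes about their informativeness.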
 
 (2) Obviously I’m missing something, so it would be nice to clarify in simple terms the logic of the argument. I understand that Lagrange multipliers are going to be used in the model analysis, and there are a lot of technical arguments presented in the paper, but I would like a much more intuitive explanation about the way the data can be used to infer order parameters if those are fixed by definition in compositional data.
 
 We thank the reviewer for the observation. The order parameters can be measured directly from the data, even in the presence of compositionality, as explained above. We can connect those parameters with the theory even for compositional data, because the only effect of adding the compositionality constraint is to shift the linear coefficient in the Hamiltonian, which corresponds to shifting the average interaction µ. However, the resulting phase diagram is mostly affected by the variance of the interactions σ² (as µ is such that we are in the bounded phase).
 
 (3) Another point that I did not understand comes from the fact that the authors claim that interaction variance is smaller in unhealthy microbiomes. Yet they also find that those are closer to instability, and are more driven by niche processes. I would have expected the opposite to be true, more variance in the interactions leading to instability (as in May’s original paper for instance). Is this apparent paradox explained by covariations in demographic stochasticity (T) and immigration rate (lambda)? If so, I think it would be very useful to comment on that.
 
 As Altieri and coworkers showed in their PRL (2021) [5], the phase diagram of our model differs fundamentally from that of Biroli et al. (2018) [6]. In the latter, the intuitive rule – greater interaction variance yields greater instability – indeed holds. For the sake of clarity, we have attached below the resulting phase diagram obtained by Altieri et al.
 
 The apparent paradox arises because the two phase diagrams are tuned by different parameters. Consequently, even at low temperature and with weak interaction variance, our system may sit nearer to the replica-symmetry-breaking (RSB) line.
 
 Fig. 3 in the main text is not a (σ,T) phase diagram in which all other parameters are kept constant. Rather, it is a plot of the σ and T parameters inferred from the data (without showing the corresponding µ).
 
 To capture the full, non-trivial influence of all parameters on stability, we studied the so-called “replicon eigenvalue” in the RS (i.e. single equilibrium) approximation. This leading eigenvalue measures how close a given set of inferred parameters – and hence a microbiome – is to the RSB threshold. For a visual representation of these findings, refer to Figure 4.
 
 Author response image 1.
 
 (4) What do the empirical SAD look like? It would be nice to see the actual data and how the theoretical SADs compare.
 
 The empirical species abundance distributions (SADs) analyzed in our study are presented and discussed in detail in Pasqualini et al., 2024 [4]. Given the overlap in content, we chose not to reproduce these figures in the current manuscript to avoid redundancy.
 
 As we also clarify in the revised text, the theoretical SADs derived from the disordered generalized Lotka-Volterra (dGLV) model in the unique-fixed-point phase typically exhibit exponential tails. These distributions do not match the heavier-tailed patterns (e.g., log-normal or power-law-like) observed in empirical microbiome data. This discrepancy stems from the simplifying assumptions of the dGLV framework, including the use of constant carrying capacities and demographic noise.
 
 In the revised manuscript, we have added a brief discussion to explicitly acknowledge this limitation and emphasize it as a direction for future refinement of the model, such as incorporating heterogeneous carrying capacities or exploring alternative noise structures.
 
 (5) Some typos: often “niche” is written “nice”.
 
 We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported typos.
 
 Reviewer #3:
 
 Major comments:
 
 (1) In the S3 text, the authors say that filtered metagenomic reads were processed using the software Kaiju. The description of the pipeline does not mention how core genes were selected, which is often a crucial step in determining the abundance of a species in a metagenomic sample. In addition, the senior author of this manuscript has published a version of Kaiju that leverages marker gene classification methods (deemed Core-Kaiju; Tovo et al., 2020), but it was not used for either this manuscript or Pasqualini et al. (2024). I am not suggesting that the data necessarily need to be reprocessed, but it would be useful to know how core genes were chosen in Pasqualini et al. (2024) and why Core-Kaiju was not used.
 
 Prior to the current manuscript and the PLOS Computational Biology paper by Pasqualini et al. [4], we applied the Core-Kaiju protocol to the same dataset used in both studies. However, this tool was originally developed and validated using general catalogs of culturable organisms, not specifically tuned for gut microbiomes. As a result, we realized that in many samples Core-Kaiju identified only very few species (in some samples, as few as 5–10), undermining the reliability of the analysis. Due to these limitations, we opted to use the standard Kaiju version in our work. We are actively developing an improved version of the Core-Kaiju protocol that will overcome these limitations; preliminary results (not shown here) indicate that the patterns obtained are robust in this case as well.
 
 (2) My understanding of Pasqualini et al. (2024) was that diseased patients experienced larger fluctuations in abundance, while in this study they had smaller fluctuations (Figure 3a). Is this a discrepancy between the two models or is there a more nuanced interpretation?
 
 We thank the reviewer for the observation. This is only an apparent discrepancy, as the term fluctuation has different meanings in the two contexts. The fluctuations referred to by the reviewer correspond to a parameter of our theory, namely noise in the interactions. Conversely, in Pasqualini et al. σ indicates environmental fluctuations. Nevertheless, there is no conceptual discrepancy in our results: in both studies, unhealthy microbiomes were found to be less stable. In fact, this study too (notably Fig. 4) shows that unhealthy microbiomes lie closer to the RSB line, a phenomenon that is also associated with enhanced fluctuations.
 
 (3) Line 38-41: It would be helpful to explicitly state what “interaction patterns” are being referenced here. The final sentence could also be clarified. Do microbiomes “host“ interactions or are they better described as a property (“have”, “harbor”). The word “host” may confuse some readers since it is often used to refer to the human host. I am also not sure what point is being made by “expected to govern natural ones”. There are interactions between members of a microbiome; experimental studies have characterized some of these interactions, which we expect to relate in some way to interactions in nature. Is this what the authors are saying?
 
 Thanks. We agree that this sentence was not clear. Indeed, we are referring to pairwise species interactions and not to host-microbiome interactions. We have rewritten this part in the following way: In fact, recent work shows that the network-level properties of species-species interactions —for example, the sign balance, average strength, and connectivity of the inferred interaction matrix— shift systematically between healthy and dysbiotic gut communities (see for instance, [7, 8]). Pairwise species interactions have been quantified in simplified in-vitro consortia [9, 10]; we assume that the same classes of interactions also operate—albeit in a more complex form—in the native gut microbiome.
 
 (4) Line 43: I appreciate that the authors separated neutral vs. logistic models here.
 
 (5) Lines 51-75: The framing here is well-written and convincing. Network inference is an ongoing, active subject in ecology, and there is an unfortunate focus on inferring every individual interaction because ecologists with biology backgrounds are not trained to think about the problem in the language of statistical physics.
 
 We thank the reviewer for these positive comments.
 
 (6) Line 87: Perhaps I’m missing something obvious, but I don’t see how ρi sets the intrinsic timescale of the dynamics when its units are 1/(time*individuals), assuming the dimensions of ri are inverse time.
 
 We thank the reviewer for the observation. We corrected this phrase in the main text.
 
 (7) Lines 189-190: “as close as possible to the data” it would aid the reader if you specified the criteria meant by this statement.
 
 We thank the reviewer for the observation. We removed the sentence, as it introduced some redundancy in our argument. The proposed method is explained in detail in the subsequent text.
 
 (8) Line 198: It would aid the reader if you provided some context for what the T - σ plane represents.
 
 We thank the referee for the helpful indication. We have clarified the mutual role of the demographic noise amplitude and the strength of the random interaction matrix, as theoretically predicted in the PRL (2021) by Altieri and coworkers [5]. Please find an additional paragraph on page 6 of the resubmitted version.
 
 (9) Line 217: Specifying what is meant by “internal modes“ would aid the typical life science reader.
 
 We thank the reviewer for the suggestion. Recognizing that referring to “internal modes” to describe the SAD shape in that context might cause confusion, we replaced “internal modes“ with “peaks”.
 
 (10) Line 219: Some additional justification and clarification are needed here, as some may think of “m“ as being biomass.
 
 We added a sentence to better explain this concept. “In classical and quantum field theory, the particle-particle interaction embedded in the quadratic term is typically referred to as a mass source. In the context of this study, m captures quadratic fluctuations of species abundances, as also appearing in the expression of the leading eigenvalue of the stability matrix.”
 
 Minor comments:
 
 (1) I commend the authors for removing metagenomic reads that mapped to the human genome in the preprocessing stage of their pipeline. This may seem like an obvious pre-processing step, but it is unfortunately not always implemented.
 
 We thank the referee for pointing out this potential issue. The data used in this work, as well as the bioinformatic workflow used to generate them, have been described in detail in Pasqualini et al., 2024 [4]. As one of the main preprocessing steps, we remove reads mapping to the human genome.
 
 (2) Line 13: “Bacterial“ excludes archaea, and while you may not have many high-abundance archaea in your human gut data, this sentence does not specify the human gut. Usually, this exclusion is averted via the term “microbial“, though sometimes researchers raise objections to the term when the data does not include fungal members (e.g., all 16S studies).
 
 We thank the reviewer for this suggestion. To include archaeal organisms, we adopted the term “microbial“ instead of “bacterial“.
 
 (3) Line 18: This manuscript is being submitted under the “Physics of Living Systems“ tract, but it may be useful to explicitly state in the Abstract that disordered systems are a useful approach for understanding large, complex communities for the benefit of life science researchers coming from a biology background.
 
 Thanks. We have modified the abstract following this suggestion.
 
 (4) Line 68: Consider using “adapted“ or something similar instead of “mutated“ if there is no specific reason for that word choice.
 
 We thank the reviewer for this suggestion, which was implemented in the text.
 
 (5) Line 111: It would be useful to define annealed and quenched for a general life science audience.
 
 We thank the reviewer for this suggestion. In the “Results” section, we have opted for “time-dependent disordered interactions” to reach a broader audience and avoid any jargon. Moreover, in the Discussion we added a detailed footnote: “In contrast to the quenched approximation, the annealed version assumes that the random couplings are not fixed but instead fluctuate over time, with their covariance governed by independent Ornstein–Uhlenbeck processes.”
 
 (6) Line 124: Likewise for the replicon sector.
 
 We thank the reviewer for the suggestion. We added a footnote on page 4, after the formula, to highlight the physical intuition behind the introduction of the replicon mode.
 
 “The replicon eigenvalue refers to a particular type of fluctuation around the saddle-point (mean-field) solution within the replica framework. When the Hessian matrix of the replicated free energy is diagonalized, fluctuations are divided into three sectors: longitudinal, anomalous, and replicon. The replicon mode is the most sensitive to criticality, signaling (by its vanishing trend) the emergence of many nearly-degenerate states. It essentially describes how ‘soft’ the system is to microscopic rearrangements in configuration space.”
 
 (7) Figure 2: It would be helpful to include y-axis labels for each order parameter alongside the mathematical notation.
 
 We thank the reviewer for this suggestion. The y-axis of Figure 2 now includes, alongside the mathematical symbol, the label of the represented quantity.
 
 (8) Line 242: Subscript “U” is used to denote “Unhealthy” microbiomes, but “D” is used to denote “Diseased” in Figs. 2 and 3 (perhaps elsewhere as well).
 
 We thank the reviewer for this observation. After checking the various subscripts in the text, we made our notation consistent with Figures 2 and 3, adopting the subscript “D“ for symbols related to the diseased/unhealthy condition.
 
 (9) Line 283: “not to“ should be “not due to“
 
 We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported error.
 
 (10) Equations 23, 34: Extra “=“ on the RHS of the first line.
 
 We consistently follow the same formatting across all the line breaks in the equations throughout the text.
 
 We are thus resubmitting our paper, hoping to have satisfactorily addressed all referees’ concerns.
 
 References
 
 (1) Jacopo Grilli. Macroecological laws describe variation and diversity in microbial communities. Nature communications, 11(1):4743, 2020.
 
 (2) Guy Bunin. Ecological communities with lotka-volterra dynamics. Physical Review E, 95(4):042414, 2017.
 
 (3) Matthieu Barbier, Jean-François Arnoldi, Guy Bunin, and Michel Loreau. Generic assembly patterns in complex ecological communities. Proceedings of the National Academy of Sciences, 115(9):2156–2161, 2018.
 
 (4) Jacopo Pasqualini, Sonia Facchin, Andrea Rinaldo, Amos Maritan, Edoardo Savarino, and Samir Suweis. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9):e1012482, 2024.
 
 (5) Ada Altieri, Felix Roy, Chiara Cammarota, and Giulio Biroli. Properties of equilibria and glassy phases of the random lotka-volterra model with demographic noise. Physical Review Letters, 126(25):258301, 2021.
 
 (6) Giulio Biroli, Guy Bunin, and Chiara Cammarota. Marginally stable equilibria in critical ecosystems. New Journal of Physics, 20(8):083051, 2018.
 
 (7) Amir Bashan, Travis E Gibson, Jonathan Friedman, Vincent J Carey, Scott T Weiss, Elizabeth L Hohmann, and Yang-Yu Liu. Universality of human microbial dynamics. Nature, 534(7606):259–262, 2016.
 
 (8) Marcello Seppi, Jacopo Pasqualini, Sonia Facchin, Edoardo Vincenzo Savarino, and Samir Suweis. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1):5, 2023.
 
 (9) Jared Kehe, Anthony Ortiz, Anthony Kulesa, Jeff Gore, Paul C Blainey, and Jonathan Friedman. Positive interactions are common among culturable bacteria. Science advances, 7(45):eabi7159, 2021.
 
 (10) Ophelia S Venturelli, Alex V Carr, Garth Fisher, Ryan H Hsu, Rebecca Lau, Benjamin P Bowen, Susan Hromada, Trent Northen, and Adam P Arkin. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular systems biology, 14(6):e8157, 2018.
 
 AuthorResponse
2. Public_Reviews 13 Oct 2025
 
 in eLife (unscoped)
 
 eLife Assessment
 
 This important study shows how the relative importance of inter-species interactions in microbiomes can be inferred from empirical species abundance data. The methods based on statistical physics of disordered systems are compelling and rigorous, and allow for distinguishing healthy and non-healthy human gut microbiomes via differences in their inter-species interaction patterns. This work should be of broad interest to researchers in microbial ecology and theoretical biophysics.
 
 Summary
3. Public_Reviews 13 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this manuscript, the authors develop a novel method to infer ecologically informative parameters across healthy and diseased states of the gut microbiota, although the method is generalizable to other datasets of species abundances. The authors leverage techniques from the theoretical physics of disordered systems to infer different parameters (the mean and standard deviation of the strength of bacterial interspecies interactions, a bacterial immigration rate, and the strength of demographic noise) that describe the statistics of microbiota samples from two groups: one of healthy subjects and another of subjects with chronic inflammation syndromes. To do this, the authors simulate communities with a modified version of the Generalized Lotka-Volterra model and randomly generated interactions, and then use a moment-matching algorithm to find sets of parameters that better reproduce the data for species abundances. They find that these parameters are different for the healthy and diseased microbiota groups. The results suggest, for example, that bacterial interaction strengths, relative to noise and immigration, play a more dominant role in microbiota dynamics in diseased states than in healthy states.
 
 We think that this manuscript makes an important contribution that will be of interest in the areas of statistical physics, (microbiota) ecology and (biological) data science. The evidence for their results is solid and the work improves the state of the art in terms of methods.
 
 Strengths:
 
 Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in the diseased microbiota state than in the healthy one.
 
 The method is novel and original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions.
 
 Weaknesses:
 
 As a proof of concept for a new inference method, this text maintains a technical focus, which may require some familiarity with statistical physics. Nevertheless, the authors' clear introduction of key mathematical terms and their interpretations, along with a clear discussion of the ecological implications, make the results accessible and easy to follow.
 
 Review 1
4. Public_Reviews 13 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This valuable work aims to infer, from microbiome data, microbial species interaction patterns associated with healthy and unhealthy human gut microbiomes. Using solid techniques from statistical physics, the authors propose that healthy and unhealthy microbiome interaction patterns substantially differ. Unhealthy microbiomes are closer to instability and single-strain dominance; whereas healthy microbiomes showcase near-neutral dynamics, mostly driven by demographic noise and immigration.
 
 Strengths:
 
 This is a well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. This work shows that embracing the complexity of microbial systems can be used to our advantage, instead of being an insurmountable obstacle. This is a powerful counterpoint to the classic reductionist view that pushes researchers to study much simpler systems, and only hope to one day scale up their findings.
 
 Weaknesses:
 
 As acknowledged by the authors themselves, this is only a proof of concept. Further research is needed to better understand the dynamical nature of gut microbiomes. The authors do, however, point at ways in which species abundance distributions could be better reproduced by dynamical models. They also suggest that their work could explain prior empirical findings invoking the "Anna Karenina principle", where healthy microbiomes resemble one another, but disease states tend to all differ.
 
 Review 2
5. Public_Reviews 13 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #3 (Public review):
 
 Summary:
 
 I found the manuscript to be well-written. I have a few questions regarding the model, though the bulk of my comments are requests to provide definitions and additional clarity. There are concepts and approaches used in this manuscript that are clear boons for understanding the ecology of microbiomes but are rarely considered by researchers approaching the manuscript from a traditional biology background. The authors have clearly considered this in their writing of S1 and S2, so addressing these comments should be straightforward. The methods section is particularly informative and well-written, with sufficient explanations of each step of the derivation that should be informative to researchers in the microbial life sciences that are not well-versed with physics-inspired approaches to ecology dynamics.
 
 Strengths:
 
 The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out.
 
 Weaknesses:
 
 The authors use metagenomic data of diseased and healthy patients that was first processed in Pasqualini et al. (2024). The use of metagenomic data leads me into a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as $h$. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that $h$ was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort and the authors addressed this point in their revised manuscript.
 
 However, the role of sampling effort can depend on the type of data and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigation of allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome though is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al. (2024), but moving forward a more comprehensive study seems necessary. These points were addressed by the authors in their revised manuscript.
 
 Final review: The authors addressed all comments and I have no additional comments.
 
 References
 
 Pasqualini, Jacopo, et al. "Emergent ecological patterns and modelling of gut microbiomes in health and in disease." PLOS Computational Biology 20.9 (2024): e1012482.
 
 Review 3

Tags

Review 2

AuthorResponse

Review 3

Summary

Review 1

Annotators

Public_Reviews

URL

arxiv.org/abs/2406.07465v2
www.biorxiv.org www.biorxiv.org

A recursive pathway for isoleucine biosynthesis arises from enzyme promiscuity

4
1. Public_Reviews 15 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  The study reports a potential pathway for isoleucine biosynthesis mediated by the underground activity of AHASII, which converts glyoxylate and pyruvate to 2-ketobutyrate. While the findings are valuable in revealing a possible alternative route for isoleucine production, the evidence presented remains incomplete. More comprehensive biochemical experiments are required to substantiate the physiological feasibility of this pathway.
  
  Summary
2. Public_Reviews 15 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  As presented in this short report, the focus is to only establish that acetohydroxyacid synthase II can have underground activity to generate 2-ketobutyrate (from glyoxylate and pyruvate). Additionally, the gene that encodes this protein has an inactivating point mutation in the lab strain of E. coli. In strains lacking the conventional Ile biosynthesis pathway, this enzyme gets reactivated (after short-term laboratory evolution) and putatively can contribute to producing sufficient 2-ketobutyrate, which can feed into Ile production. This is clearly a very interesting observation and finding, and the paper focuses on this single point.
  
  However, the manuscript as it currently stands is 'minimal', and just barely shows that this reaction/pathway is feasible. There is no characterization of the restored enzyme's activity, rate, or specificity. Additionally, there is no data presented on how much isoleucine can be produced, even at saturating concentrations of glyoxylate or pyruvate. This would greatly benefit from more rigorous characterization of this enzyme's activity and function, as well as better demonstration of how effective this pathway is in generating 2-ketobutyrate (and then its subsequent condensation with pyruvate).
  
  Review 1
3. Public_Reviews 15 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript by Rainaldi et al. reports a new sub-pathway for isoleucine biosynthesis by demonstrating the promiscuous activity of the native enzyme acetohydroxyacid synthase II (AHAS II). AHAS II is primarily known to catalyze the condensation of 2-ketobutyrate (2KB) with pyruvate to form a further downstream intermediate, AHB, in the isoleucine biosynthesis pathway. However, the catalysis of pyruvate and glyoxylate condensation to produce 2KB via the ilvG-encoded AHAS II is reported in this manuscript for the first time.
  
  Using an isoleucine/2KB auxotrophic E. coli strain, the authors report (i) that repair of the inactivating frameshift mutation in the ilvG gene, which encodes AHAS II, supports growth in glyoxylate-supplemented media, (ii) the promiscuity of AHAS II in glyoxylate and pyruvate condensation, resulting in the formation of the isoleucine precursor 2KB and aiding the biosynthesis of isoleucine, and (iii) comparable efficiency of the recursive AHAS II route to the canonical routes of isoleucine biosynthesis, based on computational flux-based analysis.
  
  Strengths:
  
  The authors have used laboratory evolution to uncover a non-canonical metabolic route. The metabolomics and FBA have been used to strengthen the claim.
  
  Weaknesses:
  
  While the manuscript proposes an interesting metabolic route for isoleucine biosynthesis, the data lack key controls, biological replicates, and consistency. The figures and methods are presented inadequately. In their current state, the data fail to support the claims made in the manuscript.
  
  Review 2
4. Public_Reviews 15 Oct 2025
  
  in eLife
  
  Author response:
  
  We gratefully acknowledge the comments on our manuscript and the time you took to read and understand our work. Nevertheless, it is the opinion of these authors that the evidence provided in the submitted paper is strong and we performed multiple replicates of the experiments. In particular, gene deletion and complementation is the accepted gold standard for studies in physiology. In the isoleucine auxotroph (IMaux) strain carrying an ilvG deletion, growth is only possible if ilvG is reintroduced on a plasmid and induced. Additionally, isotopic labeling clearly demonstrates the activity of the proposed pathway. Regardless, we agree with the reviewers that the paper and the scientific community would benefit from an in vitro characterization of the promiscuity of IlvG, so we will perform this experiment and resubmit the paper for further revision, and in this revision also provide more detail on the replicates performed.
  
  AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.26.672309v1
www.biorxiv.org www.biorxiv.org

Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos

3
1. Public_Reviews 15 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This valuable biomechanical analysis of kangaroo kinematics and kinetics across a range of hopping speeds and masses is a step towards understanding a long-standing problem in locomotion biomechanics: the mechanism for how kangaroos, unlike other mammals, can increase hopping speed without a concomitant increase in metabolic cost. The authors convincingly demonstrate that changes in kangaroo posture with speed increase tendon stress/strain and hence elastic energy storage/return. This greater tendon elastic energy storage/return may counteract the increased cost of generating muscular force at faster speeds and thus allows for the invariance in metabolic cost. This methodologically impressive study sets the stage for further work to investigate the relation of hopping speed to metabolic cost more definitively.
 
 Summary
2. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.
 
 Strengths:
 
 Brings kangaroo locomotion biomechanics into the 21st century. Remarkably difficult project to accomplish. Excellent attention to detail. Clear writing and figures.
 
 General Comments
 
 This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".
 
 This study is certainly a hop towards solving the problem. The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds. Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid. Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve the greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. The present authors have clarified that this study has still not tied up the metabolic energetics across speed problem and they now point out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics.
 
 Review 1
3. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public Review):
 
 Summary:
 
 The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.
 
 Strengths:
 
 The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.
 
 Weaknesses:
 
 The authors oversell their findings, but the mystery still persists.
 
 The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.
 
 General Comments
 
 This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".
 
 Thank you for the kind words.
 
 This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues.
 
 We have modified the title to reflect this comment. “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”
 
 The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.
 
 Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid.
 
 Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package.
 
 Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.
 
 You have raised important points, thank you for this feedback. We have added a limitations and considerations section to the discussion, which highlights that there are still unanswered questions (Lines 311-328).
 
 Considerations and limitations
 
 “First, we believe it is more likely that the changes in moment arms and EMA can be attributed to speed rather than body mass, given the marked changes in joint angles and ankle height observed at faster hopping speeds. However, our sample included a relatively narrow range of body masses (13.7 to 26.6 kg) compared to the potential range (up to 80 kg), limiting our ability to entirely isolate the effects of speed from those of mass. Future work should examine a broader range of body sizes. Second, kangaroos studied here only hopped at relatively slow speeds, which bounds our estimates of EMA and tendon stress to a less critical region. As such, we were unable to assess tendon stress at fast speeds, where increased forces would reduce tendon safety factors closer to failure. A different experimental or modelling approach may be needed, as kangaroos in enclosures seem unwilling to hop faster over force plates. Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.”
 
 Additionally, we added a line “Peak GRF also naturally increased with speed together with shorter ground contact durations (Fig. 2b, Suppl. Fig 1b)” (line 238) to highlight that we are not proposing that changes in EMA alone explain the full increase in tendon stress. Both GRF and EMA contribute substantially (almost equally) to stress, and we now give more equal discussion to both. For instance, we now also evaluate how much each contributes: “If peak GRF were constant but EMA changed from the average value of a slow hop to a fast hop, then stress would increase 18%, whereas if EMA remained constant and GRF varied by the same principles, then stress would only increase by 12%. Thus, changing posture and decreasing ground contact duration both appear to influence tendon stress for kangaroos, at least for the range of speeds we examined” (Line 245-249)
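
 The one-factor-at-a-time comparison quoted above can be sketched numerically. The sketch below assumes the simplified lever relation stress = GRF / (EMA × A), where EMA = r / R and A is the tendon cross-sectional area. The function name and every numerical value are hypothetical placeholders chosen for illustration, not the study's data, so the printed percentages are not expected to match the 18% and 12% reported by the authors.

```python
def tendon_stress(grf, ema, area):
    """Tendon stress under a simplified lever model (an assumption of this sketch).

    Tendon force = GRF * R / r = GRF / EMA, and stress = force / area.
    """
    return grf / (ema * area)


def pct_increase(new, old):
    """Percentage increase of new relative to old."""
    return 100.0 * (new - old) / old


# Hypothetical slow-hop and fast-hop averages (illustrative only).
slow = {"grf": 400.0, "ema": 0.30}   # GRF in N, EMA dimensionless
fast = {"grf": 460.0, "ema": 0.26}
area = 40.0e-6                        # m^2, assumed tendon cross-section

baseline = tendon_stress(slow["grf"], slow["ema"], area)

# Change one factor at a time while holding the other at its slow-hop value.
ema_only = tendon_stress(slow["grf"], fast["ema"], area)
grf_only = tendon_stress(fast["grf"], slow["ema"], area)
both = tendon_stress(fast["grf"], fast["ema"], area)

print(f"EMA change alone: +{pct_increase(ema_only, baseline):.1f}%")
print(f"GRF change alone: +{pct_increase(grf_only, baseline):.1f}%")
print(f"Both together:    +{pct_increase(both, baseline):.1f}%")
```

 Because stress is a product of the two terms in this model, the combined increase is the product of the two individual factors rather than their sum.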
 
 We have added a paragraph in the discussion acknowledging that the cost of generating force problem is not resolved by our work, concluding that “This mechanism may help explain why hopping macropods do not follow the energetic trends observed in other species (Dawson and Taylor 1973, Baudinette et al. 1992, Kram and Dawson 1998), but it does not fully resolve the cost of generating force conundrum” Line 274-276.
 
 I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.
 
 We appreciate this comment from the reviewer; however, we could not extend the study to discuss animal size effects because, as we now note in the results: “The range of body masses may not be sufficient to detect an effect of mass on ankle moment in addition to the effect of speed.” (Line 193)
 
 Reviewer #2 (Public Review):
 
 Summary
 
 This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics.
 
 While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an OpenSim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals. Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.
 
 In the current study, we aimed to provide a joint-level explanation for the increases of tendon stress that are likely linked to metabolic energy consumption.
 
 We have now included a limitations section in the manuscript (See response to Rev 1). We plan to expand upon muscle level energetics in the future with a more detailed musculoskeletal model.
 
 Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured) and did not detectably associate with hopping speed (see results). Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.
 
 As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, owing to the fact that both feet contacted the force plate together, but the anterior-posterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA. It is necessary to assume a fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out, so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable. We have now added additional details into the text to clarify this.
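
 The dependence of the out-lever R on the anterior-posterior CoP position can be illustrated with a short sagittal-plane sketch. It assumes R is taken as the perpendicular distance from the joint centre to the GRF line of action; the function name and all coordinates are hypothetical and are not drawn from the study or its model.

```python
import math


def grf_moment_arm(joint_xy, cop_xy, grf_xy):
    """Perpendicular distance from a joint centre to the GRF line of action.

    All inputs are (anterior-posterior, vertical) pairs in the sagittal plane.
    """
    dx = joint_xy[0] - cop_xy[0]
    dy = joint_xy[1] - cop_xy[1]
    fx, fy = grf_xy
    cross = dx * fy - dy * fx            # z-component of the 2D cross product
    return abs(cross) / math.hypot(fx, fy)


# Hypothetical midstance values (metres, newtons); illustrative only.
ankle = (0.00, 0.10)
grf = (0.0, 500.0)                       # purely vertical GRF

r_near = grf_moment_arm(ankle, cop_xy=(0.12, 0.0), grf_xy=grf)
r_far = grf_moment_arm(ankle, cop_xy=(0.15, 0.0), grf_xy=grf)
print(f"R with CoP at 0.12 m: {r_near:.3f} m")
print(f"R with CoP at 0.15 m: {r_far:.3f} m")
```

 With a vertical GRF, the moment arm in this sketch reduces to the anterior-posterior offset between the joint and the CoP, which is consistent with that axis being the one that matters most for R.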
 
 Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in CoP at midstance, demonstrates that R would be greater at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we believe it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency. We have added an additional section into the results to make this clearer (Line 177-185): “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R²=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.”
 
 These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using a repeated measures design.
 
 There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals showed a wide speed range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 out of the 100 trials. It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference with animals such as chasing is dangerous to kangaroos as they are prone to adverse reactions to stress. We have now added additional information about the chosen hopping speeds into the results and methods sections to clarify this: “The kangaroos elected to hop between 1.99 and 4.48 m s⁻¹, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Line 381-382)
 
 There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.
 
 We thank the reviewer for this comment. Upon rereading, we now understand the reviewer's position, and have made substantial revisions to the introduction and discussion (see comments below).
 
 My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.
 
 Again we thank the reviewer for their time and appreciate their efforts to strengthen our manuscript.
 
 Reviewer #3 (Public Review):
 
 Summary:
 
 The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.
 
 Strengths:
 
 The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.
 
 Thank you!
 
 Weaknesses:
 
 Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).
 
 (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:
 
 It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects?
 
 Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s), but for the range we did measure (~2-4.5 m/s), there is a large amount of variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight individuals had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We have now removed the terminology “preferred speed”, e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure: “The kangaroos elected to hop between 1.99 and 4.48 m s⁻¹, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Line 381-382)
 
 In the literature cited, what was the range of speeds measured, and was it within or between subjects?
 
 For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see Suppl. Fig. 1b) and there were multiple measures for several individuals (see methods, Kram & Dawson 1998).
 
 Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost? They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported).
 
 The functions that underpin these results (e.g. moment = GRF*R) come from physical mechanics and geometry, rather than statistical correlations. Additionally, a p-value is not appropriate in the relationship between EMA and stress (rather than strain) because the relationship does not appear to be linear. We have made it clearer in the discussion that we are not proposing that the entire change in stress is caused by changes in EMA, but that the increase in GRF that naturally occurs with speed will also explain some of the increase in stress, along with other potential mechanisms. The discussion has been extensively revised to reflect this.
 
 Tendon strain could be increasing with ground reaction force, independent of EMA. Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that, all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.
 
 Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose (Suppl. Fig. 8), see the formulas in Fig 6, and we have made this clearer in the revised discussion (see above comment). You are correct that mathematically stress is inversely proportional to EMA, which can be observed in Fig. 7a, and we did find that EMA decreases.
 
 The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested.
 
 The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We did not have a sufficient within-individual sample size to use a linear mixed effect model including subject as a random factor, thus all trials were treated individually. We have included this information in the results section.
 
 We have now moved the range of speeds from the supplementary material to the results and figure captions. We have added information on the number of trials per kangaroo to the methods, and added Suppl. Fig. 9 showing the distribution of speeds per kangaroo.
 
 We did not group the data e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups for statistical analysis (the latter was only for display purposes in our figures, which we have now made clearer in the methods statistics section).
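
 A trial-level analysis of the kind described above can be sketched as an ordinary least-squares fit in which every trial is one row and speed and mass are the predictors. The model form (response ~ speed + mass), the helper name `ols`, and the synthetic data are assumptions made for illustration; the response does not give the exact model specification, and this is not the authors' code.

```python
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [x - factor * p for x, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(0)
n_trials = 100

# Synthetic stand-in data; ranges loosely follow those quoted in the response.
rows = [[1.0, rng.uniform(1.99, 4.48), rng.uniform(13.7, 26.6)]
        for _ in range(n_trials)]          # intercept, speed (m/s), mass (kg)

# Hypothetical effects, used only to generate example responses.
y = [5.0 + 2.0 * r[1] + 0.3 * r[2] + rng.gauss(0.0, 0.5) for r in rows]

intercept, speed_coef, mass_coef = ols(rows, y)
print(f"speed effect: {speed_coef:.2f}, mass effect: {mass_coef:.2f}")
```

 Treating trials as independent in this way ignores repeated measures on the same animal; a mixed model with subject as a random factor would address that when enough trials per individual are available.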
 
 Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn’t exempt the authors from providing the details of their approach.
 
 Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s). We have now removed the terminology “preferred speed” e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure (see above comment). (Line 381-382)
 
 Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.
 
 Thank you for this comment. The bins are used only for display purposes and not within the statistical analysis. We have clarified this in the revised manuscript: “The data was grouped into body mass (small 17.6±2.96 kg, medium 21.5±0.74 kg, large 24.0±1.46 kg) and speed (slow 2.52±0.25 m s⁻¹, medium 3.11±0.16 m s⁻¹, fast 3.79±0.27 m s⁻¹) subsets for display purposes only”. (Line 495-497)
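
 Display-only grouping of this kind can be sketched as follows. The response does not state how the bin boundaries were chosen, so equal-count (tercile) bins are an assumption here, and the speeds listed are hypothetical values rather than the study's trials.

```python
import statistics


def tercile_bins(values):
    """Split values into three equal-count groups after sorting (display only)."""
    ordered = sorted(values)
    n = len(ordered)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "slow": ordered[:cut1],
        "medium": ordered[cut1:cut2],
        "fast": ordered[cut2:],
    }


# Hypothetical trial speeds in m/s, for illustration only.
speeds = [2.1, 2.4, 2.6, 2.9, 3.0, 3.1, 3.2, 3.5, 3.8, 4.1, 4.3, 2.7]

bins = tercile_bins(speeds)
for label, group in bins.items():
    mean = statistics.mean(group)
    sd = statistics.stdev(group)
    print(f"{label}: {mean:.2f} ± {sd:.2f} m/s (n={len(group)})")
```

 The bin summaries serve only as figure labels in this sketch; any regression would still use the continuous speed of each trial.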
 
 (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.
 
 Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e. that larger kangaroos hopped faster). Subject masses were reported in the first paragraph of the methods, although some were estimated as outlined in the same paragraph.
 
 (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.
 
 Placing the methods after the discussion is a requirement of the journal. We have incorporated some methods into the results where necessary, without being too repetitive or disruptive, e.g. in the Fig. 1 caption and by specifying that we are only analysing EMA for the ankle joint.
 
 Reviewing Editor (Recommendations For The Authors):
 
 Below is a list of specific recommendations that the authors could address to improve the eLife assessment:
 
 (1) Based on the data presented and the fact that metabolic energy was not measured, the authors should temper their conclusions and statements throughout the manuscript regarding the link between speed and metabolic energy savings. We recommend adding text to the discussion summarizing the strengths and limitations of the evidence provided and suggesting future steps to more conclusively answer this mystery.
 
 There is a significant body of work linking metabolic energy savings to measured increases in tendon stress in macropods. However, the purpose of this paper was to address the unanswered questions about why tendon stress increases. We found that stress did not only increase due to GRF increasing with speed as expected, but also due to novel postural changes which decreased EMA. In the revised manuscript, we have tempered our conclusions to make it clearer that it is not just EMA affecting stress, and added limitations throughout the manuscript (see response to Rev 1).
 
 (2) To provide stronger evidence of a link between speed, mechanics, and metabolic savings the authors can consider estimating metabolic energy expenditure from their OpenSim model. This is one suggestion, but the authors likely have other, possibly better ideas. Such a model should also be able to explain why the metabolic rate increases with speed during uphill hopping.
 
 Extending the model to provide direct metabolic cost estimates will be the goal of a future paper; however, the model does not have the detailed muscle characteristics needed to do this in the formulation presented here. It would be a very large undertaking which is beyond the scope of the current manuscript. As per the comment above, the results of this paper are not reliant on metabolic performance.
 
 (3) The authors attempt to relate the newly quantified hopping biomechanics to previously published metabolic data. However, all reviewers agree that the logic in many instances is not clear or contradictory. Could one potential explanation be that at slow speeds, forces and tendon strain are small, and thus muscle fascicle work is high? Then, with faster speeds, even though the cost of generating isometric force increases, this is offset by the reduction in the metabolic cost of muscular work. The paper could provide stronger support for their hypotheses with a much clearer explanation of how the kinematics relate to the mechanics and ultimately energy savings.
 
 In response to the reviewers' comments, we have substantially modified the discussion to provide a clearer rationale.
 
 (4) The methods and the effort expended to collect these data are impressive, but there are a number of underlying assumptions made that undermine the conclusions. This is due partly to the methods used, but also the paper's incomplete description of their methods. We provide a few examples below:
 
 It would be helpful if the authors could speak to the effect of the limited speeds tested and between-animal comparisons on the ability to draw strong conclusions from the present dataset.
 
 Throughout the discussion, the authors highlight the relationship between EMA and speed. However, this is misleading since there was no significant effect of speed on EMA. Speed only affected the muscle moment arm, r. At minimum, this should be clarified and the effect on EMA not be overstated. Additionally, the resulting implications on their ability to confidently say something about the effect of speed on muscle stress should be discussed.
 
 We have now provided additional details in response to these concerns (see responses above). For instance, we added a supplementary figure showing the speed distribution per individual. The primary reviewer concern (that each kangaroo travelled at a single speed) was due to a miscommunication around the terminology “preferred”, which has now been corrected.
 
 We now elaborate in the results why we are not very concerned that EMA is insignificant. The statistical insignificance of EMA is ultimately due to the insignificance of the direct measurement of R, however, we now better explain in the results why we believe that this statistical insignificance is due to error/noise of the measurement which is relatively large compared to the effect size. Indirect indications of how R may increase with speed (via ankle height from the ground) are statistically significant. Lines 177-185.
 
 We consider this worth reporting because, for instance, an 18% change in EMA will be undetectable by measurement, but corresponds to an 18% change in tendon stress which is measurable and physiologically significant (safety factor would decrease from 2 to 1.67). We presented both significant and insignificant results for transparency.
 
 We have also discussed this within a revised limitations section of the manuscript (Line 311-328).
 
 Reviewer #1 (Recommendations For The Authors):
 
 Title: I would cut the first half of the title. At least hedge it a bit. "Clues" instead of "Unlocking the secrets".
 
 We have revised the title to: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”
 
 In my comments, ... typically indicates a stylistic change suggested to the text.
 
 Overall, the paper covers speed and size. Unfortunately, the authors were not 100% consistent in the order of presenting size then speed, or speed then size. Just choose one and stick with it.
 
 We have attempted to keep the order of presenting size and speed consistent; however, there are several cases where this would reduce the readability of the manuscript, so the order varies in those cases.
 
 One must admit that there is a lot of vertical scatter in almost all of the plots. I understand that these animals were not in a lab on a treadmill at a controlled speed and the animals wear fur coats so marker placements vary/move etc. But the spread is quite striking, e.g. Figure 5a the span at one speed is almost 10x. Can the authors address this somewhere? Limitations section?
 
 The variation seen likely results from attempting to display data in a 2D format, when it is in fact the result of multiple variables, including speed, mass, stride frequency and subject-specific lengths. Slight variations in these would be expected to produce some noise around the mean, and we think it is important to consider this while showing the more dominant effects.
 
 In many locations in the manuscript, the term "work" is used, but rarely if ever specified that this is the work "per hop". The big question revolves around the rate of metabolic energy consumption (i.e. energy per time or average metabolic power), one must not forget that hop frequency changes somewhat across speed, so work per hop is not the final calculation.
 
 Thank you for this comment. We have now explicitly stated work per hop in figure captions and in the results (line 208). The change in stride frequency at this range of speeds is very small, particularly compared to the variance in stride frequency (Suppl. Fig. 1d), which is consistent with other researchers who found that stride frequency was constant or near constant in macropods at analogous speeds (e.g. Dawson and Taylor 1973, Baudinette et al. 1987).
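
 The distinction between work per hop and a rate can be made explicit with the relation average power = work per hop × hop frequency. The sketch below uses hypothetical values only; it illustrates why a near-constant frequency makes work per hop a reasonable proxy for average mechanical power over a narrow speed range.

```python
def average_power(work_per_hop, hop_frequency):
    """Average mechanical power (W) from work per hop (J) and hop frequency (Hz)."""
    return work_per_hop * hop_frequency


# Hypothetical values for illustration only.
slow = average_power(work_per_hop=20.0, hop_frequency=2.0)
fast = average_power(work_per_hop=20.0, hop_frequency=2.1)

print(f"slow: {slow:.1f} W, fast: {fast:.1f} W")
print(f"relative change: {100.0 * (fast - slow) / slow:.1f}%")
```

 In this sketch, the same work per hop at a slightly greater frequency changes average power only in proportion to the frequency change.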
 
 Line 61 ....is likely related.
 
 Added “likely” (line 59)
 
 Line 86 I think the Allen reference is incomplete. Wasn't it in J Exp Biology?
 
 Thank you. Changed.
 
 Line 122 ... at faster speeds and in larger individuals.
 
 Changed: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass” (Line 121-122).
 
 Line 124 I found this confusing. Try to re-word so that you explain you mean more work done by the tendons and less by the ankle musculature.
 
 Amended: “changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work” (Line 123)
 
 Line 129 hopefully "braking" not "breaking"!
 
 Thank you. Fixed. (Line 130)
 
 Line 129 specify fore-aft horizontal force.
 
 Added "fore-aft" to "negative fore-aft horizontal component" (Line 130-131)
 
 Line 130 add something like "of course" or "naturally" since if there is zero fore-aft force, the GRF vector of course must be vertical.
 
 Added "naturally" (Line 132)
 
 Line 138 clarify that this section is all stance phase. I don't recall reading any swing phase data.
 
 Changed to: "Kangaroo hindlimb stance phase kinematics varied…" (Line 141)
 
 Line 143 and elsewhere. I found the use of dorsiflexion and plantarflexion confusing. In Figure 3, I see the ankle never flexing more than 90 degrees. So, the ankle joint is always in something of a flexed position, though of course it flexes and extends during contact. I urge the authors to simplify to flexion/extension and drop the plantar/dorsi.
 
 We have edited this section to describe both movements as greater extension (plantarflexion). (Line 147). We have further clarified this in the figure caption for figure 3.
 
 Line 147 ...changes were…
 
 Fixed, line 150
 
 Line 155 I'm a bit confused here. Are the authors calculating some sort of overall EMA or are they saying all of the individual joint EMAs all decreased?
 
 Thank you, we clarified that it is at the ankle. Line 158
 
 Line 158 since kangaroos hop and are thus positioned high and low throughout the stance phase, try to avoid using "high" and "low" for describing variables, e.g. GRF or other variables. Just use "greater/greatest" etc.
 
 Thanks for this suggestion. We have changed "higher" into "greater" where appropriate throughout the manuscript e.g. line 161
 
 Lines 162 and 168 same comment here about "r" and "R". Do you mean ankle or all joints?
 
 Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle. (Lines 164-165)
 
 Line 173 really, ankle height?
 
 Added: ankle height is "vertical distance from the ground". Line 177
 
 Line 177 is this just the ankle r?
 
 Added "of the ankle" line 158 and “Achilles” line 187
 
 Line 183 same idea, which tendon/tendons are you talking about here?
 
 Added "Achilles" to be more clear (Line 187)
 
 Line 195 substitute "converted" for "transferred".
 
 Done (Line 210)
 
 Line 223 why so vague? i.e. why use "may"? Believe in your data. ...stress was also modulated by changes....
 
 Changed "may" to "is"
 
 Line 229 smaller ankle EMA (especially since you earlier talked about ankle "height").
 
 Changed “lower” to “smaller” Line 254
 
 Line 236 ...and return elastic energy…
 
 Added "elastic" line 262
 
 Line 244 IMPORTANT: Need to explain this better! I think you are saying that the net work at the ankle is staying the same across speed, BUT it is the tendons that are storing and returning that work, it's not that the muscles are doing a lot of negative/positive work.
 
 Changed: “The consistent net work observed among all speeds suggests the ankle extensor muscle-tendon units are performing similar amounts of ankle work independent of speed, which would predominantly be done by the tendon.” (Line 270-272)
 
 Line 258-261 I think here is where you are over-selling the data/story. Although you do say "a" mechanism (and not "the" mechanism, you still need to deal with the cost of generating more force and generating that force faster.
 
 We removed this sentence and replaced it with a discussion of the cost of generating force hypothesis, and alternative scenarios for how force and metabolics could be uncoupled.
 
 Line 278 "the" tendon? Which tendon?
 
 Added "Achilles"
 
 Line 289. I don't think one can project into the past.
 
 Changed “projected” to "estimated"
 
 Line 303 no problem, but I've never seen a paper in biology where the authors admit they don't know what species they were studying!
 
 Can’t be helped unfortunately. It is an old dataset and there aren’t photos of every kangaroo. Fortunately, from the grey and red kangaroos we can distinguish between, we know there are no discernible species effects on the data.
 
 Lines 304-306 I'm not clear here. Did you use vertical impulse (and aerial time) to calculate body weight? Or did you somehow use the braking/propulsive impulse to calculate mass? I would have just put some apples on the force plate and waited for them to stop for a snack.
 
 Stationary weights were recorded for some kangaroos which did stand on the force plate long enough, but unfortunately not all of them were willing to do so. In those cases, yes, we used impulse from steady-speed trials to estimate mass. We cross-checked by estimating mass from segment lengths (as size and mass are correlated). This is outlined in the first paragraph of the methods.
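
 The impulse-based mass estimate described in this response can be illustrated with a minimal sketch. It assumes, as one plausible reading of the response, that over one complete hop cycle at steady speed the time-averaged vertical ground reaction force equals body weight. The function names, the half-sine force profile, and the 30 kg test animal are hypothetical and are not taken from the manuscript.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def mass_from_vertical_impulse(time_s, fz_newton, g=G):
    """Estimate body mass (kg) from vertical force over one full hop cycle.

    Assumption: steady-speed hopping, so vertical momentum is unchanged over
    the cycle and the mean vertical force equals body weight (m * g).
    """
    impulse = 0.0
    for i in range(1, len(time_s)):
        dt = time_s[i] - time_s[i - 1]
        # Trapezoidal integration of force over time (N*s)
        impulse += 0.5 * (fz_newton[i] + fz_newton[i - 1]) * dt
    cycle_duration = time_s[-1] - time_s[0]
    return impulse / (g * cycle_duration)


def synthetic_hop(mass_kg, stance_s=0.15, cycle_s=0.40, n=4001, g=G):
    """Hypothetical half-sine stance force followed by an aerial phase."""
    # Peak chosen so the stance impulse supports body weight over the cycle
    peak = mass_kg * g * cycle_s * math.pi / (2.0 * stance_s)
    times = [cycle_s * i / (n - 1) for i in range(n)]
    forces = [
        peak * math.sin(math.pi * t / stance_s) if t <= stance_s else 0.0
        for t in times
    ]
    return times, forces


times, forces = synthetic_hop(30.0)
estimated_mass = mass_from_vertical_impulse(times, forces)
print(round(estimated_mass, 2))
```

 With real force-plate data the estimate would be sensitive to how precisely a full cycle is delimited, which is presumably why the authors cross-checked against segment lengths.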
 
 Lines 367 & 401 When you use the word "scaled" do you mean you assumed geometric similarity?
 
 No, rather than geometric scaling, we allowed scaling to individual dimensions by using the markers at midstance for measurements. We have amended the paragraph to clarify that the shape of the kangaroo changes and that mass distribution was preserved during the shape change (line 441-446)
 
 Lines 381-82 specify "joint work"
 
 Added "joint work" (Line 457)
 
 Figure 1 is gorgeous. Why not add the CF equation to the left panel of the caption?
 
 We decided to keep the information in the figure caption. “Total leg length was calculated as the sum of the segment lengths (solid black lines) in the hindlimb and compared to the pelvis-to-toe distance (dashed line) to calculate the crouch factor”
 
 Figure 2 specify Horizontal fore-aft.
 
 Done
 
 Figure 3g I'd prefer the same Min. Max Flexion vertical axis labels as you use for hip & knee.
 
 While we appreciate the reviewer trying to increase the clarity of this figure, we have left it as plantar/dorsi flexion since these are recognised biomechanical terms. To avoid confusion, we have further defined these in the figure caption “For (f-g), increased plantarflexion represents a decrease in joint flexion, while increased dorsiflexion represents increased flexion of the joint.”
 
 Figure 4. I like it and I think that you scaled all panels the same, i.e. 400 W is represented by the same vertical distance in all panels. But if that's true, please state so in the Caption. It's remarkable how little work occurs at the hip and knee despite the relatively huge muscles there.
 
 It is true that the y axes are all at the same scale. We have added this to the caption.
 
 Figure 5 Caption should specify "work per hop".
 
 Added
 
 Figure 7 is another beauty.
 
 Thank you!
 
 Supplementary Figure 3 is this all ANKLE? Please specify.
 
 Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle.
 
 Reviewer #2 (Recommendations For The Authors):
 
 To 'unlock the secrets of kangaroo locomotor energetics' I expected the authors to measure the secretive outcome variable, metabolic rate using laboratory measures. Rather, the authors relied on reviewing historic metabolic data and collecting biomechanics data across different animals, which limits the conclusions of this manuscript.
 
 We have revised the title to make it clearer that we are investigating a subset of the energetics problem, specifically posture. “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos.” We have also substantially modified the discussion to temper the conclusions from the paper.
 
 After reading the hypothesis, why do the authors hypothesize about joint flexion and not EMA? Because the following hypothesis discusses the implications of moment arms on tendon stress, EMA predictions are more relevant (and much more discussed throughout the manuscript).
 
 Ankle and MTP angles are the primary drivers of changes in r, R & thus, EMA. We used a two part hypothesis to capture this. We have rephrased the hypotheses: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass, and (ii) changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work.”
 
 If there were no detectable effects of speed on EMA, are kangaroos mechanically like other animals (Biewener Science 89 & JAP 04) who don't vary EMA across speeds? Despite no detectible effects, the authors state [lines 228-229] "we found larger and faster kangaroos were more crouched, leading to lower ankle EMA". Can the authors explain this inconsistency? Lines 236 "Kangaroos appear to use changes in posture and EMA". I interpret the paper as EMA does not change across speed.
 
 Apologies, we did not sufficiently explain this originally. We now explain in the results our reasoning behind our belief that EMA and R may change with speed. “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R^2=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.” (Line 177-185)
 
 Lines 335-339: "We assumed the force was applied along phalanx IV and that there was no medial or lateral movement of the centre of pressure (CoP)". I'm confused, did the authors not measure CoP location with respect to the kangaroo limb? If not, this simple estimation undermines primary results (EMA analyses).
 
 We have changed "The anterior or posterior movement of the CoP was recorded by the force plate" to read: "The fore-aft movement of the CoP was recorded by the force plate within the motion capture coordinate system" (Line 406-407) and added more justification for fixing the CoP movement in the other axis: “It was necessary to assume the CoP was fixed in the medial-lateral axis because when two feet land on the force plate, the lateral forces on each foot are not recorded, and indeed cancel if the forces are symmetrical (i.e. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials to ensure reliable measures of the anterior-posterior movement of the CoP.” (Line 408-413)
 
 The introduction makes many assertions about the generalities of locomotion and the relationship between mechanics and energetics. I'm afraid that the authors are selectively choosing references without thoroughly evaluating alternative theories. For example, Taylor, Kram, & others have multiple papers suggesting that decreasing EMA and increasing muscle force (and active muscle volume) increase metabolic costs during terrestrial locomotion. Rather, the authors suggest that decreasing EMA and increasingly high muscle force at faster speeds don't affect energetics unless muscle work increases substantially (paragraph 2)? If I am following correctly, does this theory conflict with active muscle volume ideas that are peppered throughout this manuscript?
 
 Yes, as you point out, the same mechanism does lead to different results in kangaroos vs humans, for instance, but this is not a contradiction. In all species, decreasing EMA will result in an increase in muscle force due to less efficient leverage (i.e. lower EMA) of the muscles, and the muscle-tendon unit will be required to produce more force to balance the joint moment. As a consequence, human muscles activate a greater volume in order for the muscle-tendon unit to increase muscle work and produce enough force. We are proposing that in kangaroos, the increase in work is done by the Achilles tendon rather than the muscles. Previous research suggests that macropod ankle muscles contract isometrically or that the fibres do not shorten more at faster speeds i.e. muscle work does not increase with speed. Instead, the additional force seems to come from the tendon storing and subsequently returning more strain energy (indicated by higher stress). We found that the increase in tendon stress comes from higher ground force at faster speeds, and from the kangaroo adopting a more crouched posture, which increases the tendons’ stresses compared to an upright posture for a given speed (think of this as increasing the tendon’s stress capacity). We have substantially revised the discussion to highlight this.
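
 The leverage argument in this response can be sketched as a simple moment balance about the ankle, using EMA = r/R as in the exchange above. This is an illustrative simplification, not the authors' musculoskeletal model: it lumps the ankle extensors into a single tendon, and the ground force, moment arms, and tendon cross-sectional area below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def effective_mechanical_advantage(r_m, R_m):
    """EMA: muscle-tendon moment arm r divided by ground-force moment arm R."""
    return r_m / R_m


def tendon_force_n(grf_n, r_m, R_m):
    """Moment balance about the ankle: F_tendon * r = GRF * R."""
    return grf_n * R_m / r_m


def tendon_stress_mpa(force_n, csa_mm2):
    """Stress = force / cross-sectional area (N/mm^2 is numerically MPa)."""
    return force_n / csa_mm2


# Hypothetical inputs, not measurements from the manuscript
GRF_N = 1000.0   # ground reaction force
R_SMALL = 0.05   # muscle-tendon moment arm r (m)
CSA_MM2 = 80.0   # lumped tendon cross-sectional area

# Same ground force, two postures: a more crouched limb has a longer R
upright_force = tendon_force_n(GRF_N, R_SMALL, 0.20)
crouched_force = tendon_force_n(GRF_N, R_SMALL, 0.24)

upright_stress = tendon_stress_mpa(upright_force, CSA_MM2)
crouched_stress = tendon_stress_mpa(crouched_force, CSA_MM2)

print(effective_mechanical_advantage(R_SMALL, 0.20), upright_stress)
print(effective_mechanical_advantage(R_SMALL, 0.24), crouched_stress)
```

 In this sketch the tendon stress rises with R even though the ground force is unchanged, which is the sense in which posture alone could raise tendon stress at a given speed.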
 
 Similarly, does increased gross or net tendon mechanical energy storage & return improve hopping energetics? Would more tendon stress and strain energy storage with a given hysteresis value also dissipate more mechanical energy, requiring leg muscles to produce more net work? Does net or gross muscle work drive metabolic energy consumption?
 
 Based on the cost of generating force hypothesis, we think that gross muscle work would be linked to driving metabolic energy consumption. Our idea here is that the total body work is a product of the work done by the tendon and the muscle combined. If the tendon has the potential to do more work, then the total work can increase without muscle work needing to increase.
 
 The results interpret speed effects on biomechanics, but each kangaroo was only collected at 1 speed. Are inter-animal comparisons enough to satisfy this investigation?
 
 We have added a figure (Suppl Fig 9) to demonstrate the distribution of speed and number of trials per kangaroo. We have also removed "preferred" from the manuscript as this seems to cause confusion. Most kangaroos travelled at a range of “casual” speeds.
 
 Abstract: Can the authors more fully connect the concept of tendon stress and low metabolic rates during hopping across speeds? Surely, tendon mechanics don't directly drive the metabolic cost of hopping, but they affect muscle mechanics to affect energetics.
 
 Amended to: " This phenomenon may be related to greater elastic energy savings due to increasing tendon stress; however, the mechanisms which enable the rise in stress, without additional muscle work remain poorly understood." (Lines 25-27).
 
 The topic sentence in lines 61-63 may be misleading. The ensuing paragraph does not substantiate the topic sentence stating that ankle MTUs decouple speeds and energetics.
 
 We added "likely" to soften the statement. (Line 59)
 
 Lines 84-86: In humans, does more limb flexion and worse EMA necessitate greater active muscle volume? What about muscle contractile dynamics - See recent papers by Sawicki & colleagues that include Hill-type muscle mechanics in active muscle volume estimates.
 
 Added: “Smaller EMA requires greater muscle force to produce a given force on the ground, thereby demanding a greater volume of active muscle, and presumably greater metabolic rates than larger EMA for the same physiology”. (Line 80-82)
 
 Lines 106: can you give the context of what normal tendon safety factors are?
 
 Good idea. Added: "far lower than the typical safety factor of four to eight for mammalian tendons (Ker et al. 1988)." Line 106-107
 
 I thought EMA was relatively stable across speeds as per Biewener [Science & JAP '04]. However the authors gave an example of an elephant to suggest that it is typically inversely related to speed. Can the authors please explain the disconnect and the most appropriate explanation in this paragraph?
 
 Knee EMA in particular changed with speed in Biewener 2004. What is “typical” probably depends on the group of animals studied; e.g., cursorial quadrupedal mammals generally seem to maintain constant EMA, but other groups do not.
 
 These cases are presented to show a range of consequences for changing EMA (usually with mass, but sometimes with speed). We have made several adjustments to the paragraph to make this clearer. Lines 85-93.
 
 The results depend on the modeled internal moment arm (r). How confident are the authors in their little r prediction? Considering complications of joint mechanics in vivo including muscle bulging. Holzer et al. '20 Sci Rep demonstrated that different models of the human Achilles tendon moment arm predict vastly different relationships between the moment arm and joint angle.
 
 Our values for r and EMA closely align with previous papers which measured/calculated these values in kangaroos, such as Kram 1998, and thus we are confident in our interpretation.
 
 This is a misleading results sentence: Small decreases in EMA correspond to a nontrivial increase in tendon stress, for instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an ~18% increase in tendon stress. The authors could alternatively say that a ~15% decrease in EMA was associated with an ~18% increase in tendon stress, which seems pretty comparable.
 
 Thank you for pointing this out, it is important that it is made clearer. Although the change in relative magnitude is approximately the same (as it should be), this does not detract from the importance. The "small decrease in EMA" is referring to the absolute values, particularly in respect to the measurement error/noise. The difference is small enough to have been undetectable with other methods used in previous studies. We have amended the sentence to clarify this.
 
 It now reads: “Subtle decreases in EMA which may have been undetected in previous studies correspond to discernible increases in tendon stress. For instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an increase in tendon stress from ~50 MPa to ~60 MPa, decreasing safety factor from 2 to 1.67 (where 1 indicates failure), which is both measurable and physiologically significant.” (Line 195-200)
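
 The relative changes debated in this exchange can be checked with a few lines of arithmetic. The EMA values, the approximate stresses, and the safety factors are those quoted above; the tendon strength of 100 MPa is not stated in the text and is inferred here from the quoted safety factor of 2 at ~50 MPa, so it should be read as an assumption.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Values quoted in the response
EMA_SLOW, EMA_FAST = 0.242, 0.206
STRESS_SLOW_MPA, STRESS_FAST_MPA = 50.0, 60.0

# Assumption: strength implied by "safety factor 2" at ~50 MPa
ASSUMED_STRENGTH_MPA = 2.0 * STRESS_SLOW_MPA

ema_change = percent_change(EMA_SLOW, EMA_FAST)
# At a fixed ground force, tendon force scales with 1 / EMA
force_change_fixed_grf = percent_change(1.0 / EMA_SLOW, 1.0 / EMA_FAST)
stress_change = percent_change(STRESS_SLOW_MPA, STRESS_FAST_MPA)

safety_slow = ASSUMED_STRENGTH_MPA / STRESS_SLOW_MPA
safety_fast = ASSUMED_STRENGTH_MPA / STRESS_FAST_MPA

print(round(ema_change, 1), round(force_change_fixed_grf, 1), round(stress_change, 1))
print(round(safety_slow, 2), round(safety_fast, 2))
```

 The EMA decrease comes out near 15% and the corresponding fixed-force increase near 17.5%, consistent with the reviewer's point that the relative magnitudes are comparable. The rounded stresses of ~50 and ~60 MPa give 20%, which differs from the quoted ~18% only because the stress values are approximate.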
 
 Lines 243-245: "The consistent net work observed among all speeds suggests the ankle extensors are performing similar amounts of ankle work independent of speed." If this is true, and presumably there is greater limb work performed on the center of mass at faster speeds (Donelan, Kram, Kuo), do more proximal leg joints increase work and energy consumption at faster speeds?
 
 The skin over the proximal leg joints (knee and hip) moves too much to get reliable measures of EMA from the ratio of moment arms. This will be pursued in future work when all muscles are incorporated in the model so knee and hip EMA can be determined from muscle force.
 
 We have added a limitations and considerations paragraph to the manuscript: “Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.” (Line 321-328)
 
 Lines 245-246: "Previous studies using sonomicrometry have shown that the muscles of tammar wallabies do not shorten considerably during hops, but rather act near-isometrically as a strut" Which muscles? All muscles? Extensors at a single joint?
 
 Added "gastrocnemius and plantaris" Line 164-165
 
 Lines 249-254: "The cost of generating force hypothesis suggests that faster movement speeds require greater rates of muscle force development, and in turn greater cross-bridge cycling rates, driving up metabolic costs (Taylor et al. 1980, Kram and Taylor 1990). The ability for the ankle extensor muscle fibres to remain isometric and produce similar amounts of work at all speeds may help explain why hopping macropods do not follow the energetic trends observed in quadrupedal species." These sentences confuse me. Kram & Taylor's cost of force-generating hypothesis assumes that producing the same average force over shorter contact times increases metabolic rate. How does 'similar muscle work' across all speeds explain the ability of macropods to use unique energetic trends in the cost of force-generating hypothesis context?
 
 Thank you for highlighting this confusion. We have substantially revised the discussion to clarify where the mechanisms presented deviate from the cost of generating force hypothesis. Lines 270-309
 
 Reviewer #3 (Recommendations For The Authors):
 
 In addition to the points described in the public review, I have additional, related, specific comments:
 
 (1) Results: Please refer to the hypotheses in the results, and relate the findings back to the hypotheses.
 
 We now relate the findings back to the hypotheses
 
 Line 142 “In partial support of hypothesis (i), greater masses and faster speeds were associated with more crouched hindlimb postures (Fig. 3a,c).”.
 
 Lines 205-206: “The increase in tendon stress with speed, facilitated in part by the change in moment arms by the shift in posture, may explain changes in ankle work (c.f. Hypothesis (ii)).”
 
 (2) Results: please provide the main statistical results either in-line or in a table in the main text.
 
 We (the co-authors) have discussed this at length, and have agreed that the manuscript is far more readable in the format whereby most statistics lie within the supplementary tables, otherwise a reader is met with a wall of statistics. We only include values in the main text when the magnitude is relevant to the arguments presented in the results and discussion.
 
 (3) Line 140: Describe how 'crouched' was defined.
 
 We have now added a brief definition of ‘Crouch factor’ after the figure caption. (Line 143) (Fig. 3a,c; where crouch factor is the ratio of total limb length to pelvis to toe distance).
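
 The crouch factor definition quoted here (total limb length divided by pelvis-to-toe distance) can be written as a short function. This is a sketch of that stated ratio only; the marker ordering and the coordinates are hypothetical, and the authors' actual marker set and processing are not described in this response.

```python
import math


def crouch_factor(markers):
    """Ratio of summed segment lengths to straight pelvis-to-toe distance.

    markers: joint positions ordered from pelvis to toe, each an (x, y)
    or (x, y, z) tuple in consistent units. A fully straight limb gives 1.0;
    under this definition the value grows as the limb folds.
    """
    total_length = sum(
        math.dist(markers[i], markers[i + 1]) for i in range(len(markers) - 1)
    )
    pelvis_to_toe = math.dist(markers[0], markers[-1])
    return total_length / pelvis_to_toe


# Hypothetical two-segment limbs in arbitrary units
straight_limb = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
folded_limb = [(0.0, 0.0), (3.0, 4.0), (0.0, 8.0)]  # two 3-4-5 segments

cf_straight = crouch_factor(straight_limb)
cf_folded = crouch_factor(folded_limb)
print(cf_straight, cf_folded)
```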
 
 (4) Line 162: This seems to be a main finding and should be a figure in the main text not supplemental. Additionally, Supplementary Figures 3a and b do not show this finding convincingly. There should be a figure plotting r vs speed and r vs mass.
 
 The combination of r and R are represented in the EMA plot in the main text. The r and R plots are relegated to the supplementary because the main text is already very crowded. Thank you for the suggestion for the figure plotting r and R versus speed, this is now included as Suppl. Fig. 3h
 
 (5) Line 166: Supplementary Figure 3g does not show the range of dorsiflexion angles as a function of speed. It shows r vs dorsiflexion angle. Please correct.
 
 Thanks for noticing this, it was supposed to reference Fig 3g rather than Suppl Fig 3g in the sentence regarding speed. We have fixed this, Line 170.
 
 We have added a reference to Suppl Fig 3 on Line 169 as this shows where the peak in r with ankle angle occurs (114.4 degrees).
 
 (6) Line 184: Where are the statistical results for this statement?
 
 The relationship between stress and EMA does not appear to be linear, thus we only present R^2 for the power relationship rather than a p-value.
 
 (7) Line 192: The authors should explain how joint work and power relate/support the overall hypotheses. This section also refers to Figures 4 and 5 even though Figures 6 and 7 have already been described. Please reorganize.
 
 We have added a sentence at the end of the work and power section to mention hypothesis (ii) and lead into the discussion where it is elaborated upon.
 
 “The increase in positive and negative ankle work may be due to the increase in tendon stress rather than additional muscle work.” Line 219-220 We have rearranged the figure order.
 
 (8) The statistics are not reported in the main text, but in the supplementary tables. If a result is reported in the main text, please report either in-line or with a table in the main text.
 
 We leave most statistics in the supplementary tables to preserve the readability of the manuscript. We only include values in the main text when the magnitude is relevant to the arguments raised in the results and discussion.
 
 AuthorResponse
Tags

Summary

AuthorResponse

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.05.578950v2
www.biorxiv.org www.biorxiv.org

Membrane potential modulates ERK activity and cell proliferation

5
1. Public_Reviews 15 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important paper employs multiple experimental approaches and presents evidence that changes in membrane voltage directly affect ERK signaling to regulate cell division. This result is relevant because it supports an ion channel-independent pathway by which changes in membrane voltage can affect cell growth. The evidence now presented is solid and the data support the conclusions. This paper should be of interest to a broad readership in the areas of cell and developmental biology and electrophysiology.
 
 Summary
2. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015 (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.
 
 Review 1
3. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.
 
 Comments on previous revisions:
 
 The authors have done a good job addressing the comments on the previous submission.
 
 Review 2
4. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.
 
 Strengths:
 
 A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.
 
 Weaknesses:
 
 In the previous round of revisions, the authors addressed the issues with Figure 1, and the data presented are much clearer. The authors did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.
 
 Review 3
5. Public_Reviews 15 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the previous reviews
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015. (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.
 
 Strengths:
 
 Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism and a characterization of the molecular details involved in the control of cell proliferation are interesting and impactful.
 
 Weaknesses:
 
 The authors lean heavily on the assumption that the Nernst equation is an accurate predictor of membrane potential based on K+ level. This is a large oversimplification that undermines the authors' conclusions, most glaringly in Figure 2C. The authors' conclusions should be weakened to reflect that the activity of voltage gated ion channels and homeostatic compensation are unaccounted for.
 
 We appreciate the reviewer’s thoughtful comment regarding our reliance on the Nernst equation to estimate membrane potential. We agree that the Nernst equation is a simplification and does not account for the activity of other ions, voltage-gated channels, or homeostatic compensation mechanisms. To address this concern, we conducted electrophysiological experiments in which the membrane potential was directly controlled using the perforated patch-clamp technique (Fig. 3). Under these conditions, we also monitored the membrane potential and confirmed that there was negligible drift within 20 minutes of perfusion with 145 mM K⁺ (only a 1–5 mV change). These results suggest that the influence of voltage-gated channels and homeostatic compensation is minimal in our experimental setup. We revised the manuscript to clarify these limitations and to present our conclusions more cautiously in light of this point.
 
 “A potential limitation of extracellular K⁺-based approaches is their reliance on the Nernst equation to estimate membrane potential, which oversimplifies the actual situation by neglecting voltage-gated ion channel activity and compensatory mechanisms. To directly address this concern, we measured membrane potential using the perforated patch-clamp technique and confirmed that the potential was stable during perfusion with 145 mM K⁺ (only a 1–5 mV drift within 20 min). Moreover, we used a voltage clamp to precisely control the membrane potential and demonstrated that ERK activity was directly regulated by the voltage itself, excluding the influence of other secondary factors. An additional strength of electrophysiology is its ability to examine the effects of repolarization, which is difficult to assess with conventional perfusion-based methods owing to slow solution exchange.”
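
 For readers unfamiliar with the estimate under discussion, the Nernst potential for K⁺ can be computed as below. This is a minimal sketch of the textbook equation only. The 145 mM extracellular value is taken from the response; the intracellular concentration (140 mM), the baseline extracellular concentration (5 mM), and the temperature (37 °C) are assumed typical values that do not appear in the source.

```python
import math

R_GAS = 8.314462618    # gas constant, J/(mol*K)
FARADAY = 96485.33212  # Faraday constant, C/mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, valence=1, temp_c=37.0):
    """Equilibrium potential (mV) for one ion: E = (RT / zF) * ln(out / in)."""
    temp_k = temp_c + 273.15
    volts = (R_GAS * temp_k) / (valence * FARADAY) * math.log(conc_out_mm / conc_in_mm)
    return 1000.0 * volts


K_IN_MM = 140.0  # assumed intracellular K+ (not from the source)

# Assumed physiological bath (5 mM) versus the 145 mM K+ perfusion
e_k_baseline = nernst_potential_mv(5.0, K_IN_MM)
e_k_high = nernst_potential_mv(145.0, K_IN_MM)
print(round(e_k_baseline, 1), round(e_k_high, 1))
```

 Under these assumptions the predicted shift is from about -89 mV to about +1 mV. As the reviewer notes, this single-ion estimate ignores other permeant ions and channel activity, which is why the authors' direct perforated patch-clamp measurement is the stronger evidence.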
 
 Grammatical errors are made throughout the paper (e.g., line 99: "This kinetics" should be "these kinetics").
 
 We thank the reviewer for pointing out the grammatical errors. We carefully revised the entire manuscript.
 
 Line 71: Zhou et al. use BHK, N2A, PSA-3 cells, this paper uses U2OS (osteosarcoma) cells. Could that explain the differences in bioelectric properties that they describe? In general, there should be more discussion of the choice of cell line. Why were U2OS cells chosen? What are the implications of the fact that these are cancer cells, and bone cancer cells in particular? Does this paper provide specific insights for bone cancers? And crucially, how applicable are findings from these cells to other contexts?
 
 We thank the reviewer for this valuable comment regarding the choice of cell line. We selected U2OS cells primarily because they are well suited for live-cell FRET imaging. We did not use BHK, N2A, or PSA-3 cells, and therefore it is difficult for us to provide a clear comparison with the specific bioelectric properties reported in Zhou et al. Nevertheless, we agree that cancer cell lines, including U2OS, may exhibit bioelectric properties that differ from those of non-cancerous cells. While this could be a potential limitation, we are inclined to consider voltage-dependent ERK activation to be a fundamental and generalizable phenomenon, not restricted to osteosarcoma cells. The key components of this pathway (phosphatidylserine, Ras, and MAPK, including ERK) are expressed in essentially all mammalian cells. In support of our view, we observed voltage-dependent ERK activation not only in U2OS cells but also in HeLa, HEK293, and A431 cells. These results strongly suggest that the mechanism we describe is not cell-type specific but rather a universal feature of mammalian cells. In the revised Discussion, we expanded our rationale for choosing U2OS cells, while addressing the potential implications of using a cancer-derived cell line.
 
 “In this study, we primarily used U2OS cells because their flat morphology makes them suitable for live-cell FRET imaging. Although cancer cell lines, including U2OS, may display bioelectric properties that differ from those of noncancerous cells, our findings raise the possibility that voltage-dependent ERK activation is a fundamental and broadly applicable phenomenon rather than a feature specific to osteosarcoma cells. This conclusion is supported by the fact that essential components of this pathway, namely phosphatidylserine, Ras, and MAPK (including ERK), are ubiquitously expressed in mammalian cells. Consistent with this finding, we observed voltage-dependent ERK activation across multiple cell lines: U2OS, HeLa, HEK293, and A431 cells (Fig.S2). These observations indicate that the mechanism we describe is not cell-type-restricted, but rather a universal property of mammalian cells.”
 
 Line 115: The authors use EGF to calibrate 'maximal' ERK stimulation. Is this level near saturation? Either way is fine, but it would be useful to clarify.
 
 We thank the reviewer for raising this important point. The YFP/CFP ratio obtained after EGF stimulation is generally considered to represent saturation levels detectable by EKAREV imaging. However, we acknowledge that it remains uncertain whether 10 ng/mL EGF induces the absolute maximal ERK activity in all contexts. To clarify this point, we revised the Results text as follows:
 
 “To normalize variation among cells, cells were stimulated with EGF (10 ng/mL) at the end of the experiment, which presumably yielded a near-saturated YFP/CFP value (ERK activity). This value was used to determine the maximum ERK activity in each cell”
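 A minimal sketch of how such a per-cell normalization might be computed is given here. The exact formula is not stated in the response, so the min-max form (baseline mapped to 0, EGF-stimulated value mapped to 1) is an assumption, and the function and parameter names are hypothetical.

```python
def normalize_erk_activity(ratios, baseline_ratio, egf_ratio):
    """Rescale YFP/CFP ratios of one cell to the range set by baseline and EGF.

    Assumption: 0 corresponds to the pre-stimulus baseline ratio and 1 to the
    (presumably near-saturated) ratio measured after EGF stimulation.
    """
    span = egf_ratio - baseline_ratio
    if span <= 0:
        raise ValueError("EGF ratio must exceed the baseline ratio")
    return [(ratio - baseline_ratio) / span for ratio in ratios]


if __name__ == "__main__":
    # Hypothetical time course for a single cell.
    print(normalize_erk_activity([1.0, 1.5, 2.0], baseline_ratio=1.0, egf_ratio=2.0))
```

 Because each cell is scaled by its own EGF response, values above 1 remain possible if a stimulus drives the ratio beyond the EGF-induced level.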
 
 Line 121: Starting line 121 the authors say "Of note, U2OS cells expressed wild-type K-Ras but not an active mutant of K-Ras, which means voltage dependent ERK activation occurs not only in tumor cells but also in normal cells". Given that U2OS cells are bone sarcoma cells, is it appropriate to refer to these as 'normal' cells in contrast to 'tumor' cells?
 
 We thank the reviewer for pointing this out. We agree that it is not appropriate to contrast U2OS cells with “normal” cells, since they are sarcoma-derived. To address this point, we revised the sentence to weaken the claim and avoid the misleading terminology.
 
 “Importantly, as U2OS cells express wild-type K-Ras rather than an oncogenic mutant (16), our results raise the possibility that voltage-dependent ERK activation may also occur in non-transformed cells.”
 
 Line 101: These normalizations seem reasonable, the conclusions sufficiently supported and the requisite assumptions clearly presented. Because the dish-to-dish and cell-to-cell variation may reflect biologically relevant phenomena it would be ideal if non-normalized data could be added in supplemental data where feasible.
 
 We thank the reviewer for this helpful suggestion. As recommended, we added representative non-normalized data in the Supplemental Figure S1, which illustrates the non-normalized variation across cells and dishes.
 
 Figure 2C is listed as Figure 2D in the text
 
 There is no Figure 2F (Referenced in line 148)
 
 We thank the reviewer for pointing out these errors. The incorrect figure citations were corrected.
 
 Reviewer #2 (Public review):
 
 Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.
 
 However, there are still some concerns as detailed in specific comments below:
 
 Specific comments:
 
 (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:
 
 (i) Does hypotonic shock activate ERK in U2OS cells?
 
 (ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?
 
 (iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?
 
 (iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.
 
 (2) Some more details about the experimental design and the results are needed from Figure 1:
 
 (i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?
 
 (ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context. Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)? What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?
 
 (3) In Figure 2, there are some possible concerns with the perfusion experiment:
 
 (i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?
 
 (ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photobleaching to be significant.
 
 (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:
 
 (i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration: https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?
 
 (ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?
 
 (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:
 
 (i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.
 
 (ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).
 
 (iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.
 
 (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV
 
 Comments on revisions:
 
 The authors have done a good job addressing the comments on the previous submission.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.
 
 Strengths
 
 A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.
 
 Weaknesses
 
 A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase the confidence in the results. The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.
 
 The authors did achieve their aims with the caveat that the cell proliferation results could be strengthened. The results, for the most part, support the conclusions.
 
 This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.
 
 In the revised manuscript, the authors have now addressed the issues with Figure 1, and the data presented are much clearer. They did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.
 
 Reviewer #2 (Recommendations for the authors):
 
 Small issues:
 
 Fig. 1A. Please add a mark on the timeline showing when the K+ concentration is changed. Also, please add a time axis that matches the time axis in (C), so readers can know when in C the medium was changed.
 
 1B caption: unclear what "the images were 20 min before and after cytokinesis" means, given that the images go from -30 min to +20 min. Maybe the authors mean, "the indicated times are measured relative to cytokinesis."
 
 Thank you for bringing to our attention these points, which could confuse readers. We revised the figure legend.
 
 Line 214: nonoclusters --> nanoclusters
 
 Line 475: 10 mm -> 10 µm
 
 Corrected.
 
 AuthorResponse
Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.27.610010v4
www.biorxiv.org www.biorxiv.org

Evidence for systematic - yet task- and motor-contingent - rhythmicity of auditory perceptual judgements

5
1. Public_Reviews 14 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This high-N, multi-task study offers a comprehensive examination of rhythmicity in behavioral performance during listening. It presents a valuable set of findings that reveal task- and ear-specific effects, challenging the notion of a universal rhythmicity in auditory perception. The evidence is solid and the work is likely to be of significant interest to behavioral and cognitive scientists focused on perception and neural oscillations.
  
  Summary
2. Public_Reviews 14 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper presents results from four independent experiments, each of them testing for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.
  
  Strengths:
  
  The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.
  
  Weaknesses:
  
  The authors find that the frequency in auditory perception changes between experiments. Possible reasons for such differences are described, but they remain difficult to interpret, as it is unclear whether they merely reflect some natural variability (independent of experimental parameters) or are indeed driven by the specific experimental paradigm (and therefore replicable).
  
  Therefore, it remains to be shown whether there is any systematic pattern in the results that allows conclusions about the roles of different frequencies.
  
  Review 1
3. Public_Reviews 14 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures. The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.
  
  Strengths:
  
  The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions. The manuscript has greatly improved after the revision.
  
  Weaknesses:
  
  Additional variance: In several experiments, a fixation cross preceded - at a variable interval - the onset of the background noise that aimed to reset the phase of an ongoing oscillation. There is the chance that the fixation cross also resets the phase, potentially adding variance to the data. In addition, the authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. There is good reason for doing so, but it means that correctly identified/discriminated targets will on average have a greater stimulus intensity. This may add variance to the data. These two aspects may potentially contribute to the observation of weak perceptual rhythmicity.
  
  Figures: The text in Figures 4 and 6 is small. I think readers would benefit from a larger font size. Moreover, Figure 1A is not very intuitive. Perhaps it could be made clearer. The new Figure 5 was not discussed in the text. I wonder whether analyses with traditional t-tests could be placed in the supplements.
  
  50% significant samples: The authors consider effects that are significant in 50% of bootstrapped samples to be robust. For example: "This revealed that the above‐mentioned effects prevail for at least 50% of the simulated experiments, corroborating their robustness within the participant sample". Many of the effects are significant in fewer than 50% of samples. What counts as robust is a matter of opinion, but I think this, combined with the overall variable nature of the effects across frequency bands, ears, etc., leaves the impression that the effects are not very robust. I think the authors state it correctly in the last sentence of the first paragraph of the discussion: "At the same time the prevalence of significant effects in random samples of participants were mostly below 50%, raising questions as to the ubiquity of such effects." I think the authors should update the abstract in this regard so that readers who only read the abstract do not get the wrong impression about the robustness of the effects. It is not clear to me that, if the same study (using the same conditions) were done in a different lab, the results would come out similarly to those reported here.
  
  Review 2
4. Public_Reviews 14 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The finding of rhythmic activity in the brain has for a long time engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory, each of which is revealed to affect the consistency of this rhythmicity.
  
  The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect.
  
  Strengths:
  
  The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and the manner in which it sets out to test them are very clear, which allows for a straightforward assessment of its results.
  
  Weaknesses:
  
  The papers cited to justify a rhythmic mode are largely based on the processing of rhythmic stimuli. The authors assume the rhythmic mode to be the general default, but it's not so clear to me why this would be so. The task design seems better suited to a continuous vigilance mode task.
  
  Secondly, the analysis to detect a "rhythmic mode" assumes a total phase reset at noise onset, which is highly implausible given standard nonlinear dynamical analysis of oscillator performance. It's not clear that a rhythmic mode (should it be applied in this task) would indeed generate a consistent phase as the analysis searches for.
  
  Thirdly, the number of statistical tests used here makes trusting any single effect quite difficult, and very few of the effects replicate more than once. I think the data would be better interpreted as not confirming evidence for rhythmic mode processing in the ears.
  
  Comments on revised version:
  
  No further comments. The paper has many of the same issues that I expressed in the initial review, but I don't think they can be addressed without a replication study, which I appreciate is not always feasible.
  
  Review 3
5. Public_Reviews 14 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper presents results from four independent experiments, each of which tests for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.
  
  Strengths:
  
  The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.
  
  Weaknesses:
  
  (1) The authors find that the frequency of auditory perception changes between experiments. I think they could exploit differences between experiments better to interpret and understand the obtained results. These differences are very well described in the Introduction, but don't seem to be used for the interpretation of results. For instance, what does it mean if perceptual frequency changes from between- to within-trial pitch discrimination? Why did the authors choose this experimental manipulation? Based on differences between experiments, is there any systematic pattern in the results that allows conclusions about the roles of different frequencies? I think the Discussion would benefit from an extension to cover this aspect.
  
  We believe that interpreting these differences remains difficult and a precise, detailed (and possibly mechanistic) interpretation is beyond the goal of the present study. The main goal of this study was to explore the consistency and variability of effects across variations of the experimental design and samples of participants. Interpreting specific effects, e.g. at particular frequencies, would make sense mostly if differences between experiments have been confirmed in a separate reproduction. Still, we do provide specific arguments for why differences in the outcome between different experiments, e.g. with and without explicit trial initialization by the participants, could be expected. See lines 91ff in the introduction and 786ff in the discussion.
  
  (2) The Results give the impression of clear-cut differences in relevant frequencies between experiments (e.g., 2 Hz in Experiment 1, 6 Hz in Exp 2, etc), but they might not be so different. For instance, a 6 Hz effect is also visible in Experiment 1, but it just does not reach conventional significance. The average across the three experiments is therefore very useful, and also seems to suggest that differences between experiments are not very pronounced (otherwise the average would not produce clear peaks in the spectrum). I suggest making this point clearer in the text.
  
  We have revised the conclusions to note that the present data do not support clear-cut differences between experiments. For this reason, we also refrain from detailed interpretations of specific effects, as suggested by this reviewer in point 1 above.
  
  (3) I struggle to understand the hypothesis that rhythmic sampling differs between ears. In most everyday scenarios, the same sounds arrive at both ears, and the time difference between the two is too small to play a role for the frequencies tested. If both ears operate at different frequencies, the effects of the rhythm on overall perception would then often cancel out. But if this is the case, why would the two ears have different rhythms to begin with? This could be described in more detail.
  
  This hypothesis was not invented by us, but in essence put forward in previous work. The study by Ho et al. (Curr Biol, 2017) reported rhythmic effects at different frequencies in the left and right ears, and we here tried to reproduce these effects. One could speculate about an ear difference based on studies reporting a right-ear advantage in specific listening tasks, and the idea that different time scales of rhythmic brain activity may specifically prevail in the left and right cortical hemispheres; hence it does not seem improbable that there could be rhythmic effects in both ears at different frequencies. We note this in the introduction, l. 65ff.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures.
  
  The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.
  
  Strengths:
  
  The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions.
  
  Weaknesses:
  
  (1) Conceptual concerns:
  
  The authors place their research in the context of a rhythmic mode of perception. They also discuss continuous vs rhythmic mode processing. Their study further follows a design that seems to be based on paradigms that assume a recent phase in neural oscillations that subsequently influence perception (e.g., Fiebelkorn et al.; Landau & Fries). In my view, these are different facets in the neural oscillation research space that require a bit more nuanced separation. Continuous mode processing is associated with vigilance tasks (work by Schroeder and Lakatos; reduction of low frequency oscillations and sustained gamma activity), whereas the authors of this study seem to link it to hearing tasks specifically (e.g., line 694). Rhythmic mode processing is associated with rhythmic stimulation by which neural oscillations entrain and influence perception (also, Schroeder and Lakatos; greater low-frequency fluctuations and more rhythmic gamma activity). The current study mirrors the continuous rather than the rhythmic mode (i.e., there was no rhythmic stimulation), but even the former does not seem to fit fully, because trials are only 1.8 s long and do not really reflect a vigilance task. Finally, previous paradigms on phase-resetting reflect more closely the design of the current study (i.e., different times of a target stimulus relative to the reset of an oscillation). This is the work by Fiebelkorn et al., Landau & Fries, and others, which do not seem to be cited here, which I find surprising. Moreover, the authors would want to discuss the role of the background noise in resetting the phase of an oscillation, and the role of the fixation cross also possibly resetting the phase of an oscillation. Regardless, the conceptual mixture of all these facets makes interpretations really challenging. The phase-reset nature of the paradigm is not (or not well) explained, and the discussion mixes the different concepts and approaches. I recommend that the authors frame their work more clearly in the context of these different concepts (affecting large portions of the manuscript).
  
  Indeed, the paradigms used here and in many similar previous studies incorporate an aspect of phase-resetting, as the presentation of background noise may effectively reset ongoing auditory cortical processes. Studies trying to probe for rhythmicity in auditory perception in the absence of any background noise have not shown any effect (Zoefel and Heil, 2013), perhaps because the necessary rhythmic processes along auditory pathways are only engaged when some sound is present. We now discuss these points, and also acknowledge the mentioned studies in the visual system; l. 57.
  
  (2) Methodological concerns:
  
  The authors use a relatively unorthodox approach to statistical testing. I understand that they try to capture and characterize the sensitivity of the different analysis approaches to rhythmic behavioral effects. However, it is a bit unclear what meaningful effects are in the study. For example, the bootstrapping approach that identifies the percentage of significant variations of sample selections is rather descriptive (Figures 5-7). The authors seem to suggest that 50% of the samples are meaningful (given the dashed line in the figure), even though this is rarely reached in any of the analyses. Perhaps >80% of samples should show a significant effect to be meaningful (at least to my subjective mind). To me, the low percentage rather suggests that there is not too much meaningful rhythmicity present.
  
  We note that there is no clear consensus on what fraction of experiments should be expected or how this way of quantifying effects should be precisely valued (l. 441ff). However, we now also clearly acknowledge in the discussion that the effective prevalence is not very high (l. 663).
  
  I suggest that the authors also present more traditional, perhaps multi-level, analyses: Calculation of spectra, binning, or single-trial analysis for each participant and condition, and the respective calculation of the surrogate data analysis, and then comparison of the surrogate data to the original data on the second (participant) level using t-tests. I also thought the statistical approach undertaken here could have been a bit more clearly/didactically described as well.
  
  We realize that our description of the methods was possibly not fully clear. We do follow the strategy suggested by this reviewer, but rather than comparing actual and surrogate data with a parametric t-test, we compare them using a non-parametric percentile-based approach. This has the advantage of not making specific (and possibly unwarranted) assumptions about the distribution of the data. We have revised the methods to clarify this, l. 332ff.
  
  The authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. In practice, this can be a disadvantage relative to keeping the intensity constant throughout, because, on average, correct trials will be associated with a higher intensity than incorrect trials, potentially making observations of perceptual rhythmicity more challenging. The authors would want to discuss this potential issue. Intensity adjustments could perhaps contribute to the observed rhythmicity effects. Perhaps the rhythmicity of the stimulus intensity could be analyzed as well. In any case, the adaptive procedure may add variance to the data.
  
  We have added an analysis of task difficulty to the results (new section “Effects of adaptive task difficulty”) to address this. Overall we do not find systematic changes in task difficulty across participants for most of the experiments, but one certainly cannot rule out that this aspect of the design also affects the outcomes. Importantly, we relied on an adaptive task difficulty to actually (or hopefully) reduce variance in the data, by keeping the task difficulty around a certain level. Given the large number of trials collected, not using such an adaptive procedure may result in performance levels around chance or near ceiling, which would make it impossible to detect rhythmic variations in behavior.
  
  Additional methodological concerns relate to Figure 8. Figures 8A and C seem to indicate that a baseline correction for a very short time window was calculated (I could not find anything about this in the methods section). The data seem very variable and artificially constrained in the baseline time window. It was unclear what the reader might take from Figure 8.
  
  This figure was intended mostly for illustration of the eye tracking data, but we agree that there is no specific key insight to be taken from this. We removed this.
  
  Motivation and discussion of eye-movement/pupillometry and motor activity: The dual task paradigm of Experiment 4 and the reasons for assessing eye metrics in the current study could have been better motivated. The experiment somehow does not fit in very well. There is recent evidence that eye movements decrease during effortful tasks (e.g., Contadini-Wright et al. 2023 J Neurosci; Herrmann & Ryan 2024 J Cog Neurosci), which appears to contradict the results presented in the current study. Moreover, by appealing to active sensing frameworks, the authors suggest that active movements can facilitate listening outcomes (line 677; they should provide a reference for this claim), but it is unclear how this would relate to eye movements. Certainly, a person may move their head closer to a sound source in the presence of competing sound to increase the signal-to-noise ratio, but this is not really the active movements that are measured here. A more detailed discussion may be important. The authors further frame the difference between Experiments 1 and 2 as being related to participants' motor activity. However, there are other factors that could explain differences between experiments. Self-paced trials give participants the opportunity to rest more (inter-trial durations were likely longer in Experiment 2), perhaps affecting attentional engagement. I think a more nuanced discussion may be warranted.
  
  We expanded the motivation of why self-pacing trials may effectively alter how rhythmic processes affect perception, and now also allude to attention and expectation related effects (l. 786ff). Regarding eye movements we now discuss the results in the light of the previously mentioned studies, but again refrain from a very detailed and mechanistic interpretation (l. 782).
  
  Discussion:
  
  The main data in Figure 3 showed little rhythmicity. The authors seem to gloss over this fact by simply stating that the same phase is not necessary for their statistical analysis. Previous work, however, showed rhythmicity in the across-participant average (e.g., Fiebelkorn's and similar work). Moreover, one would expect that some of the effects in the low-frequency band (e.g., 2-4 Hz) are somewhat similar across participants. Conduction delays in the auditory system are much smaller than the 0.25-0.5 s associated with 2-4 Hz. The authors would want to discuss why different participants would express so vastly different phases that the across-participant average does not show any rhythmicity, and what this would mean neurophysiologically.
  
  We now discuss the assumptions and implications of similar or distinct phases of rhythmic processes within and between participants (l. 695ff). In particular, we note that different origins of the underlying neurophysiological processes may eventually suggest that such assumptions are or are not warranted.
  
  An additional point that may require more nuanced discussion is related to the rhythmicity of response bias versus sensitivity. The authors could discuss what the rhythmicity of these different measures in different frequency bands means, with respect to underlying neural oscillations.
  
  We expanded the discussion to interpret what rhythmic changes in each of the behavioral metrics could imply (l. 706ff).
  
  Figures:
  
  Much of the text in the figures seems really small. Perhaps the authors would want to ensure it is readable even for those with low vision abilities. Moreover, Figure 1A is not as intuitive as it could be and may perhaps be made clearer. I also suggest the authors discuss a bit more the potential monaural vs binaural issues, because the perceptual rhythmicity is much slower than any conduction delays in the auditory system that could lead to interference.
  
  We tried to improve the font sizes where possible, and discuss the potential monaural origins as suggested by other reviewers.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The finding of rhythmic activity in the brain has, for a long time, engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory, each of which is revealed to affect the consistency of this rhythmicity.
  
  The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect. It lacks a conceptual framework to understand why effects might be consistent in each ear but at different frequencies and only for some tasks with slight variants, some affecting sensitivity and some affecting bias.
  
  Strengths:
  
  The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and its manner of setting out to test them is very clear, which allows for a straightforward assessment of its results.
  
  Weaknesses:
  
  There is a weakness throughout the paper in terms of establishing a conceptual framework both for the source of "rhythmic modes" and for the interpretation of the results. Before understanding the data on this matter, it would be useful to discuss why one would posit such a theory to begin with. From a perceptual side, rhythmic modes of processing in the absence of rhythmic stimuli would not appear to provide any benefit to processing. From a biological or homeostatic argument, it's unclear why we would expect such fluctuations to occur in such a narrow-band way when neither the stimulus nor the neurobiological circuits require it.
  
  We believe that the framework for why there may be rhythmic activity along auditory pathways that shapes behavioral outcomes has been laid out in many previous studies, prominently here (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Obleser and Kayser, 2019). Many of the relevant studies are cited in the introduction, which is already rather long given the many points covered in this study.
  
  Secondly, for the analysis to detect a "rhythmic mode", it must assume that the phase of fluctuations across an experiment (i.e., whether fluctuations are in an up-state or down-state at onset) is constant at stimulus onset, whereas most oscillations do not have such a total phase-reset as a result of input. Therefore, some theoretical positing of what kind of mechanism could generate this fluctuation is critical toward understanding whether the analysis is well-suited to the studied mechanism.
  
  In line with this and previous comments (by reviewer 2) we have expanded the discussion to consider the issue of phase alignment (l. 695ff).
  
  Thirdly, an interpretation of why we should expect left and right ears to have distinct frequency ranges of fluctuations is required. There are a large number of statistical tests in this paper, and it's not clear how multiple comparisons are controlled for, apart from experiment 4 (which specifies B&H false discovery rate). As such, one critical method to identify whether the results are not the result of noise or sample-specific biases is the plausibility of the finding. On its face, maintaining distinct frequencies of perception in each ear does not fit an obvious conceptual framework.
  
  Again this point was also noted by another reviewer and we expanded the introduction and discussion in this regard (l. 65ff).
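  For readers unfamiliar with the B&H false discovery rate control that the reviewer mentions for Experiment 4, the step-up procedure can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function name `benjamini_hochberg` and the default level `q=0.05` are assumptions made for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q.

    Hypothetical helper for illustration: implements the standard
    Benjamini-Hochberg step-up rule for independent tests.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

  Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.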
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) An update of the AR-surrogate method has recently been published (https://doi.org/10.1101/2024.08.22.609278). I appreciate that this is a lot of work, and it is of course up to the authors, but given the higher sensitivity of this method, it might be worth applying it to the four datasets described here.
  
  Reading this article we note that our implementation of the AR-surrogate method was essentially as suggested here, and not as implemented by Brookshire. In fact we had not realized that Brookshire had apparently computed the spectrum based on the group-average data. As explained in the Methods section, and now clarified further, we compute for each participant the actual spectrum of this participant’s data, and a set of surrogate spectra. We then perform a group-average of both to compute the p-value of the actual group-average based on the percentile of the distribution of surrogate averages. This second step differs from Harris & Beale, who used a one-sided t-test. The latter is most likely not appropriate in a strict statistical sense, but possibly more powerful for detecting true results compared to the percentile-based approach that we used (see l. 332ff).
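  The two-step logic described in this response (per-participant actual and surrogate spectra, then a percentile-based p-value on the group averages) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes AR(1) surrogates, a plain DFT power spectrum, and equal-length time courses for all participants, and the names `power_spectrum`, `ar1_surrogate`, and `group_p_values` are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(x):
    """Power at the positive DFT frequencies (DC excluded) of a mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centered))
        spectrum.append(abs(coef) ** 2 / n)
    return spectrum


def ar1_surrogate(x, rng):
    """Simulate an AR(1) series of the same length, using the variance and
    lag-1 autocorrelation estimated from x (assumption: AR order 1)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered) / n
    phi = sum(centered[t] * centered[t - 1] for t in range(1, n)) / (n * var)
    innovation_sd = math.sqrt(var * (1.0 - phi ** 2))
    series = [rng.gauss(0.0, math.sqrt(var))]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, innovation_sd))
    return series


def group_p_values(participants, n_surrogates, rng):
    """Percentile-based p-value per frequency for the group-average spectrum."""
    # Step 1: actual spectrum for each participant, then the group average.
    actual = [power_spectrum(x) for x in participants]
    n_freq = len(actual[0])
    n_part = len(participants)
    actual_avg = [sum(s[f] for s in actual) / n_part for f in range(n_freq)]

    # Step 2: each iteration draws one surrogate per participant, averages the
    # surrogate spectra across participants, and compares with the actual average.
    exceed = [0] * n_freq
    for _ in range(n_surrogates):
        surr = [power_spectrum(ar1_surrogate(x, rng)) for x in participants]
        for f in range(n_freq):
            surr_avg = sum(s[f] for s in surr) / n_part
            if surr_avg >= actual_avg[f]:
                exceed[f] += 1

    # One-sided p-value from the rank of the actual average among the surrogates.
    return [(1 + e) / (n_surrogates + 1) for e in exceed]
```

  Because only spectral power is averaged, this sketch does not require the rhythmic component to have the same phase in every participant, and the smallest attainable p-value is 1 / (n_surrogates + 1).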
  
  (2) When results for the four experiments are reported, a reminder for the reader of how these experiments differ from each other would be useful.
  
  We have added this in the Results section.
  
  "considerable prevalence of differences around 4Hz, with dual‐task requirements leading to stronger rhythmicity in perceptual sensitivity". There is a striking similarity to recently published data (https://doi.org/10.1101/2024.08.10.607439 ) demonstrating a 4-Hz rhythm in auditory divided attention (rather than between modalities as in the present case). This could be a useful addition to the paragraph.
  
  We have added a reference to this preprint, and additional previous work pointing in the same direction mentioned in there.
  
  (3) There are two typos in the Introduction: "related by different from the question", and below, there is one "presented" too much.
  
  These have been fixed.
  
  Reviewer #3 (Recommendations for the authors):
  
  My major suggestion is that these results must be replicated in a new sample. I understand this is not simple to do and not always possible, but at this point, no effect is replicated from one experiment to the next, despite very small changes in protocol (especially experiment 1 vs 2). It's therefore very difficult to justify explaining the different effects as real as opposed to random effects of this particular sample. While the bootstrapping effects show the level of consistency of the effect within the sample studied, it can not be a substitute for a true replication of the results in a new sample.
  
  We agree that only an independent replication can demonstrate the robustness of the results. We do consider experiment 1 a replication test of Ho et al. CurrBiol 2017, which yields different results from those reported there. But more importantly, we consider the analysis of ‘reproducibility’ by simulating participant samples a key novelty of the present work, and want to emphasize this over the within-study replication of the same experiment. In fact, in light of the present interpretation of the data, even a within-study replication would most likely not offer a clear-cut answer.
  
  As I said in the public review, the interpretation of the results, and of why perceptual cycles in arhythmic stimuli could be a plausible theory to begin with, is lacking. A conceptual framework would vastly improve the impact and understanding of the results.
  
  We tried to strengthen the conceptual framework in the introduction. We believe that this is in large part provided by previous work, and the aim of the present study was to explore the robustness of effects and not to suggest and discover novel effects.
  
  Minor comments:
  
  (1) The authors adapt the difficulty as a function of performance, which seems to me a strange choice for an experiment that is analyzing the differences in performance across the experiment. Could you add a sentence to discuss the motivation for this choice?
  
  We now mention the rationale in the Methods section and in a new section of the Results. There we also provide additional analyses on this parameter.
  
  (2) The choice to plot the p-values as opposed to the values of the actual analysis feels ill-advised to me. It invites comparison across analyses that isn't necessarily fair. It would be more informative to plot the respective analysis outputs (spectral power, regression, or delta R2) and highlight the windows of significance and their overlap across analyses. In my opinion, this would be more fair and accurate depiction of the analyses as they are meant to be used.
  
  We do disagree. As explained in the Methods (l. 374ff): “(Showing p-values) … allows presenting the results on a scale that can be directly compared between analysis approaches, metrics, frequencies and analyses focusing on individual ears or the combined data. Each approach has a different statistical sensitivity, and the underlying effect sizes (e.g. spectral power) vary with frequency for both the actual data and null distribution. As a result, the effect size reaching statistical significance varies with frequency, metrics and analyses.”
  
  The fact that the level of power (or R2 or whatever metric we consider) required to reach significance differs between analyses (one ear, both ears), metrics (d-prime, bias, RT) and between analysis approaches makes showing the results difficult, as we would need a separate panel for each of those. This would multiply the number of panels required e.g. for Figure 4 by 3, making it a figure with 81 axes. Also, neither the original quantities of each analysis (e.g. spectral power) nor the p-values that we show constitute a proper measure of effect size in a statistical sense. In that sense, neither of these is truly ideal for comparing between analyses, metrics etc.
  
  We do agree, though, that many readers may want to see the original quantification and thresholds for statistical significance. We now show these in an exemplary manner for the Binned analysis of Experiment 1, which provides a positive result and also is an attempt to replicate the findings by Ho et al 2017. This is shown in new Figure 5.
  
  (3) Typo in line 555 (+ should be plus minus).
  
  (4) Typo in line 572: "Comparison of 572 blocks with minus dual task those without"
  
  (5) Typo in line 616: remove "one".
  
  (6) Line 666 refers to effects in alpha band activity, but it's unclear what the relationship is to the authors' findings, which peak around 6 Hz, lower than alpha (~10 Hz).
  
  (7) Line 688 typo, remove "amount of".
  
  These points have been addressed.
  
  (8) Oculomotor effect that drives greater rhythmicity at 3-4 Hz. Did the authors analyze the eye movements to see if saccades were also occurring at this rate? It would be useful to know if the 3-4 Hz effect is driven by "internal circuitry" in the auditory system or by the typical rate of eye movement.
  
  A preliminary analysis of eye movement data was in previous Figure 8, which was removed on the recommendation of another reviewer. This showed that the average saccade rate is about 0.01 saccades per trial per time bin, amounting to on average less than one detected saccade per trial. Hence rhythmicity in saccades is unlikely to explain rhythmicity in behavioral data at the scale of 3-4 Hz. We now note this in the Results.
  
  Obleser J, Kayser C (2019) Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 23:913-926.
  
  Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9-18.
  
  Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106-113.
  
  Zoefel B, Heil P (2013) Detection of Near-Threshold Sounds is Independent of EEG Phase in Common Frequency Bands. Front Psychol 4:262.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2024.12.17.628892v2
www.biorxiv.org

Engineering NIR-Sighted Bacteria

5
1. Public_Reviews 14 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important study establishes bathy phytochromes, a unique class of bacterial photoreceptors that respond to near-infrared light (NIR), as versatile tools for bacterial optogenetics. NIR light is a key control signal in optogenetics due to its deep tissue penetration and the ability to combine with existing red- and blue-light sensitive systems, but thus far, NIR-activated proteins have been poorly characterized. The strength of evidence is convincing, with comprehensive in vitro characterization, modular design strategies, and validation across different hosts, supporting the versatility and potential for these tools in biotechnological applications. This study should advance the fields of optogenetics and photobiology and inspire future work.
 
 Summary
2. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e. those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating generality of their protein engineering approach more broadly across bacterial two-component systems.
 
 This is an exciting result that should inspire future bacterial sensor design. The authors go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.
 
 Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.
 
 Strengths:
 
 The experiments are well founded, well executed, and rigorous.
 
 The manuscript is clearly written.
 
 The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.
 
 This study is a valuable contribution to photobiology and optogenetics.
 
 Weaknesses:
 
 As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g. in vivo).
 
 Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g. blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.
 
 Comments on revisions:
 
 My concerns have been addressed.
 
 Review 1
3. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.
 
 Strengths:
 
 (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.
 
 (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.
 
 (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.
 
 (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.
 
 (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.
 
 Comments on revisions:
 
 The authors have addressed all my prior concerns.
 
 Review 2
4. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This paper by Meier et al introduces a new optogenetic module for regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.
 
 Strengths:
 
 The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light activated systems. The experiments are performed carefully and the manuscript is well written.
 
 Weaknesses:
 
 Some of the light-inducible responses described in this compelling paper are complex and difficult to rationalize, such as the dependence of light responses on linker length and differences in responses observed from the bathy-BphPs in isolation versus strains in which they are multiplexed. Nevertheless, the authors should be commended for carrying out rigorous experiments and reporting these results accurately. These are minor weaknesses in an overall very strong paper.
 
 Review 3
5. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.
 
 This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.
 
 Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.
 
 Strengths:
 
 (1) The experiments are well-founded, well-executed, and rigorous.
 
 (2) The manuscript is clearly written.
 
 (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.
 
 (4) This study is a valuable contribution to photobiology and optogenetics.
 
 We thank the reviewer for the positive verdict on our manuscript.
 
 Weaknesses:
 
 (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).
 
 We concur in principle with this reviewer’s assessment that delivery of light (of any color) into living tissue can be severely limited by absorption, reflection, and scattering. That notwithstanding, at least two considerations suggest that in-vivo deployment of the pNIRusk setups we presently advance may be feasible.
 
 First, while the pNIRusk setups are indeed less light-sensitive compared to, e.g., our earlier red-light-responsive pREDusk and pDERusk setups (see Meier et al. Nat Commun 2024), we note that the overall light fluences required for triggering them are in the range of tens of µW per cm². By contrast, optogenetic experiments in vivo, in particular in the neurosciences, often employ light area intensities on the order of mW per cm² and above. Put another way, compared to the optogenetic tools used in these experiments, the pNIRusk setups are actually quite sensitive to light.
 
 Second, sensitivity to NIR light brings the advantage of superior tissue penetration, see data reported by Weissleder Nat Biotech 2001 and Ash et al. Lasers Med Sci 2017 (both papers are cited in our manuscript). Based on these data, the intensity of blue light (450 nm) falls off 5-10 times more strongly with penetration depth than that of NIR light (800 nm).
 
 We have added a brief treatment of these aspects in the Discussion section.
 
 (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.
 
 The reviewer is correct in noting that, at least to a certain extent, the pNIRusk systems also respond to blue light owing to their Soret absorbance bands (see Fig. 1). That said, we note two points:
 
 First, a given photoreceptor that preferentially responds to certain wavelengths, e.g., 700 nm in the case of conventional bacterial phytochromes (BphP), generally absorbs shorter wavelengths to some degree as well. Absorption of these shorter wavelengths suffices for driving electronic and/or vibronic transitions of the chromophore to higher energy levels which often give rise to productive photochemistry and downstream signal transduction. Put another way, a certain response of sensory photoreceptors to shorter wavelengths is hence fully expected and indeed experimentally borne out, as for instance shown by Ochoa-Fernandez et al. in the so-called PULSE setup (Nat Meth 2020, doi: 10.1038/s41592-020-0868-y).
 
 Second, known BphPs share similar Pr and Pfr absorbance spectra. We therefore expect other BphP-based optogenetic setups to also respond to blue light to some degree. Currently, there are insufficient data to gauge whether individual BphPs systematically differ in their relative sensitivity to blue compared to red or NIR light. Arguably, pertinent experiments may be an interesting subject for future study.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.
 
 Strengths:
 
 (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.
 
 (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.
 
 (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.
 
 (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.
 
 (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.
 
 Weaknesses:
 
 (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.
 
 In response to this comment, we have recorded growth kinetics of bacteria harboring the pNIRusk-DsRed plasmids or empty vectors under both inducing (i.e., under NIR light) and noninducing conditions (i.e., darkness). We did not observe systematic differences in the growth kinetics between the different cultures, thus suggesting that under the conditions tested there is no adverse effect on cell viability.
 
 We include the new data in Suppl. Fig. 5c-d and refer to them in the main text.
 
 (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.
 
 We appreciate this valid comment and have altered the representation of the fluorescence data. All values for a given fluorescent protein (i.e., either DsRed or YPet) across all systems are now normalized to a single reference value, thus enabling direct comparison between experiments.
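 The normalization described in this response can be sketched as follows. This is a minimal illustration, assuming that each raw reading for a given fluorescent protein is simply divided by one shared reference value; the function name, the system labels, and all numbers are hypothetical and are not taken from the manuscript.

```python
def normalize_to_reference(readings, reference):
    """Divide every raw reading by one shared reference value.

    Because the same reference is used for all systems, the normalized
    values can be compared directly across experiments.
    """
    if reference <= 0:
        raise ValueError("reference must be positive")
    return {name: value / reference for name, value in readings.items()}


# Hypothetical raw DsRed readings in arbitrary units (not data from the paper).
dsred_raw = {
    "system_A_dark": 200.0,
    "system_A_NIR": 1600.0,
    "system_B_NIR": 800.0,
}
DSRED_REFERENCE = 2000.0  # hypothetical shared reference for all DsRed data

dsred_norm = normalize_to_reference(dsred_raw, DSRED_REFERENCE)
for name, value in dsred_norm.items():
    print(f"{name}: {value:.2f}")
```

 With a per-panel maximum as the reference, system_B_NIR would read 1.0 in its own panel and could not be ranked against system_A_NIR; with a shared reference, the two values stay on one scale.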
 
 (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.
 
 As all data are now normalized to the same reference value, direct comparison across all figures is enabled.
 
 (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.
 
 In response to this comment, we include in the revised manuscript induction kinetics of bacterial cultures bearing pNIRusk upon transfer to inducing NIR-light conditions. To this end, aliquots were taken at discrete timepoints, transcriptionally and translationally arrested, and analyzed for optical density and DsRed reporter fluorescence after allowing for chromophore maturation.
 
 We include the new data in Suppl. Fig. 5e and refer to them in the manuscript.
 
 Moreover, we note that the experiments in Agrobacterium tumefaciens used a luciferase reporter thus enabling the continuous monitoring of the light-induced expression kinetics. These data (unchanged in revision) are to be found in Suppl. Fig. 9.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This paper by Meier et al introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.
 
 Strengths:
 
 The paper is important from the perspective of fundamental protein characterization, since bathyBphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.
 
 Weaknesses:
 
 My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.
 
 We are grateful for the positive take on our manuscript.
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) As eLife is a broad audience journal, please define the Soret and Q-bands (line 125).
 
 We concur and have added labels in Fig. 1a that designate the Soret and Q bands.
 
 (2) The initial (0) Ac design in Figure 2b is activated by NIR and Red light, albeit modestly. The authors state that this construct shows "constant reporter fluorescence, largely independent of illumination" (line 167). This language should be changed to reflect the fact that this Ac construct responds to both of these wavelengths.
 
 Agreed. We have amended the text accordingly.
 
 (3) pNIRusk Ac 0 appears to show a greater light response than pNIRusk Av -5. However, the authors claim that the former is not light-responsive and the latter is. This conclusion should be explained or changed.
 
 The assignment of pNIRusk Av-5 as light-responsive is based on the relative difference in reporter fluorescence between darkness and illumination with either red or NIR light. Although the overall fluorescence is much lower in Av-5 than for Av-0, the relative change upon illumination is much more pronounced. We add a statement to this effect to the text.
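 The difference between an absolute and a relative change can be made concrete with a short sketch. The numbers below are hypothetical normalized fluorescence values chosen only to illustrate the argument; they are not readings from the manuscript, and the labels are placeholders.

```python
def absolute_change(dark, lit):
    """Difference in reporter fluorescence between light and darkness."""
    return lit - dark


def fold_change(dark, lit):
    """Ratio of reporter fluorescence in light over darkness."""
    if dark <= 0:
        raise ValueError("dark-state fluorescence must be positive")
    return lit / dark


# Hypothetical normalized fluorescence values (dark, NIR), for illustration only.
high_basal = (0.50, 0.80)   # large signal, modest relative change
low_basal = (0.02, 0.10)    # small signal, large relative change

for label, (dark, lit) in (("high basal", high_basal), ("low basal", low_basal)):
    print(label, round(absolute_change(dark, lit), 2), round(fold_change(dark, lit), 1))
```

 In this toy case the high-basal system shows the larger absolute change, whereas the low-basal system shows the larger fold change, which is the criterion the response applies.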
 
 (4) The authors state that "when combining DmDERusk-Str-YPet with AvTod+21-DsRed expression rose under red and NIR light, respectively, whereas the joint application of both light colors induced both reporter genes" (lines 258-261). In contrast, Figure 3c shows that application of both wavelengths of light results in exclusive activation of YPet expression. It appears the description of the data is wrong and must be corrected. That said, this error does not impact their conclusion that two separate target genes can be independently activated by NIR and red light.
 
 We thank the reviewer for catching this error which we have corrected in the revised manuscript.
 
 (5) Line 278: I don't agree with the authors' blanket statement that the use of upconversion nanoparticles is a "grave" limitation for NIR-light mediated activation of bacterial gene expression in vivo. The authors should either expound on the severity of the limitation or use more moderate language.
 
 We have replaced the word ‘grave’ by ‘potential’ and thereby toned down our wording.
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) Please include a discussion on the expected depth penetration of different light wavelengths. This is most relevant in the context of the discussion about how these NIR systems could be used with living therapeutics.
 
 Given the heterogeneity of biological tissue, it is challenging to state precise penetration depths for different wavelengths of light. That said, blue light for instance is typically attenuated by biological tissue around 5 to 10 times as strongly as near-infrared light is.
 
 We have expanded the Discussion chapter to cover these aspects.
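 To give a feel for what a 5- to 10-fold difference in attenuation means, the following sketch assumes a simple exponential (Beer-Lambert-type) decay and interprets "5 to 10 times as strongly" as a 5- to 10-fold larger effective attenuation coefficient. Both the model and the NIR coefficient are assumptions made for illustration; real tissue is heterogeneous and scattering, so the numbers are not predictions.

```python
import math


def transmitted_fraction(depth_mm, mu_per_mm):
    """Fraction of incident intensity remaining at a given depth,
    assuming simple exponential decay with coefficient mu."""
    if depth_mm < 0 or mu_per_mm < 0:
        raise ValueError("depth and coefficient must be non-negative")
    return math.exp(-mu_per_mm * depth_mm)


MU_NIR = 0.2              # hypothetical effective coefficient for NIR light, per mm
BLUE_TO_NIR_RATIOS = (5, 10)  # range quoted in the response


def compare_at_depth(depth_mm):
    """Return the NIR fraction and the blue fractions for both ratios."""
    nir = transmitted_fraction(depth_mm, MU_NIR)
    blue = [transmitted_fraction(depth_mm, r * MU_NIR) for r in BLUE_TO_NIR_RATIOS]
    return nir, blue


for depth in (1.0, 2.0, 5.0):
    nir, blue = compare_at_depth(depth)
    print(f"{depth} mm: NIR {nir:.3f}, blue {blue[0]:.3g} to {blue[1]:.3g}")
```

 Because the decay is exponential, a fixed ratio of coefficients translates into a gap between blue and NIR intensity that widens rapidly with depth.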
 
 (2) It would be helpful for Figure 2C (or supplementary) to also include the response to blue light stimulation.
 
 We agree and have acquired pertinent data for the blue-light response. The new data are included in an updated Fig. 2c. Data acquired at varying NIR-light intensities, originally included in Fig. 2c, have been moved to Suppl. Fig. 5a-b.
 
 (3) In Figure 4A, data on the response of E. coli Nissle to blue and red light are missing. Including this would help identify whether the reduced sensitivity to non-NIR wavelengths observed in the E. coli lab strain is preserved in the probiotic background.
 
 In response to this comment, we have acquired pertinent data on E. coli Nissle. While the results were overall similar to those in the laboratory strain, the response to blue and NIR light was yet lower in the Nissle bacteria which stands to benefit optogenetic applications.
 
 We have updated Fig. 4a accordingly. For clarity, we only show the data for AvNIRusk in the main paper but have relegated the data on AcNIRusk to Suppl. Fig. 8. (Note that this has necessitated a renumbering of the subsequent Suppl. Figs.)
 
 (4) On many of the figures, there are thin gray lines that appear between the panels that it would be nice to eliminate because, in some cases, they cut through words and numbers.
 
 The grey lines likely arose from embedding the figures into the text document. In the typeset manuscript, which has become available on the eLife webpage in the meantime, there are no such lines. That said, we will carefully check throughout the submission/publishing/proofing process lest these lines reappear.
 
 (5) Page 7, line 155: "As not least seen" typo or awkward phrasing.
 
 We have restructured the sentence and thereby hopefully clarified the unclear phrasing.
 
 (6) Page 7, line 167: It does not appear to be the case that the initial pNIRusk designs show constant fluorescence that is largely independent of illumination. AcNIRusk shows an almost twofold change from dark to NIR. Reword this to avoid confusion.
 
 We concur with this comment, similar to reviewer #1’s remark, and have adjusted the text accordingly.
 
 (7) Page 8, line 174: Related to the previous point, AvNIRusk has one design that is very minimally light switchable (-5), so stating that six light switchable designs have been identified is also confusing.
 
 As stated in our response to reviewer #1 above, the assignment of AvNIRusk-5 as light-switchable is based on the relative fluorescence change upon illumination. We have added an explanation to the text.
 
 (8) Page 10, line 228-229: I was not able to find the data showing that expression levels were higher for the DmTtr systems than the pREDusk and pNIRusk setups. This may be an issue related to the normalization point. It was not clear to me how to compare these values.
 
 We apologize for the initially unclear representation of the data. In response to this reviewer’s general comments above, we have now normalized all fluorescence values to a single reference value, thus allowing their direct comparison.
 
 (9) Page 12, line 264: "finer-grained expression control can be exerted..." Either show data or adjust the language so that it is clear this is a prediction.
 
 True, we have replaced ‘can’ by ‘could’.
 
 (10) Page 25, line 590: CmpX13 cells have a reference that is given later, but it should be added where it first appears.
 
 Agreed, we have added the reference in the indicated place.
 
 (11) Page 25, line 592: define LB/Kan.
 
 We had already defined this abbreviation further up but, for clarity, we have added it again in the indicated position.
 
 (12) Page 40, line 946: "normalized by" rather than "to".
 
 We have implemented the requested change in the indicated and several other positions of the manuscript.
 
 (13) Figures 2C, 3C, and similar plots in the supplementary material would benefit from having a legend for the colors.
 
 We agree and have added pertinent legends to the corresponding main and supplementary figures.
 
 (14) As a reader, I had some trouble following all the acronyms. This is at the authors' discretion, but I would eliminate ones that are not strictly essential (e.g. MTP for microtiter plate; I was unable to identify what "MCS" meant; look for other opportunities to remove acronyms).
 
 In the revised manuscript, we have defined the abbreviation ‘MCS’ (for ‘multiple-cloning site’) upon first occurrence. We have decided to retain the abbreviation ‘MTP’ in the text.
 
 (15) Could the authors briefly speculate on why A. tumefaciens activation with red light might occur?
 
 While we can but speculate as to the underlying reasons for the divergent red-light response in A. tumefaciens, we discuss possible scenarios below.
 
 Commonly, two-component systems (TCS) exhibit highly cooperative and steep responses to signal. As a consequence, even small differences in the intracellular amounts of phosphorylated and unphosphorylated response regulator (RR) can give rise to significantly changed gene-expression output. Put another way, the gene-expression output need not scale linearly with the extent of RR phosphorylation but, rather, is expected to show nonlinear dependence with pronounced thresholding effects.
 
 Differences in the pertinent RR levels can for instance arise from variations in the expression levels of the pNIRusk system components between E. coli and A. tumefaciens. Moreover, the two bacteria greatly differ in their two-component-system (TCS) repertoire. Although TCSs are commonly well insulated from each other, cross-talk with endogenous TCSs, even if limited, may cause changes in the levels of phosphorylated RR and hence gene-expression output. In a similar vein, the RR can also be phosphorylated and dephosphorylated non-enzymatically, e.g., by reaction with high-energy anhydrides (such as acetyl phosphate) and hydrolysis, respectively. Other potential origins for the divergent red-light response include differences in the strength of the promoters driving expression of the pNIRusk system components and the fluorescent/luminescent reporters, respectively.
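 The thresholding argument in this response can be illustrated with a Hill function, a common phenomenological description of cooperative responses. This is a sketch under stated assumptions: the Hill form, the half-maximal point K, and the Hill coefficient n are hypothetical choices for illustration and were not measured or fitted for the pNIRusk systems.

```python
def hill_output(phospho_fraction, k_half=0.5, n=4):
    """Relative gene-expression output for a given fraction of
    phosphorylated response regulator, assuming a Hill-type response."""
    if not 0.0 <= phospho_fraction <= 1.0:
        raise ValueError("phospho_fraction must lie between 0 and 1")
    x_n = phospho_fraction ** n
    return x_n / (k_half ** n + x_n)


# Two hypothetical hosts whose phosphorylated-RR fractions differ modestly.
host_a = 0.4
host_b = 0.6

out_a = hill_output(host_a)
out_b = hill_output(host_b)
print(f"input ratio:  {host_b / host_a:.2f}")
print(f"output ratio: {out_b / out_a:.2f}")
```

 Near the threshold, the ratio of outputs exceeds the ratio of inputs, so modest host-dependent shifts in RR phosphorylation could in principle produce disproportionate changes in reporter expression.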
 
 (16) It would be helpful for the authors to briefly explain why they needed to switch to luminescence from fluorescence for the A. tumefaciens studies.
 
 While there was no strict necessity to switch from the fluorescence-based system used in E. coli to a luminescence-based system in A. tumefaciens, we opted for luminescence based on prior experience with other Alphaproteobacteria (e.g., 10.1128/mSystems.00893-21), where luminescence offered significant advantages. Specifically, it provides essentially background-free signal detection and greater sensitivity for monitoring gene expression. In addition, as demonstrated in Suppl. Fig. 9c and d, the luminescence system enables real-time tracking of gene expression dynamics, which further supported its use in our experimental setup (see our response to reviewer #2’s general comments).
 
 (17) This is a very minor comment that the authors can take or leave, but I got hung up on the word "implement" when it appeared a few times in the manuscript because I tended to read it as "put a plan into place" rather than its other meaning.
 
 In the abstract, we have replaced one instance of the word ‘implement’ by ‘instrument’.
 
 (18) The authors should include the relevant constructs on AddGene or another public strain-sharing service.
 
 We whole-heartedly subscribe to the idea of freely sharing research materials with fellow scientists. Therefore, we had already deposited the most relevant AvNIRusk in Addgene, even prior to the initial submission of the manuscript (accession number 235084). In the meantime, we have released the deposition, and the plasmid can be obtained from Addgene since May 15th of this year.
 
 Reviewer #3 (Recommendations for the authors):
 
 Suggestion for improvement:
 
 This paper relies heavily on variations in linker sequences to shift responses. I am familiar with prior work from the Moglich lab in which helical linkers were employed to shift responses in synthetic two-component systems, with interesting periodicity in responses with every 7 residues (as expected for an alpha helix) and inversion of responses at smaller linker shifts. There is no mention in this paper whether their current engineering follows a similar rationale, what types of linkers are employed (e.g. flexible vs helical), and whether there is an interpretation for how linker lengths alter responses. Can you explain what classes of linker sequences are used throughout Figures 2 and 3, and whether length or periodicity affects the outcome? This would be very helpful for readers who are new to this approach, or if the rationale here differs from the authors' prior work.
 
 The PATCHY approach employed at present followed a closely similar rationale as in our previous studies. That is, linkers were extended/shortened and varied in their sequence by recombining different fragments of the natural linkers of the parental receptors, i.e., the bacteriophytochrome and the FixL sensor histidine kinase, respectively. We have added a statement to this effect in the text and a reference to Suppl. Fig. 3 which illustrates the principal approach.
 
 Compared to our earlier studies, we isolated fewer receptor variants supporting light-regulated responses, despite covering a larger sequence space. Owing to the sparsity of the light-regulated variants, an interpretation of the linker properties and their correlation with light-regulated activity is challenging. Although doubtless unsatisfying from a mechanistic viewpoint, we therefore refrain from a pertinent discussion which would be premature and speculative at this point. As the reviewer raises a valid and important point, we have expanded the text by referring to our earlier studies and the observed dependence of functional properties on linker composition.
 
 It is sometimes difficult to intuit or rationalize the differences in red/IR sensitivity across closely related variants. An important example appears in Figure 3C vs 3B. I think the AvTod+21 in 3B should be the equivalent to the DsRed response in the second column of 3C (AvTod+21 + DmDERusk), except, of course, that the bacteria in 3C carry an additional plasmid for the DERusk system. However, in 3B, the response to red light is substantial - ~50% as strong as that for IR, whereas in 3C, red light elicits no response at all. What is the difference? The reason this is important is that the AvTod+21 and DmDERusk represent the best "orthogonal" red and infrared light responses, but this is not at all obvious from 3B, where AvTod+21 still causes a substantial (and for orthogonality, undesirable) response under red light. Perhaps subtle differences in expression level due to plasmid changes cause these differences in light responses? Could the authors test how the expression level affects these responses? The paper would be greatly improved if observations of the diverse red/IR responses could be rationalized by some design criteria.
 
 As noted above in our response to reviewer #2, we have now normalized all fluorescence readings to joint reference values, thus allowing a better comparison across experiments.
 
 The reviewer is correct in noting that upon multiplexing, the individual plasmid systems support lower fluorescence levels than when used in isolation. We speculate that the combination of two plasmids may affect their copy numbers (despite the use of different resistance markers and origins of replication) and hence their performance. Likewise, the cellular metabolism may be affected when multiple plasmids are combined. These aspects may well account for the absent red-light response in AvTod+21 in the multiplexing experiments which is – indeed – unexpected. As, at present, we cannot provide a clear rationalization for this effect, we recommend verifying the performance of the plasmid setups when multiplexing.
 
 The paper uses "red" and "infrared" to refer to ~624 nm and ~800 nm light, respectively. I wonder whether it might be possible to shift these peak wavelengths to obtain even better separation for the multiplexing experiments. Perhaps shifting the specific red wavelength could result in better separation between DERusk and AvTod systems, for example? Could the authors comment on this (maybe based on action spectra of their previously developed tools) or perhaps test a few additional stimulation wavelengths?
 
 The choice of illumination wavelengths used in these experiments is dictated by the LED setups available for illumination of microtiter plates. On the one hand, we are using an SMD (surface-mount device) three-color LED with a fixed wavelength of the red channel around 624 nm (see Hennemann et al., 2018). On the other hand, we are deploying a custom-built device with LEDs emitting at around 800 nm (see Stüven et al., 2019 and this work). Adjusting these wavelengths is therefore challenging, although without doubt potentially interesting.
 
 To address this reviewer comment, we have added a statement to the text that the excitation wavelengths may be varied to improve multiplexed applications.
 
 Additional minor comments:
 
 (1) Figure 2C: It would be very helpful to place a legend on the figure panel for what the colors indicate, since they are unique to this panel and non-intuitive.
 
 This comment coincides with one by reviewer #2, and we have added pertinent legends to this and related supplementary figures.
 
 (2) Figure 3C: it is not obvious which system uses DsRed and which uses YPet in each combination, since the text indicates that all combinations were cloned, and this is not clearly described in the legend. Is it always the first construct in the figure legend listed for DsRed and the second for YPet?
 
 For clarification, we have revised the x-axis labels in Fig. 3C. (And yes, it is as this reviewer surmises: the first of the two constructs harbored DsRed and the second one YPet.)
 
 AuthorResponse

URL

biorxiv.org/content/10.1101/2025.04.25.650650v2
www.biorxiv.org www.biorxiv.org

Foveated metamers of the early visual system

4
1. Public_Reviews 14 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This study provides important insights into how researchers can use perceptual metamers to formally explore the limits of visual representations at different processing stages. The framework is compelling and the data largely support the claims, subject to minor caveats.
 
 Summary
2. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 This is an interesting study on the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging ('pooling') of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the prediction of the distinct models.
 
 More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers, i.e., stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.
 
 Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach. I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics and the physiology of low- and mid-level vision.
 
 The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy to follow manner. The authors could also have been more explicit about the assumptions that they make.
 
 Comments following re-submission:
 
 Overall, I think the authors have done a satisfactory job of addressing most of the points I raised.
 
 There's one final issue which I think still needs better discussion.
 
 I think reviewer 2 articulated better than I have the point I was concerned about: the relationship between JNDs and metamers as depicted in the schematics and indeed in the whole conceptualization.
 
 I think the issue here is that there seems to be a conflation of two concepts, 'subthreshold' and 'metamer', and I'm not convinced it is entirely unproblematic. It's true that two stimuli that cannot be discriminated from one another due to the physical differences being too small to detect reliably by the visual system are a form of metamer in the strict definition 'physically different, but perceptually the same'. However, I don't think this is the scientifically substantial notion of metamer that enabled insights into trichromacy. That form of metamerism is due to the principle of univariance in feature encoding, and involves conditions in which physically very different stimuli are mapped to one and the same point in sensory encoding space whether or not there is any noise in the system. When I say 'physically very different' I mean different by a large enough amount that they would be far above threshold, potentially orders of magnitude larger than a JND if the system's noise properties were identical but the system used a different sensory basis set to measure them. This seems to be a very different kind of 'physically different, but perceptually the same'.
 
 I do think the notion of metamerism can obviously be very usefully extended beyond photoreceptors and photon absorptions. In the interesting case of texture metamers, what I think is meant is that stimuli would be discriminable if scrutinised in the fovea, but because they have the same statistics they are treated as equivalent. I think the discussion of this could still be clearly articulated in the manuscript. It would benefit from a more thorough discussion of the difference between metamerism and subthreshold, especially in the context of the Voronoi diagrams at the beginning.
 
 It needs to be made clear to the reader why it is that two stimuli that are physically similar (e.g., just spanning one of the edges in the diagram) can be discriminable, while at the same time, two stimuli that are very different (e.g., at opposite ends of a cell) can't.
 
 Do the cells include BOTH those sets of stimuli that cannot be discriminated just because of internal noise AND those that can't be discriminated because they are projected to literally the same point in the sensory encoding space? What are the strengths and limits of models that involve the strict binarization of sensory representations, and how can they be integrated with models dealing with continuous differences? These seem like important background concepts that ought to be included in either the introduction or discussion sections. In this context it might also be helpful to refer to the notion of 'visual equivalence' as described by:
 
 Ramanarayanan, G., Ferwerda, J., Walter, B., & Bala, K. (2007). Visual equivalence: towards a new standard for image fidelity. ACM Transactions on Graphics (TOG), 26(3), 76-es.
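 The distinction drawn above between subthreshold differences and univariance-type metamers can be made concrete with a toy linear encoder. This is only an illustrative sketch, not the pooling models evaluated in the study: the two-sensor weights, the stimuli, and the discrimination threshold are all hypothetical.

```python
# Toy encoder: two sensors, each a weighted sum over a three-component stimulus.
# The direction [1, -1, 1] is invisible to both sensors (it lies in the null space).
SENSORS = [
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
THRESHOLD = 0.1  # hypothetical smallest response difference an observer can detect


def encode(stimulus):
    """Map a stimulus to its sensor responses."""
    return [sum(w * s for w, s in zip(row, stimulus)) for row in SENSORS]


def response_distance(a, b):
    """Largest difference between the two stimuli in encoding space."""
    return max(abs(x - y) for x, y in zip(encode(a), encode(b)))


def physical_distance(a, b):
    """Largest component-wise difference between the two stimuli."""
    return max(abs(x - y) for x, y in zip(a, b))


base = [2.0, 3.0, 2.0]
far_metamer = [4.0, 1.0, 4.0]        # base + 2 * [1, -1, 1]: large physical change
near_subthreshold = [2.0, 3.05, 2.0]  # tiny change along a visible direction
near_discriminable = [2.0, 3.2, 2.0]  # small but visible change

for name, stim in (
    ("far_metamer", far_metamer),
    ("near_subthreshold", near_subthreshold),
    ("near_discriminable", near_discriminable),
):
    d_phys = physical_distance(base, stim)
    d_resp = response_distance(base, stim)
    print(name, round(d_phys, 2), round(d_resp, 2), d_resp > THRESHOLD)
```

 In this toy model, far_metamer is indistinguishable from base regardless of noise because both map to the same point in encoding space, whereas near_subthreshold is indistinguishable only because its response difference falls below the assumed threshold. The near_discriminable stimulus is physically much closer to base than far_metamer is, yet it is the only one of the three that can be told apart.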
 
 Other than that, I congratulate the authors on a very interesting study, and look forward to reading the final version.
 
 Review 1
3. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors have improved clarity overall and have spoken to most of the issues raised by the reviewers. There are still two outstanding problems however, where issues raised during the review were inappropriately dismissed in the manuscript. These should be explicitly addressed as limitations to the results presented (no eye tracking), and early pilot experiments that informed the experiments as presented (pink noise) rather than brushed off as 'unnecessary' and 'would be uninformative'.
 
 Eye tracking:
 
 It is generally accepted that experiments testing stimuli presented at specific locations in peripheral vision require eye tracking to ensure that the stimulus is presented as expected, in particular, in the correct location. As I stated in the previous round of review, while a stimulus presentation time of 200ms does help eliminate some saccades, it does not eliminate the possibility that subjects were not fixating well during stimulus onset. I am also unclear what the authors mean by 'trained observer' in this context, though the authors state that an author subject in a different portion of the paper is an 'expert observer'. Does this mean the 'trained observers' are non-expert recruited subjects? Given that the conditions tested differ from previous work (Freeman & Simoncelli, 2011), which DID include eye tracking in a subset of subjects (and these differences are a main contribution of the paper!), it is entirely possible to get similar results to this work in the context of non eye-tracking controlled stimulus presentation. The reasons now in the manuscript are not reasons that make eye tracking 'considered unnecessary'.
 
 I appreciate that the authors now state the lack of eye tracking explicitly, but believe the paper needs to at least state that this is a limitation of the results reported. Describing eye tracking as 'considered unnecessary' is neither reasonable nor a norm in this subfield.
 
 N=1: The authors now state clearly the limitations of a single subject in the manuscript, and state the expertise level of this subject.
 
 Large number of trials: The authors now address this and include an enumeration of the large number of trials.
 
 Simple Models / Physiology comparison: I support the choice to reduce claims regarding tight connections to physiology, and appreciate the explanation of the luminance model.
 
 Previous Work: I appreciate the authors' changes to the introduction, both in discussing previous work and citation fixes.
 
 Blurred White, Pink Noise: While the authors now address pink noise, the explanation for such stimuli being expected to be uninformative is confusing to me. The manuscript now first states that pink noise is a natural choice, then claims it would be uninformative, while also stating in the rebuttal (not the manuscript) that they tried it and it indeed reduced the artifacts they note. The logic of the experiments indeed relies on finding the smallest critical scaling value, which is measured by subjects determining if a synthesis is similar or different to a target or second synthesis. A synthesis free from artifacts would surely affect the subjects' responses and the smallest critical scaling measured.
 
 The fact that the authors experimented with pink noise early on and found it able to address the artifacts should be stated in the manuscript itself, not just in the rebuttal, and the blanket statement that this experiment would be 'uninformative' is incorrect. Surely this early pilot the authors mention in the rebuttal was informative to designing the experiments that appear in the final paper, and would be an informative experiment to include.
 
 Review 2
4. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public Review):
 
 This is an interesting study of the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging (’pooling’) of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the predictions of the distinct models.
 
 More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers: stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.
 
 Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics, and the physiology of low- and mid-level vision.
 
 The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy-to-follow manner. The authors could also have been more explicit about the assumptions that they make.
 
 Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.
 
 Reviewer #2 (Public Review):
 
 Summary
 
 This paper expands on the literature on spatial metamers, evaluating different aspects of spatial metamers including the effect of different models and initialization conditions, as well as the relationship between metamers of the human visual system and metamers for a model. The authors conduct psychophysics experiments testing variations of metamer synthesis parameters including type of target image, scaling factor, and initialization parameters, and also compare two different metamer models (luminance vs energy). An additional contribution is doing this for a field of view larger than has been explored previously.
 
 General Comments
 
 Overall, this paper addresses some important outstanding questions regarding comparing original to synthesized images in metamer experiments and begins to explore the effect of noise vs image seed on the resulting syntheses. While the paper tests some model classes that could be better motivated, and the results are not particularly groundbreaking, the contributions are convincing and undoubtedly important to the field. The paper includes an interesting Voronoi-like schematic of how to think about perceptual metamers, which I found helpful, but for which I do have some questions and suggestions. I also have some major concerns regarding incomplete psychophysical methodology including lack of eye-tracking, results inferred from a single subject, and a huge number of trials. I have only minor typographical criticisms and suggestions to improve clarity. The authors also use very good data reproducibility practices.
 
 Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.
 
 Specific Comments
 
 Experimental Setup
 
 Firstly, the experiments do not appear to utilize an eye tracker to monitor fixation. Without eye tracking or another manipulation to ensure fixation, we cannot ensure the subjects were fixating the center of the image, and viewing the metamer as intended. While the short stimulus time (200ms) can help minimize eye movements, this does not guarantee that subjects began the trial with correct fixation, especially in such a long experiment. While Covid-19 did at one point limit in-person eye-tracked experiments, the paper reports no such restrictions that would have made the addition of eye-tracking impossible. While such a large-scale experiment may be difficult to repeat with the addition of eye tracking, the paper would be greatly improved with, at a minimum, an explanation as to why eye tracking was not included.
 
 Addressed on pg. 25, starting on line 658.
 
 Secondly, many of the comparisons later in the paper (Figures 9, 10) are made from a single subject. N=1 is not typically accepted as sufficient to draw conclusions in such a psychophysics experiment. Again, if there were restrictions limiting this, they should be discussed. Also (P11), is subject sub-00 an author? Another expert? A naive subject? The subject’s expertise in viewing metamers will likely affect their performance.
 
 Addressed on pg. 14, starting on line 308.
 
 Finally, the number of trials per subject is quite large. 13,000 over 9 sessions is much larger than most human experiments in this area. The reason for this should be justified.
 
 In general, we needed a large number of trials to fit full psychometric functions for stimuli derived for both models, with both types of comparison, both initializations, and over many target images. We could have eliminated some of these, but feel that having a consistent dataset across all these conditions is a strength of the paper.
 
 In addition to the sentence on pg. 14, line 318, a full enumeration of trials is now described on pg. 23, starting on line 580.
 
 Model
 
 For the main experiment, the authors compare the results of two models: a ’luminance model’ that spatially pools mean luminance values, and an ’energy model’ that spatially pools energy calculated from a multi-scale pyramid decomposition. They show that these models create metamers that result in different thresholds for human performance, and therefore different critical scaling parameters, with the basic luminance pooling model producing a scaling factor 1/4 that of the energy model. While this is certain to be true, due to the luminance model being so much simpler, the motivation for the simple luminance-based model as a comparison is unclear.
 
 The use of simple models is now addressed on pg. 3, starting on line 98, as well as the sentence starting on pg. 4 line 148: the luminance model is intended as the simplest possible pooling model.
 
 The authors claim that this luminance model captures the response of retinal ganglion cells, often modeled as a center-surround operation (Rodieck, 1964). I am unclear in what aspect(s) the authors claim these center-surround neurons mimic a simple mean luminance, especially in the context of evidence supporting a much more complex role of RGCs in vision (Atick & Redlich, 1992). Why do the authors not compare the energy model to a model that captures center-surround responses instead? Do the authors mean to claim that the luminance model captures only the pooling aspects of an RGC model? This is particularly confusing as Figures 6 and 9 show the luminance and energy models for original vs synth aligning with the scaling of Midget and Parasol RGCs, respectively. These claims should be more clearly stated, and citations included to motivate this. Similarly, with the energy model, the physiological evidence is very loosely connected to the model discussed.
 
 We have removed the bars showing potential scaling values measured by electrophysiology in the primate visual system and attempted to clarify our language around the relationship between these models and physiology. Our metamer models are only loosely connected to the physiology, and we’ve decided in revision not to imply any direct connection between the model parameters and physiological measurements. The models should instead be understood as loosely inspired by physiology, but not as a tool to localize the representation (as was done in the Freeman paper).
 
 The physiological scaling values are still used as the mean of the priors on the critical scaling value for model fitting, as described on pg. 27, starting on line 698.
 
 Prior Work:
 
 While the explorations in this paper clearly have value, it does not present any particularly groundbreaking results, and those reported are consistent with previous literature. The explorations around critical eccentricity measurement have been done for texture models (Figure 11) in multiple papers (Freeman 2011, Wallis 2019, Balas 2009). In particular, Freeman 2011 demonstrated that simpler models, representing measurements presumed to occur earlier in visual processing, need smaller pooling regions to achieve metamerism. This work’s measurements for the simpler models tested here are consistent with those results, though the model details are different. In addition, Brown 2023 (which is miscited) also used an extended field of view (though not as large as in this work). Both Brown 2023 and Wallis 2019 performed an exploration of the effect of the target image. Also, much of the more recent previous work uses color images, while the authors’ exploration is only done for greyscale.
 
 We were pleased to find consistency of our results with previous studies, given the (many) differences in stimuli and experimental conditions (especially viewing angle), while also extending to new results with the luminance model, and the effects of initialization. Note that only one of the previous studies (Freeman and Simoncelli, 2011) used a pooled spectral energy model. Moreover, of the previous studies, only one (Brown et al., 2023) used color images (we have corrected that citation - thanks for catching the error).
 
 Discussion of Prior Work:
 
 The prior work on testing metamerism between original vs. synthesized and synthesized vs. synthesized images is presented in a misleading way. Wallis et al.’s prior work on this should not be a minor remark in the post-experiment discussion. Rather, it was surely a motivation for the experiment. The text should make this clear; a discussion of Wallis et al. should appear at the start of that section. The authors similarly cite much of the most relevant literature in this area as a minor remark at the end of the introduction (P3L72).
 
 The large differences we observed between comparison types (original vs synthesized, compared to synthesized vs synthesized) surprised us. Understanding such differences was not a primary motivation for the work, but it is certainly an important component of our results. In the introduction, we thought it best to lay out the basic logic of the metamer paradigm for foveated vision before mentioning the complications that are introduced in both the Wallis and Brown papers (paragraph beginning p. 3, line 109). Our results confirm and bolster the results of both of those earlier works, which are now discussed more fully in the Introduction (lines 109 and following).
 
 White Noise: The authors make an analogy to the inability of humans to distinguish samples of white noise. It is unclear, however, that human difficulty distinguishing samples of white noise is a perceptual issue; it could instead be due to cognitive/memory limitations. If one concentrates on an individual patch, one can usually tell apart two samples. Support for these difficulties emerging from perceptual limitations should be provided, or the possibility of these limitations being more cognitive should be discussed, or a different analogy employed.
 
 We now note the possibility of cognitive limits on pg. 8, starting on line 243, as well as pg. 22, line 571. The ability of observers to distinguish samples of white noise is highly dependent on display conditions. A small patch of noise (i.e., large pixels, not too many) can be distinguished, but a larger patch cannot, especially when presented in the periphery. This is more generally true for textures (as shown in Ziemba and Simoncelli (2021)). Samples of white noise at the resolution used in our study are indistinguishable.
 
 Relatedly, in Figure 14, the authors do not explain why the white noise seeds would be more likely to produce syntheses that end up in different human equivalence classes.
 
 In Figure 14, we claim that white noise seeds are more likely to end up in the same human equivalence classes than natural image seeds. The explanation as to why we think this may be the case is now addressed on pg. 19, starting on line 423.
 
 It would be nice to see the effect of pink noise seeds, which mirror the power spectrum of natural images, but do not contain the same structure as natural images - this may address the artifacts noted in Figure 9b.
 
 The lack of pink noise seeds is now addressed on pg. 19, starting on line 429.
 
 Finally, the authors note high-frequency artifacts in Figure 4 & P5L135, that remain after syntheses from the luminance model. They hypothesize that this is due to a lack of constraints on frequencies above that defined by the pooling region size. Could these be addressed with a white noise image seed that is pre-blurred with a low pass filter removing the frequencies above the spatial frequency constrained at the given eccentricity?
 
 The explanation for this is similar to the lack of pink noise seeds in the previous point: the goal of metamer synthesis is model testing, and so for a given model, we want to find model metamers that result in the smallest possible critical scaling value. Taking white noise seed images and blurring them will almost certainly remove the high frequencies visible in luminance metamers in Figure 4 and thus result in a larger critical scaling value, as the reviewer points out. However, the logic of the experiments requires finding the smallest critical scaling value, and so these model metamers would be uninformative. In an early stage of the project, we did indeed synthesize model metamers using pink noise seeds, and observed that the high frequency artifacts were less prominent.
 
 Schematic of metamerism: Figures 1,2,12, and 13 show a visual schematic of the state space of images, and their relationship to both model and human metamers. This is depicted as a Voronoi diagram, with individual images near the center of each shape, and other images that fall at different locations within the same cell producing the same human visual system response. I felt this conceptualization was helpful. However, implicitly it seems to make a distinction between metamerism and JND (just noticeable difference). I felt this would be better made explicit. In the case of JND, neighboring points, despite having different visual system responses, might not be distinguishable to a human observer.
 
 Thanks for noting this – in general, metamers are subthreshold, and for the purpose of the diagram, we had to discretize the space showing metameric regions (Voronoi regions) around a set of stimuli. We’ve rewritten the captions to explain this better. We address the binary subthreshold nature of the metamer paradigm in the discussion section (pg. 19, line 438).
 
 In these diagrams and throughout the paper, the phrase ’visual stimulus’ rather than ’image’ would improve clarity, because the location of the stimulus in relation to the fovea matters whereas the image can be interpreted as the pixels displayed on the computer.
 
 We agree and have tried to make this change, describing this choice on pg. 3 line 73.
 
 Other
 
 The authors show good reproducibility practices with links to relevant code, datasets, and figures.
 
 Reviewer #1 (Recommendations For The Authors):
 
 In its current form, I found the introduction to be too cursory. I felt that the article would benefit from a clearer motivation for the two models that are considered, as the reader is left unclear why these particular models are of special scientific significance. The luminance model is intended to capture some aspects of retinal ganglion cell response characteristics and the spectral energy model is intended to capture some aspects of the primary visual cortex. However, one can easily imagine models that include the pooling of other kinds of features, and it would be helpful to get an idea of why these are not considered. Which aspects of processing in the retina and V1 are being considered and which are being left out, and why? Why not consider representations that capture even higher-order statistical structure than those covered by the spectral energy model (or even semantics)? I think a bit of rewriting with this in mind could improve the introduction.
 
 Along similar lines, I would have appreciated having the logic of the study explained more explicitly and didactically: which overarching research question is being asked, how it is operationalised in the models and experiments, and what are the predictions of the different models. Figures 2 and 3 are certainly helpful, but I felt further explanations would have made it easier for the reader to follow. Throughout, the writing could be improved by a careful re-reading with a view to making it easier to understand. For example, where results are presented, a sentence or two expanding on the implications would be helpful.
 
 I think the authors could also be more explicit about the assumptions they make. While these are obviously (tacitly) included in the description of the models themselves, it would be helpful to state them more openly. To give one example, when introducing the notion of critical scaling, on p.6 the authors state as if it is a self-evident fact that "metamers can be achieved with windows whose size is matched to that of the underlying visual neurons". This presumably is true only under particular conditions, or when specific assumptions about readout from populations of neurons are invoked. It would be good to identify and state such assumptions more directly (this is partly covered in the Discussion section ’The linking proposition underlying the metamer paradigm’, but this should be anticipated or moved earlier in the text).
 
 We agree that our introduction was too cursory and have reworked it. We have also backed off of the direct comparison to physiology and clarified that we chose these two as the simplest possible pooling models. We have also added sentences at the end of each result section attempting to summarize the implication (before discussing them fully in the discussion). Hopefully the logic and assumptions are now clearer.
 
 There are also some findings that warrant a more extensive discussion. For example, what is the broader implication of the finding that original vs. synthesised and synthesised vs. synthesised comparisons exhibit very different scaling values? Does this tell us something about internal visual representations, or is it simply capturing something about the stimuli?
 
 We believe this difference is a result of the stimuli that are used in the experiment and thus the synthesis procedure itself, which interacts with the model’s pooled image feature. We have attempted to update the relevant figures and discussions to clarify this, in the sections starting on pg 17 line 396 and pg. 19 line 417.
 
 At some points in the paper, a third model (’texture model’) creeps into the discussion, without much explanation. I assume that this refers to models that consider joint (rather than marginal) statistics of wavelet responses, as in the famous Portilla & Simoncelli texture model. However, it would be helpful to the reader if the authors could explain this.
 
 Addressed on pg. 3, starting on line 94.
 
 Minor corrections.
 
 Caption of Figure 3: ’top’ and ’bottom’ should be ’left’ and ’right’
 
 Line 177: ’smallest tested scaling values tested’. Remove one instance of ’tested’
 
 Line 212: ’the images-specific psychometric functions’ -> ’image-specific’
 
 Line 215: ’cloud-like pink noise’. It’s not literally pink noise, so I would drop this.
 
 Line 236: ’Importantly, these results cannot be predicted from the model, which gives no specific insight as to why some pairs are more discriminable than others’. The authors should specify what we do learn from the model if it fails to provide insight into why some image pairs are more discriminable than others.
 
 Figure 9: it might be helpful to include small insets with the ’highway’ and ’tiles’ source images to aid the reader in understanding how the images in 9B were generated.
 
 Table 1 placement should be after it is first referred to on line 258.
 
 In the Discussion section "Why does critical scaling depend on the comparison being performed", it would be helpful to consider the case where the two model metamers *are* distinguishable from each other even though each is indistinguishable from the target image. I would assume that this is possible (e.g., if the target image is at the midpoint between the two model images in image space and each of the stimuli is just below 1 JND away from the target). Or is this not possible for some reason?
 
 Regarding line 236: this specific line has been removed, and the discussion about this issue has all been consolidated in the final section of the discussion, starting on pg. 19 line 438.
 
 Regarding the final comment: this is addressed in the paragraph starting on pg. 16 line 386. To expand upon that: the situation laid out by the reviewer is not possible in our conceptualization, in which metamerism is transitive and image discriminability is binary. In order to investigate situations like the one laid out by the reviewer, one needs models whose representations have metric properties, i.e., which allow you to measure and reason about perceptual distance, which we refer to in the paragraph starting on pg. 20 line 460. We also note that this situation has not been observed in this or any other pooling model metamer study that we are aware of. All other minor changes have been addressed.
 
 Reviewer #2 (Recommendations For The Authors):
 
 Original image T should be marked in the Voronoi diagrams.
 
 Brown et al. is miscited as 2021; it should be ACM Transactions on Applied Perception 2023.
 
 Figure 3 caption: models are left and right, not top and bottom.
 
 Thanks, all of the above have been addressed.
 
 References
 
 Brown R, Dutell V, Walter B, Rosenholtz R, Shirley P, McGuire M, Luebke D. Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System. ACM Transactions on Applied Perception. 2023 Jan; 20(1):1–22. http://dx.doi.org/10.1145/3564605, doi: 10.1145/3564605.
 
 Freeman J, Simoncelli EP. Metamers of the ventral stream. Nature Neuroscience. 2011 Aug; 14(9):1195–1201. doi: 10.1038/nn.2889.
 
 Ziemba CM, Simoncelli EP. Opposing Effects of Selectivity and Invariance in Peripheral Vision. Nature Communications. 2021 Jul; 12(1). https://doi.org/10.1038/s41467-021-24880-5, doi: 10.1038/s41467-021-24880-5.
 
 AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.05.18.541306v7
www.biorxiv.org www.biorxiv.org

Brain-wide arousal signals are segregated from movement planning in the superior colliculus

5
1. Public_Reviews 14 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This study presents a valuable finding relating to how the state of arousal is represented within the superior colliculus, a principal visuo-oculomotor structure. The main conclusion that the representation of arousal is segregated, and thus influences visual activity but not motor output, is incompletely supported by the evidence, but could be stronger if a specific concern relating to an alternative explanation for the dichotomy was addressed. The work will be of interest to sensory, motor, and cognitive neuroscientists.
 
 Summary
2. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.
 
 Comments on revised manuscript:
 
 The authors have done a very good job of responding to all of the reviewers' concerns.
 
 Review 1
3. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Neurons in motor-related areas have increasingly been shown to also carry other, non-motoric signals. This creates a problem of avoiding interference between the motor and non-motor-related signals. This is a significant problem that likely affects many brain areas. The specific example studied here is interference between saccade-related activity and slow-changing arousal signals in the superior colliculus. The authors identify neuronal activity related to saccades and arousal. Identifying saccade-related activity is straightforward, but arousal-related activity is harder to identify. The authors first identify a potential neuronal correlate of arousal using PCA to identify a component in the population activity corresponding to slow drift over the recording session. Next, they link this component to arousal by showing that the component is present across different brain areas (SC and PFC), and that it is correlated with pupil size, an external marker of arousal. Having identified an arousal-related component in SC, the authors show next that SC neurons with strong motor-related activity are less strongly affected by this arousal component (both SC and PFC). Lastly, they show that SC population activity patterns related to saccades and pupil size form orthogonal subspaces in the SC population.
 
 Strengths:
 
 A great strength of this research is the clear description of the problem, its relationship with the performed analysis and the interpretation of the results. The paper is very well written and easy to follow. An additional strength is the use of fairly sophisticated analysis using population activity.
 
 Weaknesses:
 
 (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.
 
 (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.
 
 (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?
 
 Comments on revised manuscript:
 
 I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.
 
 Other than this conceptual issue, I do not have major problems with the analysis per se.
 
 Review 2
4. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.
 
 Strengths:
 
 The paper is clear and well-written.
 
 Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.
 
 Weaknesses:
 
 The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.
 
 Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.
 
 I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.
 
 Comments on revised manuscript:
 
 The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:
 
 In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.
 
 Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually-responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.
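 To give a sense of the scale of this concern, here is a back-of-the-envelope sketch using the conventional definition of retinal illuminance in trolands (luminance in cd/m² multiplied by pupil area in mm²). The background luminance and pupil diameters below are hypothetical values chosen for illustration, not numbers from the manuscript, and the calculation ignores the Stiles-Crawford effect.

```python
import math

def retinal_illuminance_td(luminance_cd_m2, pupil_diameter_mm):
    """Conventional trolands: luminance (cd/m^2) times pupil area (mm^2)."""
    pupil_area_mm2 = math.pi * (pupil_diameter_mm / 2.0) ** 2
    return luminance_cd_m2 * pupil_area_mm2

# Hypothetical values (assumptions, not taken from the manuscript)
background = 30.0                      # cd/m^2, gray display
small = retinal_illuminance_td(background, 3.0)
large = retinal_illuminance_td(background, 4.0)

# Illuminance scales with the square of pupil diameter
ratio = large / small
```

 Because illuminance grows with the square of the diameter, the ratio depends only on the two diameters and not on the assumed background luminance.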
 
 The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.
 
 I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? I.e., how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.
 
 Review 3
5. Public_Reviews 14 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public Review):
 
 (1) The authors make fairly strong claims that "arousal-related fluctuations are isolated from neurons in the deep layers of the SC" (emphasis added). This conclusion is based on comparisons between a "slow drift axis", a low-dimensional representation of neuronal drift, and other measures of arousal (Figures 2C, 3) and motor output sensitivity (Figures 2B, 3B). However, the metrics used to compare the slow-drift axis and motor activity were computed during separate task epochs: the delay period (600-1100 ms) and a perisaccade epoch (25 ms before and after saccade initiation), respectively. As the authors reference, deep-layer SC neurons are typically active only around the time of a saccade. Therefore, it is not clear if the lack of arousal-related modulations reported for deep-layer SC neurons is because those neurons are truly insensitive to those modulations, or if the modulations were not apparent because they were assessed in an epoch in which the neurons were not active. A potentially more valuable comparison would be to calculate a slow-drift axis aligned to saccade onset.
 
 The reviewer makes an important point that the calculation of an axis can depend critically on the time window of neuronal response. We find when considering this that the slow drift axis is less sensitive to this issue because it is calculated on time-averaged activity over multiple trials. In previous work we found that slow drift calculated on the stimulus evoked response in V4 was very well aligned to slow drift calculated on pre-stimulus spontaneous activity (Cowley et al, Neuron, 2020, Supplemental Figure 3A and 3B). To address this issue in the present data, we compared the axis computed for an example session for neural activity during the delay period and neural activity aligned to saccade onset. As shown in the new Figure 2 – figure supplement 1 in the revised manuscript, we found a similar lack of arousal-related modulations for deep-layer SC neurons when slow drift was computed using the saccade epoch (25ms before to 25ms after the onset of the saccade). Figure 2 – figure supplement 1A shows loadings for the SC slow drift axis when it was computed using spiking responses during the delay period (as in the main manuscript analysis). In contrast, Figure 2 – figure supplement 1B shows loadings from the same session when the SC slow drift axis was computed using spiking responses during the saccade epoch. The plots are highly similar and in both cases the loadings were weaker for neurons recorded from channels at the bottom of the probe which have a higher motor index. Finally, we found that projections onto the SC slow drift axis for this session were strongly correlated when the slow drift axis was computed using spiking responses during the delay period and the saccade epoch (r = 0.66, p < 0.001, Figure 1C). Taken together, these results suggest that arousal-related modulations are less evident in deep-layer SC neurons irrespective of whether slow drift was computed during the delay or saccade epoch (see also Public Reviews, Reviewer 1, Point 2).
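 The epoch comparison described in this response can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes that residual spike counts are averaged in non-overlapping blocks of trials and that the slow drift axis is the first principal component of those block averages. The block size and the function names `slow_drift_axis` and `epoch_agreement` are hypothetical.

```python
import numpy as np

def slow_drift_axis(counts, bin_size=50):
    """Estimate a slow drift axis from a (trials x neurons) array.

    Trials are averaged in non-overlapping blocks of `bin_size`
    (an assumed value), and the first principal component of the
    block averages is returned with the projections onto it.
    """
    n_bins = counts.shape[0] // bin_size
    binned = counts[: n_bins * bin_size].reshape(n_bins, bin_size, -1).mean(axis=1)
    centered = binned - binned.mean(axis=0)
    # Rows of vt are principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    return axis, centered @ axis

def epoch_agreement(counts_a, counts_b, bin_size=50):
    """Absolute correlation between slow drift projections from two epochs.

    The absolute value is taken because the sign of a principal
    component is arbitrary.
    """
    _, proj_a = slow_drift_axis(counts_a, bin_size)
    _, proj_b = slow_drift_axis(counts_b, bin_size)
    return abs(np.corrcoef(proj_a, proj_b)[0, 1])
```

 Called with spike counts from the delay and saccade epochs of the same trials, `epoch_agreement` returns a value near 1 when the two epochs share the same slow fluctuation.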
 
 (2) More generally, arousal-related signals may persist throughout multiple different epochs of the task. It would be worthwhile to determine whether similar "slow-drift" dynamics are observed for baseline, sensory-evoked, and saccade-related activity. Although it may not be possible to examine pupil responses during a saccade, there may be systematic relationships between baseline and evoked responses.
 
 Similar to the point above, slow drift dynamics tend to be similar across different response epochs because they are averaged across many trials and seem to tap into responsivity trends that are robust across epochs. As shown in Author response image 1 below, and the Figure 2 – figure supplement 1 in the revised manuscript, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. We did not investigate differences between baseline and evoked pupil responses in the current paper. However, these effects were characterized in one of our previous papers that focused exclusively on the relationship between slow drift and eye-related metrics (Johnston et al., 2022, Cereb. Cortex, Figure 6). In this previous work, we found a negative correlation between baseline and evoked pupil size. Both variables were significantly correlated with slow drift, the only difference being the sign of the correlation.
 
 Author response image 1.
 
 (A-C) Dynamics of slow drift for three example sessions when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. Baseline = 100ms before the onset of the target stimulus; Delay = 600 to 1100ms after the offset of the target stimulus; Stim = 25ms to 125ms after the onset of the target stimulus; Sac = 25ms before to 25ms after the onset of the saccade.
 
 Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.
 
 (3) The relationships between changes in SC activity and pupil size are quite small (Figures 2C & 5C). Although the distribution across sessions (Figure 2C) is greater than chance, they are nearly 1/4 of the size compared to the PFC-SC axis comparisons. Likewise, the distribution of r² values relating pupil size and spiking activity directly (Figure 5) is quite low. We remain skeptical that these drifts are truly due to arousal and cannot be accounted for by other factors. For example, does the relationship persist if accounting for a very simple, monotonic (e.g., linear) drift in pupil size and overall firing rate over the course of an individual session?
 
 Firstly, it is important to note that the strength of the relationship between projections onto the SC slow drift axis and pupil size (r² = 0.06) is within the range reported by Joshi et al. (2016, Neuron, Figure 3). They investigated the median variance explained between the spiking responses of individual SC neurons and pupil size and found it to be approximately 0.02 across sessions. Secondly, our statistical approach of testing the actual distribution of r² values against a shuffled distribution was specifically designed to rule out the possibility that the relationship between SC spiking responses and pupil size occurred due to linear drifts. The shuffled distribution in Figure 2C of the main manuscript represents the variance that can be explained by one session’s slow drift correlated with another session’s pupil, which would contain effects that occurred due to linear drifts alone. That the actual proportion of variance explained was significantly greater than this distribution suggests that the relationship between projections onto the SC slow drift axis and pupil size reflects changes in arousal rather than other factors related to linear drifts.
 
 Joshi S, Li Y, Kalwani RM, Gold JI (2016) Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89:221–234.
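 The logic of the cross-session shuffle control described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: each session is assumed to provide a time-binned slow drift projection and a time-binned pupil trace, and mismatched session lengths are handled by truncating to the shorter one, which is a simplification introduced here. The function names are hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation, truncating to the shorter series (assumption)."""
    n = min(len(x), len(y))
    r = np.corrcoef(x[:n], y[:n])[0, 1]
    return r ** 2

def shuffle_control(drift_by_session, pupil_by_session):
    """Return actual (same-session) and shuffled (cross-session) r^2 values.

    Cross-session pairs retain any trend common to all sessions,
    such as a linear drift, but break the session-specific link
    between neural activity and pupil size.
    """
    actual = [r_squared(d, p) for d, p in zip(drift_by_session, pupil_by_session)]
    shuffled = [
        r_squared(d, p)
        for i, d in enumerate(drift_by_session)
        for j, p in enumerate(pupil_by_session)
        if i != j
    ]
    return np.array(actual), np.array(shuffled)
```

 If a linear trend alone produced the relationship, the actual and shuffled distributions would look alike. A same-session excess points to covariation beyond the common trend.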
 
 (4) It is not clear how the final analysis (Figure 6) contributes to the authors' conclusions. The authors perform PCA on: (i) residual spiking responses during the delay period binned according to pupil size, and (ii) spiking responses in the saccade epoch binned according to target location (i.e., the saccade tuning curve). The corresponding PCs are the spike-pupil axis and the saccade tuning axis, respectively. Unsurprisingly, the spike-pupil axis that captures variance associated with arousal (and removes variance associated with saccade direction) was not correlated with a saccade-tuning axis that captures variance associated with saccade direction and omits arousal. Had these measures been related it would imply a unique association between a neuron's preferred saccade direction and pupil control, which seems unlikely. The separation of these axes thus seems trivial and does not provide evidence of a "mechanism...in the SC to prevent arousal-related signals interfering with the motor output." It remains unknown whether, for example, arousal-related signals may impact trial-by-trial changes in neuronal gain near the time of a saccade, or alter saccade dynamics such as acceleration, precision, and reaction time.
 
 The reviewer makes a good point, and we agree that more evidence is needed to determine if the separation of the pupil size axis and saccade tuning axis is the mechanism through which cognitive and arousal-related signals can be intermixed in the SC. In the revised manuscript (lines 679-682), we have raised this as a possible explanation that necessitates further study rather than stating definitively that it is the exact mechanism through which these signals are kept separate. Our analysis here is similar to the one from Smoulder et al (2024, Neuron, Fig. 2F), in which the interactions between reward signals and target tuning in M1 were examined (and found to be orthogonal). While we agree with the reviewer that it may seem “trivial” for these axes to be orthogonal, it does not have to be so. If, for example, neural tuning curves shifted with changes in pupil size through gain changes that revealed tuning or affected tuning curve shape, there could be projections of the pupil axis onto the target tuning axis. Thus, while we agree with the reviewer that it appears sensible for these two axes to be orthogonal, our result is nonetheless a novel finding. We have edited the text in our revised manuscript, however, to make sure the nuance of this point is conveyed to the reader.
 
 Smoulder AL, Marino PJ, Oby ER, Snyder SE, Miyata H, Pavlovsky NP, Bishop WE, Yu BM, Chase SM, Batista AP. A neural basis of choking under pressure. Neuron. 2024 Oct 23;112(20):3424-33.
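 The point that the two axes need not be orthogonal can be illustrated with a toy sketch. It is not the analysis from the manuscript: it assumes that each axis is the first principal component of a (conditions x neurons) matrix of binned responses and measures alignment as the absolute cosine of the angle between the two axes. The function names and the synthetic data in the usage note are hypothetical.

```python
import numpy as np

def first_pc(binned):
    """First principal axis of a (conditions x neurons) response matrix."""
    centered = binned - binned.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def axis_alignment(pupil_binned, target_binned):
    """Absolute cosine between the spike-pupil and saccade tuning axes.

    0 means the axes are orthogonal, 1 means they coincide. The
    absolute value is used because the sign of a PC is arbitrary.
    """
    a = first_pc(pupil_binned)
    b = first_pc(target_binned)
    return abs(float(a @ b))
```

 For example, if responses vary with pupil size along a population pattern `u` and with target location along a pattern `v`, the alignment is close to 0 when `u` and `v` are orthogonal and close to 1 when pupil-linked changes follow the tuning pattern itself.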
 
 Reviewer #2 (Public Review):
 
 (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder, why the authors did not design an experiment, in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior.
 
 Although attention does represent an important cognitive process, we did not design an experiment in which attention and oculomotor control are differentiated because attention does not appear to be related to slow drift. In our first paper that reported on this phenomenon, we investigated the effects of spatial attention on slow fluctuations in neural activity by cueing the monkeys to attend to a stimulus in the left or right visual field in a block-wise manner. Each block lasted ~20 minutes and we found that slow drift did not covary with the timing of cued blocks (see Figure 4A, Cowley et al., 2020, Neuron). Furthermore, there is a large body of work showing that arousal also impacts motor behavior leading to changes in a range of eye-related metrics (e.g., pupil size, microsaccade rate and saccadic reaction time - for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). We also note that the terms attention and arousal are often used in nonspecific and overlapping ways in the literature, adding to some potential confusion here. Nonetheless, pupil-linked arousal is an important variable that impacts motor performance. This has now been stated clearly in the Introduction of the revised manuscript (lines 108-114) to address the reviewer’s concerns and highlight the importance of studying how precise fixation and eye movements are maintained even in the presence of signals related to ongoing changes in brain state.
 
 Cowley BR, Snyder AC, Acar K, Williamson RC, Yu BM, Smith MA (2020) Slow Drift of Neural Activity as a Signature of Impulsivity in Macaque Visual and Prefrontal Cortex. Neuron 108:551-567.e8.
 
 (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.
 
 As described above, several studies across species have demonstrated that arousal impacts motor behavior, e.g., saccade reaction time, saccade velocity and microsaccade rate (for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). This has been clarified in the Introduction of the revised manuscript to address the reviewer's concerns (lines 108-114). Our prior work (Johnston et al, Cerebral Cortex, 2022) shows that slow drift impacts several types of oculomotor behavior. Overall, these studies highlight the impact of arousal on eye movements as a robust effect, and support the present investigation into arousal and oculomotor control signals. While we agree reaction time, accuracy, and speed all can be influenced by arousal depending on task demands, the present study is focused on the connection between slow fluctuations in neural activity, linked to arousal, and different subpopulations of SC neurons.
 
 Di Stasi LL, Catena A, Cañas JJ, Macknik SL, Martinez-Conde S (2013) Saccadic velocity as an arousal index in naturalistic tasks. Neurosci Biobehav Rev 37:968–975.
 
 Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.
 
 (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform with a peak at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?
 
 We agree with the reviewer that our actual data distribution was non-uniform. We examined individual sessions with high and low variance explained and did not find notable differences. One source of this variation has to do with session length. Longer sessions in principle should have a chance distribution of variance explained closer to zero because they contained more time bins. Given that we had no specific hypothesis for a non-uniform distribution, we have simply displayed the full distribution of values in our figure and the statistical result of a comparison to a shuffled distribution.
 
 Reviewer #3 (Public Review):
 
 (1) However, I am concerned about two main points: First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC? In other words, it seems important to show distributions of encountered neurons (regardless of the motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of Figure Supplement 1. I elaborate more on these points in the detailed comments below.
 
 The reviewer makes a good point about the efferent signals from SC. It is true that electrical thresholds are often lowest in intermediate layers, though deep layers do project to the oculomotor nuclei (Sparks, 1986; Sparks & Hartwich-Young, 1989) and often intermediate and deep layers are considered to function together to control eye movements (Wurtz & Albano, 1980). As suggested by the reviewer, we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index, as well as included the above references and points about the intermediate and deep layers (Lines 73-81). Aside from the question of which layers of the SC function as the “motor output”, the reviewer raises a separate and important question: are our deep recordings still in SC? Here, we can say definitively that they are. We removed neurons if they did not exhibit elevated (above baseline) firing rates during the visual or saccade epochs of the MGS task (see Methods section on “Exclusion criteria”). All included neurons possessed a visual, visuomotor or motor response, consistent with the response properties of neurons in the SC. In addition, we found a number of neurons well above the bottom of the probe with strong motor responses and minimal loadings onto the slow drift axis (see Figure 2 – figure supplement 1A), consistent with the reviewer’s comment that intermediate layer neurons are tuned for movement and play a role in saccade production.
 
 Mohler CW, Wurtz RH. Organization of monkey superior colliculus: intermediate layer cells discharging before eye movements. Journal of neurophysiology. 1976 Jul 1;39(4):722-44.
 
 Sparks DL. Translation of sensory signals into commands for control of saccadic eye movements: role of primate superior colliculus. Physiol Rev. 1986 Jan;66(1):118-71. doi: 10.1152/physrev.1986.66.1.118. PMID: 3511480.
 
 Sparks DL, Hartwich-Young R. The deep layers of the superior colliculus. Reviews of oculomotor research. 1989 Jan 1;3:213-55.
 
 Wurtz RH, Albano JE. Visual-motor function of the primate superior colliculus. Annu Rev Neurosci. 1980;3:189-226. doi: 10.1146/annurev.ne.03.030180.001201. PMID: 6774653.
 
 (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.
 
 The reviewer makes an important point about the SC’s visual responses. Neurons with a low motor index are, conversely, likely to have a stronger visual response index. However, we do not believe that changes in luminance can explain why the correlation between SC spiking response and pupil size is weaker for neurons with a higher motor index. Firstly, the changes in pupil size observed in the current paper and our previous work are slow and occur on a timescale of minutes (Cowley et al., 2020, Neuron) and are correlated with eye movement measures such as reaction time and microsaccade rate (Johnston et al., 2022, Cerebral Cortex). This is in stark contrast to luminance-evoked changes in pupil size that occur on a timescale of less than a second. Secondly, as shown in the new Figure 5 – figure supplement 1 in the revised manuscript, very similar results were found when SC spiking responses were correlated with pupil size during the baseline period, when only the fixation point was on the screen. Although the luminance of the small peripheral target stimulus can result in small luminance-evoked changes in pupil size, no changes in luminance occurred during the baseline period which was defined as 100ms before the onset of the target stimulus. In Figure 2 – figure supplement 1 and Author response image 1 above, we show that slow drift is the same whether calculated on the baseline response, delay period, or peri-saccadic epoch. Thus, the measurement of slow drift is insensitive to the precise timing of the selection of both the window for the spiking response and the window for the pupil measurement. If luminance were the explanation for the slow changes in firing observed in visually responsive SC neurons, it would require those neurons to exhibit robust, sustained tuned responses to the small changes in retinal illuminance induced by the relatively small fluctuations in pupil size we observed from minute to minute. We are aware of no reports of such behavior in visually-responsive neurons in SC. We have included these analyses and this reasoning in the revised manuscript on lines 478-495.
 
 Reviewer #1 (Recommendations for the author):
 
 (1) It would be useful to provide line numbers in subsequent manuscripts for reviewers.
 
 Line numbers have been added in the revised version of the manuscript.
 
 (2) Page #6; last sentence: "...even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in motor output." I do not believe the authors have provided evidence that arousal levels were not associated with changes in motor output.
 
 As suggested by Reviewer 3 (see Public Reviews, Reviewer 3, Point 2), we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index. This sentence in the revised manuscript now reads:
 
 “This provides a potential mechanism through which signals related to cognition and arousal can exist in the SC, and even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in SC neurons that are linked to saccade execution.”
 
 (3) Page #8; last paragraph: Although deep-layer SC neurons may not have been obtained during every recording session, a summary of the motor index scores observed along the probe across sessions would be useful to confirm their assumptions.
 
 See Author response image 2 below, which shows the motor index of each recorded SC neuron on the x-axis and session number on the y-axis. The points are colored according to the squared factor loading, which represents the variance explained between the response of a neuron and the slow drift axis (see Figure 3B of the main manuscript). You can see from this plot that neurons with a stronger component loading (shown in teal to yellow) typically have a lower motor index whereas the opposite is true for neurons with a weaker component loading (shown in dark blue).
 
 Author response image 2.
 
 Scatter plot showing the motor index of each recorded neuron along with the session number in which it was recorded. The points are colored according to the squared factor loading for each neuron along the slow drift axis. Note that loadings above 0.5 (33 data points in total) have been thresholded at 0.5 so that we could effectively use the color range to show all of the slow drift axis loadings.
 
 (4) Page #10; first paragraph: The authors should state the time window of the delay period used, since it may be distinct from the pupil analysis (first 200ms of delay).
 
 This has been stated in the revised version of the manuscript. The sentence now reads:
 
 “We first asked if arousal-related fluctuations are present in the SC. As in previous studies that recorded from neurons in the cortex (Cowley et al., 2020), we found that the mean spiking responses of individual SC neurons during the delay period (chosen at random on each trial from a uniform distribution spanning 600-1100ms, see Methods) fluctuated over the course of a session while the monkeys performed the MGS task (Figure 2A, left).”
 
 (5) Page #10; second paragraph: Extra period at the end of a sentence: " most variance in the data..".
 
 Fixed in the revised version of the manuscript.
 
 (6) Page #12: "between projections onto the SC slow drift axis and mean pupil size during the first 200ms of the delay period when a task-related pupil response could be observed." What criteria were used to determine whether a task-related pupil response was observed?
 
 This was chosen based on the results of a previous study in our lab that used the same memory-guided saccade task to investigate the relationship between slow drift and changes in baseline and evoked pupil size (see Johnston et al., 2022, Cereb. Cortex, Figure 6B). The period was chosen based on plotting the average pupil size aligned on different trial epochs. As we show in Figure 5-figure supplement 3 above, the pupil interactions with slow drift did not depend on the particular time window of the pupil we chose.
 
 (7) Page #14; Figure 2A: The axes for the individual channels are strangely floating and quite different from all other figures. Please label the channel in the figure legend that was used as an example of the projected values onto the slow drift axis.
 
 The figure has been changed in the revised version of the manuscript so that the tick mark denoting zero residual spikes per second is on the top layer of each plot. A scale bar was chosen instead of individual axes to reduce clutter in the figure as it was used to demonstrate how slow drift was computed. Residual spiking responses from all neurons were projected on the slow drift axis to generate the scatter plot in the bottom right-hand corner of Figure 2A. There is no single neuron to label.
 
 (8) Page #16: "These results demonstrate that even though arousal-related fluctuations are present in the SC, they are isolated from deep-layer neurons that elicit a strong saccadic response and presumably reside closer to the motor output." In line with our major comments, lack of arousal-related activity during the delay period is meaningless for deep-layer SC neurons that are generally inactive during this time. It does not imply that there is no arousal signal!
 
 Addressed in Public Reviews, Reviewer 1, Point 1 & 2. We found a similar lack of arousal-related modulations reported for deep-layer SC neurons when slow drift was computed using the saccade epoch (Figure 1 above). In addition, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade period (Figure 2).
 
 (9) Page #18: "These findings provide additional support for the hypothesis that arousal-related fluctuations are isolated from neurons in the deep layers of the SC." The same criticism from above applies.
 
 Addressed in Public Reviews, Reviewer 1, Point 1 & 2.
 
 (10) Page #20; paragraph 3: "Taken together, the findings outlined above..." Would be useful to be more specific when referring to "activity" ; e.g., "...these neurons did not exhibit large fluctuations in delay-period activity over time".
 
 This sentence has been changed in the revised manuscript in light of the reviewer’s comments. It now reads:
 
 “In addition to being more weakly correlated with pupil size, the spiking responses of these neurons did not exhibit large fluctuations over time (Figure 2), and when considering the neuronal population as a whole, explained less variance in the slow drift axis when it was computed using population activity in the SC (Figure 3) and PFC (Figure 4).”
 
 Reviewer #3 (Recommendations for the author):
 
 The paper is clear and well-written. However, I am concerned about two main points:
 
 (1) First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC. In other words, it seems important to show distributions of encountered neurons (regardless of motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of the figure supplement 1. I elaborate more on these points in the detailed comments below.
 
 Addressed in Public Reviews, Reviewer 3, Point 1.
 
 (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.
 
 Addressed in Public Reviews, Reviewer 3, Point 2.
 
 (3) I think that a remedy to the first point above is to change the text to make it a bit more descriptive and less interpretive. For example, just say that the slow drifts were less evident among the neurons with high motor index.
 
 We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 1).
 
 (4) For the second point, I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large.
 
 We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 2).
 
 (5) Line 31: I'm a bit underwhelmed by this kind of statement. i.e. we already know that cognitive processes and brain states do alter eye movements, so why is it "critical" that high precision fixation and eye movements are maintained? And, isn't the next sentence already nulling this idea of criticality because it does show that the brain state alters the SC neurons? In fact, cognitive processes are already known to be most prevalent in the intermediate and deep layers of the SC.
 
 It seems clear that while cognitive state does affect eye movements, it is desirable to have some separation between cognitive state and eye movement control. Covert attention, for instance, is precisely a situation where eye movement control is maintained to avoid overt saccades to the attended stimulus, and yet there are clear indications of attention’s impact on microsaccades and fixation. We stand by our statement that an important goal of vision is to have precise fixation and movements of the eye, and yet at the same time the eyes are subject to numerous influences by cognitive state.
 
 (6) Line 65: it is better to clarify that these are "functional layers" because there are actually more anatomical layers.
 
 We have edited this sentence in the revised version of the manuscript so that it now reads:
 
 “The role of these projections in the visuomotor transformation depends on the functional layer of the SC in which they terminate”.
 
 (7) Line 73: this makes it sound like only the deepest layers are topographically organized, which is not true. Also, as early as Mohler & Wurtz, 1972, it was suggested that the intermediate layers have the biggest impacts downstream of the SC. This is also consistent with electrical microstimulation current thresholds for evoking saccades from the SC.
 
 We have addressed the reviewers’ comments about the intermediate layers having the biggest impact downstream of the SC in Public Reviews, Reviewer 3, Point 1. Furthermore, line 73 has been changed in the revised manuscript so that it now reads:
 
 “As is the case for neurons in the superficial and intermediate layers, they [SC motor neurons] form a topographically organized map of visual space (White et al. 2017; Robinson 1972; Katnani and Gandhi 2011)”.
 
 (8) Line 100: there is an analogous literature regarding the question of why unwanted muscle contractions do not happen. Specifically, in the context of why SC visual bursts do not automatically cause saccades (which is a similar problem to the ones you mention about cognitive signals interfering by generating unwanted eye movements), both Jagadisan & Gandhi, Curr Bio, 2022 and Baumann et al, PNAS, 2023 also showed that SC population activity not only has different temporal structure (Jagadisan & Gandhi) but also occupy different subspaces (Baumann et al) under these two different conditions (visual burst versus saccade burst). This is conceptually similar to the idea that you are mentioning here with respect to arousal. So, it is worth it to mention these studies here and again in the discussion.
 
 We are grateful to the reviewer for these suggestions and have included text in the Introduction (Lines 125-128) and Discussion (Lines 678-682) of the revised manuscript along with the references cited above.
 
 (9) Line 147: as mentioned above, it is now generally accepted that there are quite few "pure" motor neurons in the SC. This is consistent with what you find. E.g. Baumann et al., 2023. And, again see Mohler and Wurtz in the 1970's. So, I wonder how useful it is to go too much into this idea of the deeper motor neurons (e.g. the correlations in the other panels of the Figure 1 supplement).
 
 This is related to the reviewer’s comment that the output of the SC might be in the intermediate layers. This concern has been addressed in Public Reviews, Reviewer 3, Point 1.
 
 (10) Figure 1 should say where the RF was for the shown spike rasters. i.e. were these the same saccade target across trials? And where was that location relative to the RF? It would help also in the text to say whether the saccade was always to the RF center or whether you were randomizing the target location.
 
 We centered the array of saccade targets using the microstimulation-evoked eye movement for SC (see Methods section “Memory-guided saccade task”) to find the evoked eccentricity, and then used saccade targets with equal spacing of 45 degrees starting at zero (rightward saccade target). We did not do extensive RF mapping beyond this microstimulation centering. In Figure 1, the spike rasters are shown for a target that was visually identified to be within the neuron’s RF based on assessing responses to all 8 target angles. We have added information about this to the figure caption.
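 The target layout described in this reply (eight targets, 45 degrees apart, starting at zero for the rightward target) can be sketched as follows. The eccentricity of 10 degrees is a placeholder assumption; in the study it came from the microstimulation-evoked eye movement. The function name `saccade_targets` is hypothetical.

```python
import math


def saccade_targets(eccentricity_deg, n_targets=8, start_deg=0.0):
    """Return (angle_deg, x, y) for equally spaced targets on a circle.

    eccentricity_deg: radius of the target ring in degrees of visual angle.
    start_deg: direction of the first target (0 = rightward).
    """
    step = 360.0 / n_targets  # 45 degrees when n_targets is 8
    targets = []
    for k in range(n_targets):
        angle = start_deg + k * step
        x = eccentricity_deg * math.cos(math.radians(angle))
        y = eccentricity_deg * math.sin(math.radians(angle))
        targets.append((angle, x, y))
    return targets


# Placeholder eccentricity; the real value was set per session.
targets = saccade_targets(eccentricity_deg=10.0)
```

 With this layout, the target used for the example rasters would be whichever of the eight angles produced the clearest response for that neuron.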
 
 (11) Line 218: but were there changes in the eye movement statistics? For example, the slow drift eye movements during fixation? Or even the microsaccades?
 
 Addressed in Public Reviews, Reviewer 2, Point 2.
 
 (12) Line 248: shuffling what exactly? I think that more explanation would be needed here.
 
 Addressed in Public Reviews, Reviewer 1, Point 3.
 
 (13) Line 263: but isn't this reflecting a sensory transient in the pupil diameter, since the target just disappeared?
 
 Addressed in Public Reviews, Reviewer 3, Point 2.
 
 (14) Line 271: I suspect that slow drift eye movements (in between microsaccades) would show higher correlations. Not sure how well you can analyze those with a video-based eye tracker.
 
 We agree that fixational drift would be a worthwhile metric, but it is not one we have focused on here and to our knowledge does require higher precision tracking.
 
 (15) Line 286: again, see above about similar demonstrations with respect to the visual and motor burst intervals, which clearly cause the same problem (even stronger) as the one studied here.
 
 See reply, including Figure 2.
 
 (16) Line 330: again, I'm not sure deeper necessarily automatically means closer to the output. For example, current thresholds for evoked saccades grow higher as you go deeper. Maybe the authors can ask their colleague Neeraj Gandhi about this point specifically, just to be safe. Maybe the safest would be to remain descriptive about the data, and just say something like: arousal-related fluctuations were absent in our deepest recorded sites.
 
 Addressed in Public Reviews, Reviewer 3, Point 1.
 
 (17) Line 332: likewise, statements like this one here would be qualified if the output was the intermediate layers......anyway if I understand what I read so far in the paper, the signal will be anyway orthogonal to the motor burst population subspace. So, maybe there's no need to emphasize that it goes away in the very deepest layers.
 
 See reply above, Public Reviews, Reviewer 1, Point 4.
 
 (18) Figure 3A: related to the above, I think one issue could be that the deeper contacts might already be out of the SC. Maybe some cell count distribution from each channel should help in this regard. i.e. were you finding way fewer saccade-related neurons in the deepest channels (even though the few that you found were with high motor index)? If so, then wouldn't this just mean that the channel was too deep? I think there needs to be an analysis like this, to convince readers that the channels were still in the SC. Ideally, electrical stimulation current thresholds for evoking saccades at different depths would be tested, but I understand that this can be difficult at this stage.
 
 Addressed in Public Reviews, Reviewer 3, Point 1.
 
 (19) I keep repeating this because in general, cognitive effects are stronger in the intermediate/deeper layers than in the superficial layers. If these interfere with eye movements like arousal, then why should arousal be different?
 
 Few studies have investigated the effects of attention on “pure” movement SC neurons that only discharge during a saccade. One study, which we cited in Introduction (Ignashchenkova et al., 2004, Nat. Neurosci.), found significant differences in spiking responses between trials with and without attentional cueing for visual and visuomotor neurons. No significant difference was found for motor neurons, consistent with our hypothesis that signals related to cognition and arousal are kept separate from saccade-related signals in the SC.
 
 (20) The problem with Figure 5 and its related text is that the neurons with low motor index are additionally visual. So, of course, they can be modulated if the pupil diameter changes!
 
 Addressed in Public Reviews, Reviewer 3, Point 2.
 
 (21) I had a hard time understanding Figure 6.
 
 See reply above, Public Reviews, Reviewer 1, Point 4.
 
 (22) Line 586: these cells have more visual responses and will be affected by the amount of light entering the eye.
 
 Addressed in Public Reviews, Reviewer 3, Point 2.
 
 AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.26.591284v2
www.biorxiv.org www.biorxiv.org

Glycogen Engineering Improves the Starvation Resistance of Mesenchymal stem cells and their Therapeutic Efficacy in Pulmonary Fibrosis

4
1. Public_Reviews 13 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important study presents a novel approach to enhance the therapeutic potential of mesenchymal stromal cells (MSCs) by genetically modifying their glycogen synthesis pathway, resulting in increased glycogen accumulation and improved cell survival under starvation conditions, particularly in the context of experimental pulmonary fibrosis. The methods and findings are generally solid and could be strengthened in the future by investigating the kinetics of persistence, the immunomodulatory effects, and the underlying improved mechanism of action of MSCs in this pulmonary fibrosis model. If confirmed, this approach could suggest potential methods to improve the therapeutic functionality of MSCs in cell therapy strategies.
 
 Summary
2. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This study provides the first evidence that glucose availability, previously shown to support cell survival in other models, is also a key determinant for post-implantation MSC survival in the specific context of pulmonary fibrosis. To address glucose depletion in this context, the authors propose an original, elegant, and rational strategy: enhancing intracellular glycogen stores to provide transplanted MSCs with an internal energy reserve. This approach aims to prolong their viability and therapeutic functionality after implantation.
 
 Strengths:
 
 The efficacy of this metabolic engineering strategy is robustly demonstrated both in vitro and in an orthotopic mouse model of pulmonary fibrosis.
 
 Review 1
3. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this article, the authors investigate enhancing the therapeutic and regenerative properties of mesenchymal stem cells (MSCs) through genetic modification, specifically by overexpressing genes involved in the glycogen synthesis pathway. By creating a non-phosphorylatable mutant form of glycogen synthase (GYSmut), the authors successfully increased glycogen accumulation in MSCs, leading to significantly improved cell survival under starvation conditions. The study highlights the potential of glycogen engineering to improve MSC function, especially in inflammatory or energy-deficient environments. However, critical gaps in the study's design, including the lack of validation of key findings, limited differentiation assessments, and missing data on MSC-GYSmut resistance to reactive oxygen species (ROS), necessitate further exploration.
 
 Strengths:
 
 (1) Novel Approach: The study introduces an innovative method of enhancing MSC function by manipulating glycogen metabolism.
 
 (2) Increased Glycogen Storage: The genetic modification of GYS1, resulting in GYSmut, significantly increased glycogen accumulation, leading to improved MSC survival under starvation, which has strong implications for enhancing MSC therapeutic properties in energy-deficient environments.
 
 (3) Potential Therapeutic Impact: The findings suggest significant therapeutic potential for MSCs in conditions that require improved survival, persistence, and immunomodulation, especially in inflammatory or energy-limited settings.
 
 (4) In Vivo Validation: The in vivo murine model of pulmonary fibrosis demonstrated the improved survival and persistence of MSC-GYSmut, supporting the translational potential of the approach.
 
 Weaknesses:
 
 (1) Lack of Differentiation Assessments: The study did not evaluate key MSC differentiation pathways, including chondrogenic and osteogenic differentiation. The absence of analysis of classical MSC surface markers and multipotency limits the understanding of the full potential of MSC-GYSmut.
 
 (2) Missing Validation of RNA Sequencing Data: Although RNA sequencing data revealed promising transcriptomic changes in chondrogenesis and metabolic pathways, these findings were not experimentally validated, limiting confidence.
 
 (3) Lack of ROS Resistance Analysis: Resistance to reactive oxygen species (ROS), an important feature for MSCs under regenerative conditions, was not assessed, leaving out a critical aspect of MSC function.
 
 (4) Limited Exploration of Immunosuppressive Properties: The study did not address the immunosuppressive functions of MSC-GYSmut, which are critical for MSC-based therapies in clinical settings.
 
 Conclusion:
 
 The study presents an exciting new direction for enhancing MSC function through glycogen metabolism engineering. While the results show promise, key experiments and validations are missing, and several areas, such as differentiation capacity, ROS resistance, and immunosuppressive properties, require further investigation. Addressing these gaps would solidify the conclusions and strengthen the potential clinical applications of MSC-GYSmut in regenerative medicine.
 
 Review 2
4. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public Review):
 
 (1) Glycogen biosynthesis typically involves several enzymes. In this context, could the authors comment on the effect of overexpressing a single enzyme - especially a mutant version - on the structure or quality of the glycogen synthesized?
 
 While quantitative molecular weight analysis of synthesized glycogen was not performed, we documented changes in glycogen particle morphology. GYSmut overexpression resulted in significantly enlarged singular glycogen granules, suggesting potential high molecular mass, while GYS-GYG co-overexpression in MSCs (GYG being the essential enzyme for glycogen synthesis initiation) produced a diffuse glycogen distribution pattern rather than particulate structures. We have incorporated this result as new Figure S2C.
 
 These results suggest that overexpression of specific glycogen-metabolizing enzymes significantly influences glycogen structure. Consequently, targeted modulation of glycogen architecture and properties through key enzymes represents a potential avenue for future investigation.
 
 (2) Regarding the in vitro starvation experiments (Figure 2C), what oxygen conditions (pO₂) were used? Are these conditions physiologically relevant and representative of the in vivo lung microenvironment?
 
 Our in vitro starvation experiments (Figure 3C) were conducted under normoxic conditions (21% oxygen). The oxygen concentration in human lungs is physiologically lower than atmospheric levels, with healthy individuals exhaling air containing approximately 16% oxygen (Thalakkotur Lazar Mathew, Diagnostics 2015). To our knowledge, direct measurements of alveolar oxygen concentration in pulmonary fibrosis are rare. Therefore, to evaluate the performance of GYSmut under hypoxic conditions, Figure S2 in the revised manuscript has been augmented to include an assessment of cell performance under combined hypoxia (oxygen concentration < 5%) and nutrient deprivation stress, which further corroborates the superiority of the GYSmut group over the control under different oxygen concentrations.
 
 (3) In the in vitro model, how many hours does it take for the intracellular glycogen reserve to be completely depleted under starvation conditions?
 
 While quantitative cell viability data were recorded up to 72 hours post-implantation (Fig 3C), we observed cell viability at approximately 96 hours. We noticed that the presence of glycogen particles exhibited a correlation with sustained cell viability. However, reliable quantitative assessment of glycogen became increasingly challenging upon significant depletion of viable cells, thereby limiting our measurements during later time points.
 
 (4) For the in vivo model, is there a quantitative analysis of the survival kinetics of the transplanted cells over time for each group? This would help to better assess the role and duration of glycogen stores as an energy buffer after implantation.
 
 We tracked the in vivo distribution and persistence of implanted MSCs using an enzymatic activity quantification assay (Gluc luciferase) and live animal imaging (Akaluc luciferase). The revised manuscript includes quantitative analysis of the in vivo bioluminescence imaging data, which has been supplemented as Figure S4. Glycogen-engineered MSCs and control cells were quantitatively assessed at three discrete time points post-implantation. This quantification revealed a transient divergence in cell viability between the experimental and control groups around day 7. However, the signal in both cohorts subsequently declined to similar levels over the extended observation period.
 
 (5) Finally, the study was performed in male mice only. Could sex differences exist in the efficacy or metabolism of the engineered MSCs? It would be helpful to discuss whether the approach could be expected to be similarly effective in female subjects.
 
 We appreciate the reviewer’s important question regarding potential sex differences. Our study used male mice based on three key considerations: 1) Clinical Relevance: Idiopathic pulmonary fibrosis (IPF) shows significant male predominance, with diagnosis rates 3.5-fold higher in men (37.8% vs 10.6%, p<0.0001) and greater diagnostic confidence (Assayag et al., Thorax 2020). 2) Model Consistency: The bleomycin model (our chosen method) demonstrates more consistent fibrotic responses in male mice (Gul et al., BMC Pulm Med 2023). 3) Biological Rationale: Estrogen’s protective effects in females may confound therapeutic assessments (cited in Assayag et al.).
 
 We fully acknowledge this limitation and will include female subjects in subsequent translational studies. The therapeutic principle should theoretically apply to both sexes, but we agree this requires experimental validation.
 
 (6) The number of mice for each group and time point should be specified.
 
 The manuscript text has been revised to enhance clarity, and the number of mice for each group and time point has been specified (lines 170 to 182).
 
 Reviewer #2 (Public Review):
 
 (4) Inconsistencies in In Vivo Data: There is a discrepancy between the number of animals shown in the figures and the graph (three individuals vs. five animals), as well as missing details on how luciferase signal intensity was quantified, requiring further clarification.
 
 To assess MSC survival in vivo, we employed two strategies utilizing distinct luciferases optimized for specific detection modalities. MSC viability was quantified ex vivo through Gaussia luciferase (Gluc) activity, leveraging its high sensitivity and established commercial assay kits (n = 3 mice per group per time point). For non-invasive longitudinal tracking within living animals, MSC distribution and viability were monitored via in vivo bioluminescence imaging using Akaluc luciferase, selected for its superior tissue penetration and sensitivity in situ (n = 5 mice per group). The manuscript text has been revised to enhance clarity, and the experimental protocols for luciferase signal detection and quantification have been added to the Methods.
 
 (1) (2) (3) (5):
 
 We fully agree that further investigation into the functional consequences of glycogen engineering in MSCs – encompassing core cellular functions, immunomodulatory properties, and associated signaling pathways – is important to fully elucidate the underlying mechanisms. Cellular metabolism is intrinsically intertwined with diverse physiological processes. Consequently, we believe that glycogen engineering exerts multifaceted effects on MSCs, likely extending beyond the modulation of any single specific pathway. Studying the metabolic perturbation induced by such engineering approaches in mammalian cells represents an interesting field. The exploration of these aspects remains a long-term research objective within our group.
 
 Reviewer #2 (Recommendations for the authors):
 
 (6) Clarification of Data in the Murine Model:
 
 In Figure 4B, there is a discrepancy between the number of animals shown in the image (five) and those represented in the graph (three). This discrepancy needs clarification. Additionally, the study lacks information regarding the intensity of the signal in the luciferase assays. It is unclear how luciferase expression in the mice was quantified, and providing this detail would enhance the understanding of the data presented.
 
 We sincerely appreciate these valuable suggestions. We have revised the relevant text for greater clarity. Figure 4B and Figure 4C present results from two distinct experimental approaches, each employing different luciferase reporters and measurement methodologies, and different numbers of mice were used in these two experiments.
 
 Quantitative data derived from the in vivo bioluminescence imaging have been supplemented as Figure S4. The experimental protocols for luciferase signal detection and quantification have been added to the Methods.
 
 To the other recommendations of Reviewer 2:
 
 We sincerely appreciate your valuable insights, which demonstrate your deep expertise. We fully agree that beyond nutrient availability, factors such as reactive oxygen species (ROS) and the immune microenvironment are also critical limitations affecting the survival and therapeutic efficacy of implanted MSCs.
 
 We propose that glycogen engineering exerts broad effects on MSCs. These effects manifest as changes in multiple cellular characteristics, including proliferation, differentiation, surface marker expression, antioxidant capacity, and immunomodulatory activity – all crucial factors for the therapeutic purpose of MSCs.
 
 We believe these changes likely involve complex networks of interconnected regulatory factors. The underlying mechanisms might be clarified through proteomic and metabolomic profiling.
 
 However, comprehensively investigating these interconnected aspects requires significant time and resources. Some components of this research extend beyond the current scope of our project. Nevertheless, exploring these mechanisms remains an important objective, and we will actively work to investigate them further in our ongoing studies.
 
 AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.02.21.639504v2
www.biorxiv.org www.biorxiv.org

Toward Robust Neuroanatomical Normative Models: Influence of Sample Size and Covariates Distributions

4
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important manuscript evaluates how sample size and demographic balance of reference cohorts affect the reliability of normative models. The evidence supporting the conclusions is convincing, although some additional analysis and clarifications could improve the generalisability of the conclusions. This work will be of interest to clinicians and scientists working with normative models.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Overall, this is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported.
  
  Strengths:
  
  The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation ties the methodological work back to clinical application.
  
  Weaknesses:
  
  There are two minor points for consideration:
  
  (1) Calibration of percentile estimates could be shown for the main evaluation (similar to that done in Figure 4E). Because the clinical utility of normative models often hinges on identifying individuals outside the 5th or 95th percentiles, readers would benefit from visual overlays of model-derived percentile curves on the curves from the full training data and simple reporting of the proportion of healthy controls falling outside these bounds for the main analyses (i.e., 2.1. Model fit evaluation).
  
  (2) The larger negative effect of left-skewed sampling likely reflects a mismatch between the younger training set and the older test set; accounting explicitly for this mismatch would make the conclusions more generalisable.
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance.
  
  Strengths:
  
  The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications.
  
  The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples. However, some dataset-specific patterns (noted above) should be acknowledged more directly, and the practical guidance could be sharper.
  
  Weaknesses:
  
  The paper uses a simple regression framework, which is understandable for scalability, but limits generalization to multi-site settings where a hierarchical approach could better account for site differences. This limitation is acknowledged; a brief sensitivity analysis (or a clearer discussion) would help readers weigh trade-offs. Other than that, there are some points that are not fully explained in the paper:
  
  (1) The replication in AIBL does not fully match the OASIS results. In AIBL, left-skewed age sampling converges with other strategies as sample size grows, unlike in OASIS. This suggests that skew effects depend on where variability lies across the age span.
  
  (2) Sex imbalance effects are difficult to interpret, since sex is included only as a fixed effect, and residual age differences may drive some errors.
  
  (3) In Figure 3, performance drops around n≈300 across conditions. This consistent pattern raises the question of sensitivity to individual samples or sub-sampling strategy.
  
  (4) The total outlier count (tOC) analysis is interesting but hard to generalize. For example, in AIBL, left-skew sometimes performs slightly better despite a weaker model fit. Clearer guidance on how to weigh model fit versus outlier detection would strengthen the practical message.
  
  (5) The suggested plateau at n≈200 seems context-dependent. It may be better to frame sample size targets in relation to coverage across age bins rather than as an absolute number.
  
  Review 2
4. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Author response
  
  We would like to thank the editors and two reviewers for the assessment and the constructive feedback on our manuscript, “Toward Robust Neuroanatomical Normative Models: Influence of Sample Size and Covariates Distributions”. We appreciate the thorough reviews and believe the constructive suggestions will substantially strengthen the clarity and quality of our work. We plan to submit a revised version of the manuscript and a full point-by-point response addressing both the public reviews and the recommendations to the authors.
  
  Reviewer 1.
  
  In revision, we plan to address the reviewer’s comments by: (i) strengthening the interpretation of model fit through reporting the proportion of healthy controls within and outside the extreme percentile bounds; (ii) adding age-resolved overlays of model-derived percentile curves compared to those from the full reference cohort for key sample sizes and regions; (iii) quantifying age-distribution alignment between train and test sets; and (iv) summarizing model performance as a joint function of age-distribution alignment and sample size.
  
  Reviewer 2.
  
  In the revised manuscript, we will (i) expand the Discussion to more clearly outline the trade-offs between simple regression frameworks and hierarchical models for normative modeling (e.g., scalability, handling of multi-site variation, computational considerations), and discuss alternative approaches and harmonization as important directions for multi-site settings; (ii) contextualize OASIS-3 vs AIBL differences by quantifying train–test age-alignment across sampling strategies and emphasize that skewness should be interpreted relative to the target cohort’s alignment rather than absolute numbers; (iii) reassess sex-imbalance effects by reporting expected age distributions per condition and re-evaluate sex effects while controlling for age; (iv) investigate the apparent dip at n≈300 by increasing sub-sampling seeds, testing neighboring sample sizes, and using an alternative age-binning scheme to clarify the observed artifact; (v) clarify potential divergence between tOC separation and global fit under discrepancies in demographic distributions and relate tOC to age-alignment distance; and (vi) reframe the sample-size guidance in terms of distributional alignment rather than an absolute n.
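
One way to picture the "age-alignment distance" mentioned in points (ii) and (v) is a 1D Wasserstein (earth mover's) distance between the training and test age samples. The sketch below is only an illustration under that assumption: the response does not say which metric will be used, and the function name `wasserstein_1d` and the example ages are hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """1D Wasserstein-1 distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))  # every point where either CDF can jump
    dist = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        dist += abs(cdf_a - cdf_b) * (right - left)
    return dist


# Hypothetical ages (years): a younger-skewed training set vs an older test set.
train_ages = [55, 58, 60, 62, 65, 70]
test_ages = [68, 72, 75, 78, 80, 84]
alignment_distance = wasserstein_1d(train_ages, test_ages)
```

A distance of zero means the two age samples have identical empirical distributions; larger values mean that more "age mass" has to be moved further to match them, which is the sense in which a left-skewed training set is misaligned with an older test set.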
  
  AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.26.672402v2
www.biorxiv.org www.biorxiv.org

Degradation of LMO2 in T cell leukaemia results in collateral breakdown of transcription complex partners and causes LMO2-dependent apoptosis

3
1. Public_Reviews 13 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important paper reports the development of proteins and small molecules that induce degradation of a clinically-relevant oncogenic transcription factor, LMO2. The findings provide a proof of concept that PROTAC-type chemicals can be developed against intrinsically disordered proteins. The methods provide a blueprint for rational design of PROTACs starting from intracellular antibody paratopes. Overall, the paper is supported by solid evidence and will be of interest to chemical biologists and cancer pharmacologists.
 
 Summary
2. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines.
 
 Strengths:
 
 The topic of degrader development for intrinsically disordered proteins is of high interest and the authors aimed to tackle a difficult drug target. The authors evaluated methods including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity.
 
 Weaknesses:
 
 Several weaknesses remain in this study:
 
 (1) The overall degradation achieved is not highly potent (although important proof-of-concept);
 
 (2) The mechanism of collateral degradation is not completely addressed. The authors acknowledge possible explanations, which would require mutagenesis and structural studies to further dissect;
 
 (3) The proteomics experiments do not detect LMO2, which the authors attribute to its size, making it difficult to interpret.
 
 Review 1
3. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors describe the degradation of an intrinsically disordered transcription factor (LMO2) via PROTACs (VHL and CRBN) in T-ALL cells. Given the challenges of drugging transcription factors, I find the work solid and a significant scientific contribution to the field.
 
 Strengths:
 
 (1) Validation of LMO2 degradation by starting with biodegraders, then progressing to chemical degraders.
 
 (2) Interrogation of the biology and downstream pathways upon LMO2 degradation (collateral degradation).
 
 (3) Cell line models that depend on or overexpress LMO2 vs LMO2-null cell lines.
 
 (4) CRBN and VHL-derived PROTACs were synthesized and evaluated.
 
 Weaknesses:
 
 (1) The conventional method used to characterize PROTACs in the literature is to calculate the DC50 and Dmax of the degraders; I did not find this information in the manuscript.
 
 As noted in the reply to the referee’s point 4 below, our first-generation compounds are not highly potent. The DC50 values have been computed from the Western blot data shown in Fig. 2. Supplementary Fig. S3 of the revised version shows the quantified Western blot data from a time course of treating KOPT-K1 cells with either Abd-CRBN or Abd-VHL (the 24-hour blot data are shown in Figure 2, G and E, and the quantified data from each 24-hour treatment are in Supplementary Fig. S3). From these data, the DC50 values (9 μM for Abd-CRBN and 15 μM for Abd-VHL) are included in the main text and in the Supplementary Fig. S3 figure legend.
 
 In addition, the loss of signal of the LMO2-Rluc reporter protein from PROTAC-treated cells shown in Fig. 2M has been used to calculate a half-point of degradation; although strictly not a DC50, as it measures a reporter protein, this yielded values of 10 μM for Abd-CRBN and 9 μM for Abd-VHL.
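
For readers less familiar with these metrics, the sketch below shows one simple way a DC50 and Dmax could be estimated from quantified band intensities. It is an assumption-laden illustration, not the authors' procedure: DC50 is taken here as the concentration giving 50% loss of the vehicle-normalized signal (some groups instead use half of Dmax), it is found by linear interpolation on a log10 concentration axis, and the function names and data points are hypothetical.

```python
import math


def dmax(remaining):
    """Maximum degradation: 1 minus the lowest remaining fraction."""
    return 1.0 - min(remaining)


def dc50(conc_um, remaining, level=0.5):
    """Concentration (in the units of conc_um) at which the remaining
    fraction first crosses `level`, interpolated on a log10 axis.

    conc_um must be sorted ascending; remaining is the signal normalized
    to the vehicle control (1.0 = no degradation).
    """
    points = list(zip(conc_um, remaining))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 >= level >= r1 and r0 != r1:
            frac = (r0 - level) / (r0 - r1)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None  # 50% degradation not reached in the tested range


# Hypothetical dose-response (concentrations in μM), for illustration only.
doses = [1, 3, 10, 30]
fraction_remaining = [0.95, 0.80, 0.55, 0.35]
estimated_dc50 = dc50(doses, fraction_remaining)
estimated_dmax = dmax(fraction_remaining)
```

With only a handful of concentrations, interpolation of this kind is a rough estimate; a fitted four-parameter dose-response curve would normally be preferred when enough points are available.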
 
 (2) The proteomics data is not very convincing, and it is not clear why LMO2 does not show in the volcano plot (were higher concentrations of the PROTAC tested? and why only VHL was tested and not CRBN-based PROTAC?).
 
 Due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex.
 
 (3) The correlation between degradation potency and cell growth is not well-established (compare Figure 4C: P12-Ichikawa blots show great degradation at 24 and 48 hrs, but it is unclear if the cell growth in this cell line is any better than in PF-382 or MOLT-16) - Can the authors comment on the correlation between degradation and cell growth?
 
 In this study (Fig. 4) we did not aim to compare the effect of LMO2 loss on cell growth among LMO2-positive cells. Rather, we aimed to evaluate the importance of LMO2 for cell growth in LMO2-expressing T-ALL cells compared to non-expressing cells and to correlate the loss of the protein with this effect on cell growth. In addition, the treatment of cells with the LMO2 compounds did not show an effect on LMO2-negative cells until at least 48 hours of treatment, indicating low toxicity of our PROTAC compounds and providing a correlation between LMO2 loss and cell growth.
 
 (4) The PROTACs are not very potent (double-digit micromolar range?) - can the authors elaborate on any challenges in the optimization of the degradation potency?
 
 The Abd methodology, which uses intracellular domain antibodies to screen for compounds that bind to intrinsically disordered proteins such as the LMO2 transcription factor, offers a tractable approach to hard drug targets but, in so doing, creates challenges in improving potency that differ from those for targets for which structural data are available. LMO2 is an intrinsically disordered protein, for which soluble recombinant protein is not readily available to identify the binding pocket of compounds. The potency has so far been optimized solely based on the different moieties substituted in cell-based SAR studies (http://advances.sciencemag.org/cgi/content/full/7/15/eabg1950/DC1) and all new compounds were tested with BRET assays. Thus, optimization of the degradation potency (including properties such as improved solubility) for the LMO2-binding compounds currently relies on chemical modification of the three areas of the compounds indicated in Fig. 2 B,C.
 
 (5) The authors mentioned trying six iDAb-E3 ligase proteins; I would recommend listing the E3 ligases tried and commenting on the results in the main text.
 
 The six chimaeric iDAb-E3 ligase proteins involved one anti-LMO2 iDAb and three different E3 ligases, each fused at either the N- or the C-terminus of the VH (giving six protein formats). These six fusion proteins were described in the text referring to the degrader studies described in Supplementary Fig. 1.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines.
 
 Strengths:
 
 The topic of degrader development for intrinsically disordered proteins is of high interest, and the authors aimed to tackle a difficult drug target. The authors evaluated methods, including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity.
 
 Weaknesses:
 
 The overall degradation is relatively weak, and the mechanism of potential collateral degradation is not thoroughly evaluated.
 
 The purpose of the study was to evaluate the effects of LMO2 degraders. The mechanism of the observed collateral degradation could not be investigated directly within the scope of our study. In the main text, we discussed two possible, not mutually exclusive, explanations. One is that our work (and previously published, cited work) indicates that the DNA-binding bHLH proteins have a relatively short half-life (Supplementary Fig. S12) and may therefore be subject to normal turnover when the LMO2, which is in the complex, turns over. Further, the known structure of the LMO2-bHLH interactions (from Omari et al, doi: 10.1016/j.celrep.2013.06.008) was also examined for the location of lysines in the TAL1 & E47 partners (Supplementary Fig. S11). It is possible that their local association with the LMO2-E3-ligase complex created by the PROTAC interaction could cause their concurrent degradation. Mutagenesis and structural analysis would be needed to establish this point.
 
 In addition, experiments comparing the authors' prior work with their anti-LMO2 iDAb or Abl-L are lacking, which would improve our understanding of the potential advantages of a degrader strategy for LMO2.
 
 A major motivation for developing the Antibody-derived (Abd) method to select compounds, which are surrogates of the antibody paratope, is that using iDAbs directly as inhibitors requires the development of delivery technologies for these macromolecules, either as protein directly or as vectors or mRNA for their expression. Ultimately, high-affinity anti-LMO2 iDAbs should be used directly as tractable inhibitors when delivery methods are developed. In the meantime, Abd compounds were envisaged as surrogates suitable for development into reagents, and potentially drugs, by medicinal chemistry. We evaluated selected first-generation LMO2-binding Abd compounds previously, finding that they interfere with the LMO2-iDAb BRET signal to an ECmax of about 50%, but these compounds do not have the potency to affect the interaction of LMO2 with a non-mutated iDAb (nM affinity). These data indicated that efficacy improvement for the PROTACs was needed. In addition, in the current study, we observed viability effects in T-ALL lines at high concentrations (20 μM) irrespective of LMO2 expression (Supplementary Fig. S2A, B). These data indicated that efficacy improvement was needed and that conversion to degraders (PROTACs) would potentially add to in-cell potency. By adding the E3 ligase ligands, we found that the toxicity in non-LMO2-expressing Jurkat cells was significantly reduced (Supplementary Fig. S2E, F).
 
 Reviewer #2 (Recommendations for the authors):
 
 Suggestions for additional experiments:
 
 (1) The data presented are primarily focused on demonstrating targeted degradation of LMO2, with a focus on phenotypes such as proliferation and apoptosis. In this manuscript, there are limited comparative evaluations of anti-LMO2 iDAb or Abl-L to show the potential benefits of a degrader approach relative to their previously described work, as well as why targeted degradation is, in fact, advantageous. For example, the authors' previous work has shown that anti-LMO2 iDAb inhibits tumor growth in a mouse transplantation model. Comparisons in vitro would be supportive of the importance of continued degrader optimization/development.
 
 We have previously shown that an anti-LMO2 scFv inhibits tumour growth in a mouse model, but this work used an expressed scFv antibody that binds to LMO2 in the nM range. The Abd compounds are of much lower potency than the antibody and, because recombinant LMO2 is difficult to work with, we could only evaluate interactions of compounds with LMO2 in cell-based assays like BRET (LMO2-iDAb BRET). In this cell-based assay, the first-generation Abd compounds do not have sufficient potency to block the LMO2-iDAb interaction unless the affinity of the iDAb is reduced to sub-μM. The justification for proceeding with the degrader process rather than just using protein-protein interaction (PPI) inhibition was based largely on the low potency of the first-generation PPI compounds in cell assays and on the expectation that incorporating protein degradation with PPI inhibition would enhance the efficacy.
 
 In addition, the viability experiments are also very short-term; is there a reason why the authors did not carry out these experiments for 3-5 days to fully understand the impacts on proliferation?
 
 In Supplementary Fig. S5, we did show assays up to 3 days. In KOPT-K1 (LMO2+), the LMO2 levels were reduced during the time course of this assay (from a single compound dose at time zero) (Supplementary Fig. S5A, B). We also show CellTiter-Glo assays up to 3 days and, with these second-generation compounds, we observed sustained effects on KOPT-K1 (LMO2+) but low non-DMSO toxicity in Jurkat (LMO2-) (revised version Supplementary Fig. S5C, D).
 
 (2) The potential mechanism of collateral degradation is interesting and important in evaluating the on-target responses and consequences of degrading LMO2. At this time, the data supporting collateral degradation is limited and would be strengthened by showing that it is not due to a change in mRNA levels and not due to complex dissociation. Overall, the kinetics and depth of loss of complex members such as E47 in Figure 3 appear more substantial than LMO2 itself, and as presented, collateral degradation is not effectively demonstrated. In addition, to aid in the readers' assessments, additional background and references around the roles of TAL1 and E47 would be helpful. For example, structurally, where do they (and other associated proteins that are not degraded) fit in the complex?
 
 We have responded above in relation to the Public Review Comments and note that a structure of the complex was in the submitted version (now revised version Supplementary Fig. S11).
 
 (3) In Figure 1A, the blots show decreased levels of endogenous CRBN with iDAB-CRBN. Is this a known consequence of this approach in these cell lines? Does the partial recovery of endogenous CRBN in KOPTK1 cells have any indication of iDAB-CRBN levels?
 
 We cannot be sure why the endogenous level of CRBN decreases in doxycycline-treated cells. It has been shown (DOI:10.1371/journal.pone.0064561) that doxycycline (and its derivatives), used in inducible expression systems such as the lentivirus we used, affects gene expression patterns and can either increase or decrease expression. Although the published study did not examine CRBN expression, this effect might explain why CRBN expression decreases on doxycycline addition and remains at the same level after that.
 
 (4) In Figure S7, the authors do not fully explain the results and why there is minimal rescue with epoxomicin (S7A) or MLN4924 (S7J). This could indicate an alternative mechanism of degradation and loss at play, given the lack of rescue. Can the authors comment on this discrepancy, and have they looked at autophagy inhibitors or other agents to achieve the chemical rescue?
 
 In experiments such as those in revised version Supplementary Fig. S6, we used KOPT-K1 cells with a single concentration of the inhibitors, and the cells may be less susceptible to epoxomicin (0.8 μM), but lenalidomide and free thalidomide restored the LMO2 levels fully. In the main text Fig. 3D, we also showed that including epoxomicin and thalidomide with the Abd-CRBN in KOPT-K1 and CCRF-CEM restores LMO2 levels, supporting the conclusion that the main mechanism of degradation is through the ubiquitination-proteasomal route.
 
 (5) For the proteomics data, it would be helpful to have the proteins in yellow highlighted to have them noted in 5D and 5E. In addition, can the authors comment on why LMO2 or their collateral targets are not confirmed in the table? Furthermore, 5C is difficult to interpret; if there are no significantly changing proteins in the Jurkat cells, why are there pathways that are identified?
 
 As mentioned in reply to referee 1, due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex where expression levels are low.
 
 AuthorResponse

Tags

Summary

AuthorResponse

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.09.627495v3
www.biorxiv.org www.biorxiv.org

Independent Validation of Transgenerational Inheritance of Learned Pathogen Avoidance in Caenorhabditis elegans

5
1. Public_Reviews 13 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This valuable study concerns a model for transgenerational epigenetic inheritance, the learned avoidance by C. elegans of the PA14 pathogenic strain of Pseudomonas aeruginosa. A recent study questioned the robustness of transgenerational inheritance in this paradigm. The authors of this study have worked independently of the group that reported the original phenomenon and also independently of the group that challenged the original report. With solid data, this study independently validates findings previously reported by the Murphy group, confirming that the paradigm is reproducible elsewhere. The reviewers also appreciated the information on reagent sources used by different groups. The present study is therefore of broad interest to anyone studying genetics, epigenetics, or learned behavior.
 
 Summary
2. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The manuscript addresses the discordant reports of the Murphy (Moore et al., 2019; Kaletsky et al., 2020; Sengupta et al., 2024) and Hunter (Gainey et al., 2025) groups on the existence (or robustness) of transgenerational epigenetic inheritance (TEI) controlling learned avoidance of C. elegans to Pseudomonas aeruginosa. Several papers from Colleen Murphy's group describe and characterize C. elegans transgenerational inheritance of avoidance behaviour. In the hands of the Murphy group, the learned avoidance is maintained for up to four generations; however, Gainey et al. (2025) reported an inability to observe inheritance of learned avoidance beyond the F1 generation. Of note, Gainey et al. used a modified assay to measure avoidance, rather than the standard assay used by the Murphy lab. A response from the Murphy group suggested that procedural differences explained the inability of Gainey et al. (2025) to observe TEI. They found two sources of variability that could explain the discrepancy between studies: the modified avoidance assay and bacterial growth conditions (Kaletsky et al., 2025). The standard avoidance assay uses azide as a paralytic to capture worms in their initial decision, while the assay used by the Hunter group does not capture the worm's initial decision but rather uses cold to capture the location of the population at one point in time.
 
 In this short report, Akinosho, Alexander, and colleagues provide independent validation of transgenerational epigenetic inheritance (TEI) of learned avoidance to P. aeruginosa as described by the Murphy group by demonstrating learned avoidance in the F2 generation. These experiments used the protocol described by the Murphy group, demonstrating reproducibility and robustness.
 
 Strengths:
 
 Despite the extensive analyses carried out by the Murphy lab, doubt may remain for those who have not read the publications or for those who are unfamiliar with the data, which is why this report from the Vidal-Gadea group is so important. The observation that learned avoidance was maintained in the F2 generation provides independent confirmation of transgenerational inheritance that is consistent with reports from the Murphy group. It is of note that Akinosho, Alexander et al. used the standard avoidance assay that incorporates azide, and followed the protocol described by the Murphy lab, demonstrating that the data from the Moore and Kaletsky publications are reproducible, in contrast to what has been asserted by the Hunter group.
 
 Comments on revised version:
 
 I am happy with the responses to reviews.
 
 Review 1
3. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The manuscript "Independent validation of transgenerational inheritance of learned pathogen avoidance in C. elegans" by Akinosho and Vidal-Gadea offers evidence that learned avoidance of the pathogen PA14 can be inherited for at least two generations. In spite of initial preference for the pathogen when exposed in a 'training session', 24 hours of feeding on this pathogen evoked avoidance. The data are robust, replicated in 4 trials, and the authors note that diminished avoidance is inherited in generations F1 and F2.
 
 Strengths:
 
 These results contrast with those reported by Gainey et al., who only observed intergenerational inheritance for a single generation. Although the authors' study does not explain why Gainey et al. failed to reproduce the Murphy lab results, one possibility is that a difference in a media ingredient could be responsible.
 
 Comments on revised version:
 
 The responses to the reviewer comments appear reasonable for the most part.
 
 Review 2
4. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This short paper aims to provide an independent validation of the transgenerational inheritance of learned behaviour (avoidance) that has been published by the Murphy lab. The robustness of the phenotype has been questioned by the Hunter lab. In this paper, the authors present one figure showing that transgenerational inheritance can be replicated in their hands. Overall, it helps to shed some light on a controversial topic.
 
 Strengths:
 
 The authors clearly outline their methods, particularly regarding the choice of assay, so that attempting to reproduce the results should be straightforward. It is nice to see these results repeated in an independent laboratory.
 
 Comments on revised version:
 
 I'm happy with the response to reviewers.
 
 Review 3
5. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Author response
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public Review):
 
 Confirmation of daf-7::GFP data and inheritance beyond F2
 
 Reviewer suggested confirming daf-7::GFP molecular marker data and testing inheritance beyond the F2 generation to further strengthen the findings.
 
 We agree these experiments would provide valuable mechanistic insights into the molecular basis of transgenerational inheritance. However, our study was specifically designed as a reproducibility study focusing on the central controversy regarding F2 inheritance (Gainey et al. vs. Murphy lab findings). The daf-7::GFP molecular marker experiments, while important for understanding mechanisms, represent a different research question requiring extensive additional resources and expertise beyond the scope of this validation study. Our primary goal was to provide independent confirmation of the disputed F2 inheritance using standardized behavioral assays. It is our hope that future work will pursue these important mechanistic validations.
 
 "Exhaustive attempts" language
 
 Reviewer disagreed with characterizing Gainey et al.'s efforts as "exhaustive attempts" since they modified the original protocol.
 
 We revised this statement in the Results and Discussion to more accurately reflect the experimental situation: "In contrast, Gainey et al. (2025), representing the Hunter group, reported that while parental and F1 avoidance behaviors were evident, transgenerational inheritance was not reliably observed beyond the F1 generation under their experimental conditions."
 
 Importance of sodium azide
 
 Reviewer suggested including more discussion about the recent findings on the importance of sodium azide in the assay, referencing the Murphy group's response paper.
 
 We have prominently highlighted the critical role of sodium azide in our Introduction with strengthened language that emphasizes its importance for resolving the scientific controversy: "Critically, Kaletsky et al. (2025) demonstrated that omission of sodium azide during scoring can completely abolish detection of inherited avoidance, revealing that this key methodological difference may explain the conflicting results between laboratories. The use of sodium azide to immobilize worms at the moment of initial bacterial choice appears essential for capturing the inherited behavioral response. These findings highlight how seemingly minor methodological variations can dramatically impact detection of transgenerational inheritance and underscore the need for independent replication using standardized protocols."
 
 Protocol fidelity statement
 
 Reviewer requested a more direct statement clarifying that we followed the Murphy group protocol, noting that we made some modifications.
 
 We followed the core Murphy lab protocol with two evidence-based optimizations that preserve the essential experimental elements: 1) We used 400 mM sodium azide instead of 1 M based on preliminary data showing the higher concentration caused premature paralysis before worms could make behavioral choices, and 2) We used liquid NGM buffer instead of M9 to maintain chemical consistency with the solid NGM plates used for worm culture, minimizing potential osmotic stress. These modifications improved experimental reliability while maintaining the critical components: sodium azide immobilization, bacterial lawn density standardization (OD600 = 1.0), and synchronized scoring conditions that are essential for detecting inherited avoidance.
 
 Overstated dilution claim
 
 Reviewer noted that the statement about "gradual decrease" in avoidance strength was overstated and didn't reflect the actual data presented in the manuscript.
 
 We removed this statement.
 
 Environmental variables phrasing
 
 Reviewer found the sentence about environmental variables unclear, noting that Gainey et al. didn't actually acknowledge variability but saw it as indicating error or stochastic processes.
 
 We refined this statement for greater precision and clarity: "This underscores the assay's sensitivity to environmental variables, such as synchronization method and bacterial lawn density. This highlights the importance of consistency across experimental setups and supports the view that context-dependent variation may underlie previously reported discrepancies."
 
 Reviewer #2 (Public Review):
 
 Reagent sourcing
 
 Reviewer suggested listing the sources of media ingredients with company names and catalog numbers, as this might be important for reproducibility.
 
 To ensure complete reproducibility, we created a comprehensive Table S3 listing all reagents, suppliers, and catalog numbers used in our experiments. This detailed information enables exact replication of our experimental conditions and addresses potential variability that might arise from different reagent sources between laboratories.
 
 Reviewer #3 (Public Review):
 
 Raw data transparency
 
 Reviewer noted that while a spreadsheet with choice assay results was provided, the individual raw data from assays was not included, which would be helpful for assessing sample sizes.
 
 We now provide complete experimental transparency through Table S2, which contains individual choice indices from all 138 assays conducted across four independent trials. This comprehensive dataset allows full assessment of our experimental outcomes, statistical robustness, and reproducibility while enabling other researchers to perform independent statistical analyses.
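
As context for the "choice indices" in Table S2, the sketch below shows how such an index is commonly computed in two-lawn assays of this kind: (worms on OP50 − worms on PA14) / (total worms scored), so that positive values indicate avoidance of PA14. This definition is assumed from common usage and is not stated in the response; the function name and the counts are hypothetical.

```python
from statistics import mean


def choice_index(n_op50, n_pa14):
    """Choice index for one assay plate; ranges from -1 (all worms on
    PA14) to +1 (all worms on OP50)."""
    total = n_op50 + n_pa14
    if total == 0:
        raise ValueError("no worms scored on either lawn")
    return (n_op50 - n_pa14) / total


# Hypothetical (OP50, PA14) counts for three independent plates.
plates = [(42, 18), (35, 25), (50, 10)]
indices = [choice_index(op50, pa14) for op50, pa14 in plates]
mean_index = mean(indices)
```

Because each plate yields one index, the per-assay values in a table like Table S2 are what a reader would need in order to recompute group means and rerun unpaired comparisons between generations.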
 
 F1/F2 assay disparity
 
 Reviewer questioned whether the higher number of F2 assays compared to F1 represented truly independent assays, asking if multiple F2 assays were performed from offspring of one F1 plate (which would not represent independent assays).
 
 We clarified this important statistical consideration in Methods (Transgenerational Testing): "Each behavioral assay was conducted using animals from a biologically independent growth plate. While F2 plates were derived from pooled embryos from multiple F1 parents, each assay represents an independent biological replicate with no reuse of animals across assays. F2 assays (n=45) exceeded F1 assays (n=20) due to PA14-induced fecundity reduction in trained worms, limiting the number of viable F1 progeny. The higher number of F2 assays reflects the greater reproductive success of healthy F1 animals and provides additional statistical power for population-level behavioral comparisons." We also enhanced our Controls section to clarify that "Our experimental design employed population-level comparisons across generations using unpaired statistical analyses, with no attempt to track individual lineages across generations."
 
 Methodological variations overstatement
 
 Reviewer felt the Introduction overstated the findings by suggesting the authors "address potential methodological variations," when they only used one assay setup throughout.
 
 We have corrected the Introduction to accurately reflect our study design and scope: "Here, we adapted the protocol established by the Murphy group, maintaining the critical use of sodium azide to paralyze worms at the time of choice, to test whether parental exposure to PA14 elicits consistent avoidance in subsequent generations. Our study specifically focuses on the transmission of learned avoidance through the F2 generation, beyond the intergenerational (F1) effect, because this is where divergence between published studies begins."
 
 Reviewer #1 (Recommendations for the authors):
 
 Worm numbers
 
 Reviewer noted that information about the number of worms used should be included in the training and choice assay methods section rather than separated.
 
 We clarified worm numbers and sample sizes in the Methods (Controls and Additional Considerations): "Each individual assay averaged 62 ± 43 animals (range: 15-150 worms per assay), with a total of 138 assays conducted across four independent experimental trials. The variation in worm numbers per assay reflects natural variation in worm recovery and immobilization efficiency during choice assays. We conducted an average of 8.5 assays per condition during each of the four replicates."
 
 Figure 1 legend and consistency
 
 Reviewer identified several issues: inconsistent terminology ("treated" vs "trained"), incorrect statistical test naming, missing p-value annotations, and a need for consistency between figure and legend.
 
 We have systematically addressed all figure consistency and statistical annotation issues:
 
 Replaced inconsistent "treated" terminology with "trained" throughout
 
 Corrected the statistical test description to accurately reflect our analysis: "Kruskal-Wallis one-way ANOVA followed by Dunn's post hoc", which properly corresponds to the statistical tests detailed in Table S1
 
 Added explicit p-value annotations in the figure legend: "*p<0.05, **p<0.01 means and SEM shown (see Table S1 for statistics and Table S2 for raw data)"
 
 Ensured consistent terminology between figure and legend
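 
 As a reminder of what the named omnibus test computes, the sketch below implements the Kruskal-Wallis H statistic with the standard tie correction in plain Python. It is an illustration under stated assumptions: the function names and the example groups are hypothetical, Dunn's post hoc comparisons are not implemented, and in practice a library routine such as `scipy.stats.kruskal` would normally be used to obtain the p-value.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected H statistic; compare to chi-square with k-1 d.o.f."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical choice indices for three conditions; NOT data from the study.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

 Because the statistic depends only on ranks, it is a reasonable choice for bounded, non-normal measures such as choice indices.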
 
 NGM vs. M9 buffer
 
 Reviewer questioned whether we used NGM buffer or M9 buffer for washing steps, noting that NGM isn't usually referred to as "buffer."
 
 We have prominently featured and thoroughly clarified our rationale for using liquid NGM buffer in the Methods (Synchronization of Worms section). The explanation now appears upfront in the methods: "We used liquid NGM buffer instead of M9 buffer (as specified in the original Murphy protocol) to maintain chemical consistency with the solid NGM culture plates. This modification minimizes potential osmotic stress since liquid NGM matches the pH (6.0) and ionic composition of the growth medium, whereas M9 buffer has a different pH (7.0) and ionic profile." We provide detailed chemical differences and explain that this modification maintains consistency with culture conditions while preserving essential experimental procedures.
 
 Grammar/typos
 
 Reviewer noted that the manuscript needed thorough proofreading to address grammatical errors and typographical mistakes.
 
 We have conducted comprehensive proofreading and editing throughout the manuscript to resolve grammatical and typographical errors. Specific improvements include: clarified sentence structure in the Introduction and Results sections, corrected technical terminology consistency, improved figure legend clarity, and enhanced overall readability while maintaining scientific precision.
 
 Sodium azide concentration
 
 Reviewer noted that our sodium azide concentration differed from the Moore paper and requested comment on this difference.
 
 We have included explicit justification for our sodium azide concentration choice in the Methods (Training and Choice Assay): "We used 400 mM sodium azide rather than the 1 M concentration reported by Moore et al. (2019) because preliminary trials showed that higher concentrations caused premature paralysis before worms could reach either bacterial spot, potentially biasing choice measurements. The 400 mM concentration provided sufficient immobilization while preserving the behavioral choice window."
 
 Reviewer #2 (Recommendations for the authors):
 
 Comparative reagent analysis
 
 Reviewer suggested creating a supplemental table comparing reagent sources between our study, Gainey et al., and Murphy et al., proposing that media ingredient differences might explain the discrepancies.
 
 While direct reagent comparison between laboratories was beyond the scope of this validation study, we recognize this as an important consideration for understanding experimental variability. Our comprehensive reagent sourcing information (Table S3) provides the foundation for future comparative studies. We encourage collaborative efforts to systematically compare reagent sources across laboratories, as media component differences could contribute to the experimental variability observed between research groups. Such analyses would be valuable for establishing standardized protocols across the field.
 
 Conclusion
 
 We hope that these revisions satisfactorily address the reviewers’ concerns. We believe these improvements significantly strengthened the manuscript's contribution to resolving this important scientific controversy.
 
 We thank the reviewers again for their invaluable insights and constructive feedback, which have substantially improved the quality and impact of our work.
 
 AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.03.647070v2
www.biorxiv.org

Adult-neurogenesis allows for representational stability and flexibility in early olfactory system

4
1. Public_Reviews 13 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This paper presents a valuable theory and analysis of the role of neurogenesis and inhibitory plasticity in the drift of neural representations in the olfactory system. For one of the findings, regarding the impact of neurogenesis on the drift, the evidence remains incomplete. The reason lies in the differences in variability/drift of the mitral/tufted cell responses observed in the model compared to experimental observations, where these responses remain stable over extended time scales.
 
 Summary
2. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.
 
 The model is clearly explained and relies on several assumptions and observations:
 
 (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.
 
 (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.
 
 (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.
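 
 For readers less familiar with assumption (3), a textbook pair-based STDP rule can be sketched as follows. This is an illustration only: the manuscript's actual rule at abGC synapses (its sign, time window, and amplitudes) is not specified in this review, so the function names and every parameter value below are assumptions.

```python
import math


def stdp_dw(delta_t_ms, a_plus=0.01, a_minus=0.012,
            tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with delta_t = t_post - t_pre (ms).

    Pre-before-post (delta_t > 0) potentiates; post-before-pre depresses.
    Amplitudes and time constants are illustrative, not from the manuscript.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus)
    return 0.0


def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the contributions of all spike pairs, then clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_dw(t_post - t_pre)
    return min(w_max, max(w_min, weight))


# Repeated pre-then-post pairing (5 ms lag) strengthens the synapse.
w_after = apply_stdp(0.5, pre_spikes=[0.0, 100.0], post_spikes=[5.0, 105.0])
```

 Under a rule of this kind, synapses that are repeatedly co-activated by the same odor are selectively reinforced, which is the intuition behind experience slowing drift.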
 
 The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.
 
 Concerns:
 
 (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.
 
 The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), whereas in the authors' model, M/T responses within the same session are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus other inhibitory network mechanisms that do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult neurogenesis, which need not be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).
 
 Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.
 
 (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated into the network as abGCs. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.
 
 Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.
 
 (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).
 
 How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?
 
 (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to adult neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of reorganization. This is the same for the piriform cortex, but, in relative terms, the piriform ensembles displayed in a low-dimensional embedding appear to drift more than the M/T ensembles.
 
 This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?
 
 (5) Could the authors comment on whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., from the Lledo, Murthy, and Komiyama groups) reported that abGCs integrate into the network in an activity-dependent manner, and not randomly, and as such stabilize the active neuronal responses, which is consistent with the authors' report.
 
 Relatedly, I couldn't find in the manuscript which synapses involving abGCs they focus on, or what the relative contribution is of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).
 
 (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists. How does suppression of adult neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?
 
 Review 1
3. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks in the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP, then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.
 
 Strengths:
 
 A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.
 
 Weaknesses:
 
 The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.
 
 Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.
 
 This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.
 
 Review 2
4. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary
 
 The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights). Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.
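 
 For context, the single-neuron dynamics of the Izhikevich (2003) simple model, on which the network is said to be based, can be sketched in a few lines. This is a minimal single-cell illustration, not the authors' network: the regular-spiking parameters (a=0.02, b=0.2, c=-65, d=8) are the standard published values, while the function name, input currents, time step, and simulation length are assumptions chosen for the example.

```python
def simulate_izhikevich(i_ext, t_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler simulation of one Izhikevich neuron.

    dv/dt = 0.04*v^2 + 5*v + 140 - u + I
    du/dt = a*(b*v - u)
    On v >= 30 mV: record a spike, reset v to c, and increase u by d.
    Returns the list of spike times in ms.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spikes = []
    for step in range(int(t_ms / dt)):
        if v >= 30.0:
            spikes.append(step * dt)
            v = c
            u += d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_ext)
        u += dt * a * (b * v - u)
    return spikes


# With no input the cell settles to rest; constant drive makes it fire.
silent = simulate_izhikevich(0.0)
driven = simulate_izhikevich(10.0)
```

 Changing only (a, b, c, d) switches between firing patterns, which is why this model is a common choice for networks containing several neuron types.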
 
 Strengths
 
 This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.
 
 The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.
 
 In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.
 
 Weaknesses
 
 I have one major, general concern that I think must be addressed to permit proper interpretation of the results.
 
 I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time. This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.
 
 In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.
 
 This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)
 
 Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.
 
 The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.
 
 Major comments (all related to the above point)
 
 (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)." They also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.
 
 (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.
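 
 The native-space comparison at issue here amounts to correlating the full population response vector on a reference day with the vector on each later day. A minimal sketch follows; the firing rates are invented for illustration and the function names are hypothetical, so it shows only how such a metric separates a stable population from a reorganized one, not what either the model or the data produce.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant vector: correlation undefined")
    return cov / (sx * sy)


def correlation_to_reference(responses_by_day):
    """Correlate each day's population vector with the first day's."""
    reference = responses_by_day[0]
    return [pearson(reference, day) for day in responses_by_day]


# Hypothetical odor-evoked rates (Hz) of four cells; NOT experimental data.
day1 = [5.0, 1.0, 8.0, 2.0]
stable_later = [5.5, 1.2, 7.6, 2.1]     # small trial-like fluctuations
drifted_later = [2.0, 6.0, 3.0, 7.0]    # tuning reorganized across cells

stable_corr = correlation_to_reference([day1, stable_later])
drifted_corr = correlation_to_reference([day1, drifted_later])
```

 Applying one metric of this kind to both the model output and the published imaging data would make the requested comparison direct.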
 
 (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L314-6). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces) and Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.
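 
 The change index quoted above is simple to compute, and a short sketch makes the point about gain explicit. The formula follows the quoted definition from Kato et al.; the function name and the example response values are illustrative assumptions, and responses are assumed to be non-negative.

```python
def change_index(response_trial_x: float, response_day1: float) -> float:
    """CI = (R_x - R_1) / (R_x + R_1), per the definition quoted from Kato et al.

    -1: complete loss of response; +1: emergence of a new response; 0: no change.
    """
    denom = response_trial_x + response_day1
    if denom == 0:
        raise ValueError("both responses are zero: CI undefined")
    return (response_trial_x - response_day1) / denom


# A pure gain change by a factor g gives CI = (g - 1) / (g + 1),
# independent of the baseline response amplitude.
baseline = 4.0                                   # hypothetical day-1 response
ci_doubled = change_index(2.0 * baseline, baseline)
ci_halved = change_index(0.5 * baseline, baseline)
ci_unchanged = change_index(baseline, baseline)
```

 A doubling or halving of gain therefore yields a CI of about +0.33 or -0.33, well away from the near-zero values reported in the absence of intervening experience.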
 
 Review 3

Tags

Summary

Review 3

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.02.601573v3
www.biorxiv.org

Distinct cortical encoding of acoustic and electrical cochlear stimulation

4
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This valuable study compares auditory cortex responses to sounds and cochlear implant stimulation measured with surface electrode grids in rats. Beyond the reduced frequency resolution of cochlear implants observed previously, this study suggests key discrepancies between neuronal representations of cochlear stimulations and natural sounds. However, the evidence for this potentially interesting result is incomplete because there is a lack of evidence for the effectiveness of the comparison method. This study is of interest to researchers in the auditory neuroscience field and clinicians implementing treatments with cochlear implants.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript addresses an important question: whether cortical population codes for cochlear-implant (CI) stimulation resemble those for natural acoustic input or constitute a qualitatively different representation. The authors record intracranial EEG (µECoG) responses to pure tones in normal-hearing rats and to single-channel CI pulses in bilaterally deafened, acutely implanted rats, analysing the data with ERP/high-gamma measures, tensor component analysis (TCA), and information-theoretic decoding. Across several readouts, the acoustic condition supports better single-trial stimulus classification than the CI condition. However, stronger decoding does not, on its own, establish that the acoustic responses instantiate a "richer" cortical code, and the evidence for orderly spatial organisation is not compelling for CI, and is also less evident than expected for normal-hearing, given prior knowledge. The overall narrative is interesting, but at present, the conclusions outpace the data because of statistical, methodological, and presentation issues.
  
  Strengths:
  
  The study poses a timely, clinically relevant question with clear implications for CI strategy. The analytical toolkit is appropriate: µECoG captures mesoscale patterns; TCA offers a transparent separation of spatial and temporal structure; and mutual-information decoding provides an interpretable measure of single-trial discriminability. Within-subject recordings in a subset of animals, in principle, help isolate modality effects from inter-animal variability. Where analyses are most direct, the acoustic condition yields higher single-trial decoding accuracy, which is a meaningful and clearly presented result.
  
  Weaknesses:
  
  Several limitations constrain how far the conclusions can be taken. Parts of the statistical treatment do not match the data structure: some comparisons mix paired and unpaired animals but are analysed as fully paired, raising concerns about misestimated uncertainty. Methodological reporting is incomplete in places; essential parameters for both acoustic and electrical stimulation, as well as objective verification of implantation and deafening, are not described with sufficient detail to support confident interpretation or replication. Figure-level clarity also undermines the message. In Figure 2, non-significant slopes for CI, repeated identification of a single "best channel," mismatched axes, and unclear distinctions between example and averaged panels make the assertion of spatial organisation unconvincing; importantly, the normal-hearing panels also do not display tonotopy as clearly as expected, which weakens the key contrast the paper seeks to establish. Finally, the decoding claims would be strengthened by simple internal controls, such as within-modality train/test splits and decoding on raw ERP/high-gamma features to demonstrate that poor cross-modal transfer reflects genuine differences in the underlying responses rather than limitations of the modelling pipeline.
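The internal control suggested above can be sketched on synthetic data. This is a minimal illustration, not the authors' pipeline: the nearest-centroid decoder, the four-feature response vectors, the noise level, and the one-to-one mapping between tone labels and CI-channel labels are all hypothetical assumptions. It shows only that poor cross-condition transfer is interpretable when held-out decoding within each condition is demonstrably good.

```python
import random


def make_trials(centroids, n_per_class, noise_sd, rng):
    """Simulate trials as a class centroid plus independent Gaussian noise."""
    return [([c + rng.gauss(0.0, noise_sd) for c in centre], label)
            for label, centre in enumerate(centroids)
            for _ in range(n_per_class)]


def stratified_split(trials, rng, train_fraction=0.5):
    """Split trials into train/test sets, keeping every class in both."""
    by_label = {}
    for features, label in trials:
        by_label.setdefault(label, []).append((features, label))
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        cut = int(len(items) * train_fraction)
        train += items[:cut]
        test += items[cut:]
    return train, test


def fit_centroids(trials):
    """Nearest-centroid decoder: the mean feature vector of each class."""
    grouped = {}
    for features, label in trials:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def accuracy(model, trials):
    """Fraction of trials whose nearest centroid carries the true label."""
    def predict(features):
        def sqdist(label):
            return sum((a - b) ** 2 for a, b in zip(features, model[label]))
        return min(model, key=sqdist)
    return sum(predict(f) == y for f, y in trials) / len(trials)


rng = random.Random(0)
# Hypothetical geometries: the same four class labels, laid out differently.
acoustic = make_trials([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0]], 40, 0.5, rng)
electric = make_trials([[0, 0, 0, 0], [0, 0, 0, 3], [0, 0, 3, 0], [0, 3, 0, 0]], 40, 0.5, rng)

train_a, test_a = stratified_split(acoustic, rng)
train_e, test_e = stratified_split(electric, rng)

# Positive controls: train and test within the same condition on held-out trials.
within_acoustic = accuracy(fit_centroids(train_a), test_a)
within_electric = accuracy(fit_centroids(train_e), test_e)
# Transfer test: decoder trained on one condition, applied to the other.
cross_modal = accuracy(fit_centroids(train_a), electric)

print(within_acoustic, within_electric, cross_modal)
```

In this toy setting both within-condition accuracies are high while transfer is near chance, which is the pattern that would support a genuine representational difference. If the within-condition controls were also poor, the transfer result would say little.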
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This article reports measurements of iEEG signals on the rat auditory cortex during cochlear implant or sound stimulation in separate groups of rats. The observations indicate some spatial organization of responses to cochlear implant stimuli, but one that is very different from that for sounds.
  
  Strengths:
  
  The study includes interesting analyses of the sound and cochlear implant representation structure based on decoders.
  
  Weaknesses:
  
  The observation that responses to cochlear implant stimulation are spatially organized is not new (e.g., Adenis et al. 2024).
  
  The claim that spatial and temporal dimensions contribute information about the sound is also not new; there is a large literature on this topic. Moreover, the results shown here are extremely weak. They show similar levels of information in the spatial and temporal dimensions, and no synergy between the two dimensions. This is, however, likely the consequence of high measurement noise leading to poor accuracy in the information estimates, as the authors state.
  
  The main claim of the study - the mismatch between cochlear implant and sound representation - is not supported. The responses to each modality are measured in different animals. The authors do not show that they actually can compare representations across animals (e.g., for the same sounds). Without this positive control, there is no reason to think that it is possible to decode from one animal with a decoder trained on another, and the negative result shown by the authors is therefore not surprising.
  
  Review 2
4. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Through micro-electroencephalography, Hight and colleagues studied how the auditory cortex as a whole responds to cochlear implant stimulation compared to classic pure tones. Taking advantage of a double-implanted rat model (micro-ECoG and cochlear implant), they tracked and analyzed changes in the temporal and spatial aspects of the cortical evoked responses in both normal-hearing and cochlear-implanted animals. After establishing that single-trial responses were sufficient to encode the stimuli's properties, the authors then explored several decoder architectures to study whether the cortex encodes each stimulus modality in a similar or different manner. They conclude that a) intracranial EEG evoked responses can be accurately recorded and did not differ between normal-hearing and cochlear-implanted rats; b) although coarsely spatially organized, CI-evoked responses had higher trial-by-trial variability than pure-tone responses; c) stimulus identity is independently represented by temporal and spatial aspects of cortical representations and can be accurately decoded by various means from single trials; and d) a decoder trained on pure tones cannot decode CI-stimulus identity accurately.
  
  Strengths:
  
  The model combining micro-ECoG and cochlear implantation and the methodology to extract both the Event Related Potentials (ERPs) and High-Gammas (HGs) are very well designed and appropriately analyzed. Likewise, the PCA-LDA and TCA-LDA are powerful tools that take full advantage of the information provided by the cortical ensembles.
  
  The overall structure of the paper, with a paced and exhaustive progression through each step and evolution of the decoder, is much appreciated and easy to follow. The exploration of single-trial encoding and stimulus identity through temporal and spatial domains provides new avenues to characterize the cortical responses to CI stimulation and their central representation. The fact that single trials suffice to decode the stimulus identity regardless of modality is of great interest and noteworthy. Although the authors confirm that iEEG remains difficult to transpose to the clinic, the insights provided by the study confirm the potential benefit of using central decoders to help in clinical settings.
  
  Weaknesses:
  
  The conclusion of the paper, especially the concept of distinct cortical encoding for each modality, is unfortunately only partially supported by the results, as the authors did not adequately consider fundamental limitations of CI-related stimulation.
  
  First, the reviewer assumes that the authors stimulated in monopolar mode, which, albeit clinically relevant, notoriously generates a high current spread in rodent models. Second, comparing the averaged BF maps for iEEG (Figure 2A, C), BFs ranged from 4 to 16 kHz with a predominance of 4 kHz BFs. The lack of BFs at higher frequencies hints at a potential location mismatch between the frequency range sampled at the level of the cortex (low to medium frequencies) and the frequency range covered by the CI inserted mostly in the first turn-and-a-half of the cochlea (high to medium frequencies). Looking at Figure 2F (and to some extent 2A), most of the CI electrodes elicited responses around the 4 kHz regions, and averaged maps show a predominance of CI-3-4 across the cortex (Figure 2C, H) from areas with 4 kHz BF to areas with 16 kHz BF. It is doubtful that CI-3-4 are located near the 4 kHz region based on Müller's work (1991) on the frequency representation in the rat cochlea.
  
  Taken together with the Pearson correlations being flat, the decoder examples showing a strong ability to identify CI-4 and CI-3, and Figure 8D, E presenting a strong prediction of 4 kHz and 8 kHz for all the CI electrodes when using a pure-tone-trained decoder, it is possible that current spread ended up stimulating higher turns of the cochlea, or even the modiolus, in a non-specific manner, greatly reducing (or smearing) the place-coding/frequency resolution of each electrode, which in turn could explain the coarse topographic (or coarsely tonotopic, according to the manuscript) organization of the cortical responses. Thus, the conclusion that there are distinct encodings for each modality is biased, as it might not account for monopolar smearing. To that end, and since it is the study's main message and title, it would have benefited from having a subgroup of animals using bipolar stimulation (or any focused strategy, since these provide reduced current spread) to compare the spatial organization of iEEG responses and the performance of the different decoders, in order to rule out current spread and strengthen the conclusion.
  
  Nevertheless, the reviewer wants to reiterate that the study proposed by Hight et al. is well constructed, relevant to the field, and that the overall proposal of improving patient performances and helping their adaptation in the first months of CI use by studying central responses should be pursued as it might help establish new guidelines or create new clinical tools.
  
  Review 3

Tags

Summary

Review 3

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.01.668170v1
www.biorxiv.org

Frequency and Laminar Profile of Feature Specific Visual Activity Revealed by Interleaved EEG-fMRI

4
1. Public_Reviews 13 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important study uses simultaneous EEG and fMRI recordings to shed light on the relationship between alpha and gamma oscillations and specific cortical layers. The sophisticated methodology provides solid evidence for correlations between oscillatory power and the strength and contents of fMRI signals in different cortical layers, though some caveats remain. This paper will be of interest to neuroscientists studying the role and mechanisms of alpha and gamma oscillations.
 
 Summary
2. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 In this manuscript, Clausner and colleagues use simultaneous EEG and fMRI recordings to clarify how visual brain rhythms emerge across layers of early visual cortex. They report that gamma activity correlates positively with feature-specific fMRI signals in superficial and deep layers. By contrast, alpha activity generally correlated negatively with fMRI signals, with two higher frequencies within the alpha reflecting feature-specific fMRI signals. This feature-specific alpha code indicates an active role of alpha oscillations in visual feature coding, providing compelling evidence that the functions of alpha oscillations go beyond cortical idling or feature-unspecific suppression.
 
 The study is very interesting and timely. Methodologically, it is state-of-the-art. The findings on a more active role of alpha activity that goes beyond the classical idling or suppression accounts are in line with recent findings and theories. In sum, this paper makes a very nice contribution. I still have a few comments that I outline below, regarding the data visualization, some methodological aspects, and a couple of theoretical points.
 
 (1) The authors put a lot of effort into the figure design. For instance, I really like Figure 1, which conveys a lot of information in a nice way. Figures 3 and 4, however, seem overengineered, and it takes a lot of time to distill the contents from them. The fact that they have a supplementary figure explaining the composition of these figures already indicates that the authors realized this is not particularly intuitive. First of all, the ordering of the conditions is not really intuitive. Second, the indication of significance through saturation does not really work; I have a hard time discerning the more and less saturated colors. And finally, the white dots do not really help either. I don't fully understand why they are placed where they are placed (e.g., in Figure 3). My suggestion would be to get rid of one of the factors (I think the voxel selection threshold could go: the authors could run with one of the stricter ones, and the rest could go into the supplement?) and then turn this into a few line plots. That would be so much easier to digest.
 
 (2) The division between high- and low-frequency alpha in the feature-specific signal correspondence is very interesting. I am wondering whether there is an opposite effect in the feature-unspecific signal correspondence. Would the high-frequency alpha show less of a feature-unspecific correlation with the BOLD?
 
 (3) In the discussion (line 330 onwards), the authors mention that low-frequency alpha is predominantly related to superficial layers, referencing Figure 4A. I have a hard time appreciating this pattern there. Can the authors provide some more information on where to look?
 
 (4) How did the authors deal with the signal-to-noise ratio (SNR) across layers, where the presence of larger draining veins typically increases BOLD (and thereby SNR) in superficial layers? This may explain the pattern of feature-unspecific effects in the alpha (Figure 3). Can the authors perform some type of SNR estimate (e.g., split-half reliability of voxel activations or similar) across layers to check whether SNR plays a role in this general pattern?
 
 (5) The GLM used for modelling the fMRI data included lots of regressors, and the scanning was intermittent. How much data was available in the end for sensibly estimating the baseline? This was not really clear to me from the methods (or I might have missed it). This seems relevant here, as the sign of the beta estimates plays a major role in interpreting the results here.
 
 (6) Some recent research suggests that gamma activity, in contrast to the prevailing view of it as a mechanism for feedforward information propagation, relates to feedback processing (e.g., Vinck et al., 2025, TiCS). Does this view fit with the localization of gamma to the deep layers here?
 
 (7) Another recent review (Stecher et al., 2025, TiNS) discusses feature-specific codes in visual alpha rhythms quite a bit, and it might be worth discussing how your results align with the results reported there.
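The split-half reliability check raised in comment (4) could look roughly like the sketch below. It runs on simulated voxel patterns only: the layer names, noise levels, voxel count, and trial count are hypothetical assumptions chosen for illustration, and the real analysis would use the measured per-trial (or per-run) voxel estimates for each layer.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pattern(trials):
    """Average voxel pattern across a list of trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def split_half_reliability(trials):
    """Correlate the mean voxel patterns of alternating halves of the trials."""
    return pearson(mean_pattern(trials[0::2]), mean_pattern(trials[1::2]))


def simulate_layer(pattern, noise_sd, n_trials, rng):
    """Hypothetical data: one fixed voxel pattern plus trial-wise noise."""
    return [[v + rng.gauss(0.0, noise_sd) for v in pattern]
            for _ in range(n_trials)]


rng = random.Random(1)
pattern = [rng.gauss(0.0, 1.0) for _ in range(50)]  # 50 simulated voxels

# Assumed noise levels: lower noise (higher SNR) towards the surface.
noise_by_layer = {"superficial": 1.0, "middle": 2.0, "deep": 4.0}

reliability = {}
for layer, noise_sd in noise_by_layer.items():
    trials = simulate_layer(pattern, noise_sd, 40, rng)
    reliability[layer] = split_half_reliability(trials)

for layer, r in reliability.items():
    print(f"{layer}: split-half r = {r:.2f}")
```

If reliability in the real data fell off with depth in the same way as the alpha-BOLD effects, an SNR explanation of the laminar profile would be hard to exclude.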
 
 Review 1
3. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 The authors address a long-standing controversy regarding the functional role of neural oscillations in cortical computations and layer-specific signalling. Several studies have implicated gamma oscillations in bottom-up processing, while lower-frequency oscillations have been associated with top-down signalling. Therefore, the question the authors investigate is both timely and theoretically relevant, contributing to our understanding of feedforward and feedback communication in the brain. This paper presents a novel and complicated data acquisition technique, the application of simultaneous EEG and fMRI, to benefit from both temporal and spatial resolution. A sophisticated data analysis method was executed in order to understand the underlying neural activity during a visual oddball task. Figures are well-designed and appropriately represent the results, which seem to support the overall conclusions. However, some of the claims (particularly those regarding the contribution of gamma oscillations) feel somewhat overstated: the results do offer some significant evidence, but most seem more like suggestive trends. Nonetheless, the paper is well-written, addresses a relevant and timely research question, introduces a novel and elegant analysis approach, and presents interesting findings. Further investigation will be important to strengthen and expand upon these insights.
 
 One of the main strengths of the paper lies in the use of a well-established and straightforward experimental paradigm (the visual oddball task). As a result, the behavioural effects reported were largely expected and reassuring to see replicated. The acquisition technique used is very novel, and while this may introduce challenges for data analysis, the authors appear to have addressed these appropriately.
 
 Later findings are very interesting, and mainly in line with our current understanding of feedback and feedforward signalling. However, the layer weight calculation is lacking in the manuscript. While it is discussed in the methods, it would help to briefly explain in the results how these weights are calculated, so that the reader can better follow what is being interpreted.
 
 Line 104 states there is one virtual channel per hemisphere for low and high frequencies. It may be helpful to include the number of channels (n=4) in the results section, as specified in the methods. Also, this raises the question of whether a single virtual channel (i.e., voxel) provides sufficient information for reproducibility.
 
 One area that would benefit from further clarification is the interpretation of gamma oscillations. The evidence for gamma involvement in the observed effects appears somewhat limited. For example, no significant gamma-related clusters were found for the feature-unspecific BOLD signal (Figure 2). Significant effects emerged only when the analysis was restricted to positively responding voxels, and even then, only for the contrast between EEG-coherent and EEG-incoherent conditions in the feature-specific BOLD response. It remains unclear how to interpret this selective emergence of gamma-related effects. Given previous literature linking gamma to feedforward processing, one might expect more robust involvement in broader, feature-unspecific contrasts. The current discussion presents the gamma-related findings with some confidence, and the manuscript would benefit from a more nuanced reflection on why these effects may not have appeared more broadly. The explanation provided in line 230, that restricting the analysis to positively responding voxels may have increased the SNR, is reasonable, but it may not fully account for the absence of gamma effects in V1's feature-unspecific response. Including the actual beta values from Figure 4 in the legend or main text would also help readers better assess the strength and specificity of the reported effects.
 
 To relate the behavioural findings to the underlying neural activity, could the authors test on a trial-by-trial basis how behavioural performance relates to changes in the BOLD signal and oscillatory activity? Line 305 states that "Since behavioural performance in the present study was consistently high at 94% on average and participants were instructed to respond quickly to potential oddball stimuli, a higher alpha frequency might reflect a more successful stimulus encoding and hence faster and more accurate behavioural performance." Such an analysis might also help to relate the findings to the functional difference between lower and upper alpha.
 
 In Figure 4, the EEG alpha specificity plot shows relatively large error bars, and there is visible overlap between the lower and upper alpha in both congruent and incongruent conditions. While upper alpha shows a positive slope across conditions and lower alpha remains flat, the interaction appears to be driven by the change from congruent to incongruent in upper alpha. It is worth clarifying whether the simple effects (e.g., lower vs upper within each condition) were tested, given the visual similarity at the incongruent condition. Overall, the significant interaction (p < 0.001, FDR-corrected) is consistent with diverging trends, but a breakdown of simple effects would help interpret the result more clearly. Was there a significant difference between lower and upper alpha in congruent or incongruent conditions?
 
 Overall, this study provides a valuable contribution to the literature on oscillatory dynamics and laminar fMRI, though some interpretations would benefit from further clarification or qualification.
 
 Review 2
4. Public_Reviews 13 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 Clausner et al. investigate the relationship between cortical oscillations in the alpha and gamma bands and the feature-specific and feature-unspecific BOLD signals across cortical layers. Using a well-designed stimulus and GLM, they show a method by which different BOLD signals can be differentiated and investigated alongside multiple cortical oscillatory frequencies. In addition to the previously reported positive relationship between gamma and BOLD signals in superficial layers, they show a relationship between gamma and feature-specific BOLD in the deeper layers. Alpha-band power is shown to have a negative relationship with the negative BOLD response for both feature-specific and feature-unspecific contrasts. When separated into lower (8-10Hz) and upper (11-13Hz) alpha oscillations, they show that higher frequency alpha showed a significantly stronger negative relationship with congruency, and can therefore be interpreted as more feature-specific than lower frequency alpha.
 
 Strengths:
 
 The use of interleaved EEG-fMRI has provided a rich dataset that can be used to evaluate the relationship of cortical layer BOLD signals with multiple EEG frequencies. The EEG data were of sufficient quality to see the modulation of both alpha-band and gamma-band oscillations in the group mean VE-channel TFS. The good EEG data quality is backed up with a highly technical analysis pipeline that ultimately enables the interpretation of the cortical layer relationship of the BOLD signal with a range of frequencies in the alpha and gamma bands. The stimulus design allowed for the generation of multiple contrasts for the BOLD signal and the alpha/gamma oscillations in the GLM analysis. Feature-specific and unspecific BOLD contrasts are used with congruently or incongruently selected EEG power regressors to delineate between local and global alpha modulations. A transparent approach is used for the selection of voxels contributing to the final layer profiles, for which statistical analysis is comprehensive but uses an alternative statistical test, which I have not seen in previous layer-fMRI literature.
 
 A significant negative relationship between alpha-band power and the BOLD signal was seen in congruently (EEGco) selected voxels (predominantly in superficial layers) and in feature-contrast (EEGco-inco) selected (superficial and deep layers). When separated into lower (8-10Hz) and upper (11-13Hz) alpha oscillations, they show that higher frequency alpha showed a significantly stronger negative relationship with congruency than lower frequency alpha. This is interpreted as a frequency dissociation in the alpha-BOLD relationship, with upper frequency alpha being feature-specific and lower frequency alpha corresponding to general modulation. These results are a valuable addition to the current literature and improve our current understanding of the role of cortical alpha oscillations.
 
 There is not much work in the literature on the relationship between alpha power and the negative BOLD response (NBR), so the data provided here are particularly valuable. The negative relationship between the NBR and alpha power shown here suggests that there is a reduction in alpha power, linked to locally reduced BOLD activity, which is in line with the previously hypothesized inhibitory nature of alpha.
 
 Weaknesses:
 
 It is not entirely clear how the draining vein effect seen in GE-BOLD layer-fMRI data has been accounted for in the analysis. For the contrast of congruent-incongruent, it is assumed that the underlying draining effect will be the same for both conditions, and so should be cancelled out. However, for the other contrasts, it is unclear how the final layer profiles aren't confounded by the bias in BOLD signal towards the superficial layers. Many of the profiles in Figure 3 and Figure 4A show an increased negative correlation between alpha power and the BOLD signal towards the superficial layers.
 
 When investigating whether low alpha (8-10 Hz) and high alpha (11-13 Hz) are two different sources of alpha, it would be beneficial to show whether this effect is only seen at the group level or can also be seen in individual subjects. Inter-subject variability in peak alpha power could result in some subjects having a single low alpha peak and some a single high alpha peak rather than two peaks from different sources.
 
 The figure layout used to present the main findings throughout is an innovative way to present so much information, but it is difficult to decipher the main findings described in the text. The readability would be improved if the example (Appendix 0 - Figure 1) in the supplementary material is included as a second panel inside Figure 3, or, if this is not possible, the example (Appendix 0 - Figure 1) should be clearly referred to in the figure caption.
 
 Review 3

Tags

Summary

Review 3

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.31.605816v2
www.biorxiv.org

Allocentric and egocentric cues constitute an internal reference frame for real-world visual search

4
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important study shows that visual search for upright and rotated objects is affected by rotating participants in a VR and gravitational reference frame. However, the evidence supporting this conclusion is incomplete, given the authors' use of normalized response time and the assumption that object recognition across rotations requires mental rotation.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The current study sought to understand which reference frames humans use when doing visual search in naturalistic conditions. To this end, the authors had participants do a visual search task in a VR environment while manipulating factors such as object orientation, body orientation, gravitational cues, and visual context (where the ground is). They generally found that all cues contributed to participants' performance, but visual context and gravitational cues impacted performance the most, suggesting that participants represent space in an allocentric reference frame during visual search.
  
  Strengths:
  
  The study is valuable in that it sheds light on which cues participants use during visual search. Moreover, I appreciate the use of VR and precise psychophysical predictions (e.g., slope vs. intercept) to dissociate between possible reference frames.
  
  Weaknesses:
  
  It's not clear what the implications of the study are beyond visual search. Moreover, I have some concerns about the interpretation of Experiment 1, which relies on an incorrect interpretation of mental rotation. Thus, most of the conclusions rely on Experiment 2, which has a small sample size (n = 10). Finally, the statistical analyses could be strengthened with measures of effect size and non-parametric statistics.
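As one possible reading of the request for effect sizes and non-parametric statistics, the sketch below computes a paired effect size (Cohen's d_z) and an exact sign-flip permutation p-value for a within-participant contrast. The ten response-time pairs are invented for illustration only and are not the study's data; the choice of a sign-flip test (rather than, say, a Wilcoxon signed-rank test) is likewise an assumption.

```python
from itertools import product
from statistics import mean, stdev

# Hypothetical mean RTs in ms for n = 10 participants (not the study's data).
upright_ms = [820, 910, 760, 880, 950, 800, 870, 930, 790, 840]
rotated_ms = [940, 990, 910, 930, 1050, 890, 1070, 1000, 900, 970]

# Paired differences: rotated minus upright, one value per participant.
diffs = [r - u for r, u in zip(rotated_ms, upright_ms)]


def cohens_dz(differences):
    """Paired-samples effect size: mean difference over SD of differences."""
    return mean(differences) / stdev(differences)


def sign_flip_p_value(differences):
    """Exact two-sided permutation test.

    Under the null hypothesis each participant's difference is equally
    likely to be positive or negative, so all 2**n sign assignments are
    enumerated and compared with the observed absolute mean difference.
    """
    n = len(differences)
    observed = abs(sum(differences)) / n
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, differences))) / n
        if permuted >= observed:
            as_extreme += 1
    return as_extreme / total


effect_size = cohens_dz(diffs)
p_value = sign_flip_p_value(diffs)
print(f"d_z = {effect_size:.2f}, exact p = {p_value:.4f}")
```

With n = 10 the enumeration covers only 1,024 sign patterns, so the exact test is cheap and makes no distributional assumptions beyond symmetry of the differences under the null.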
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper addresses an interesting issue: how is the search for a visual target affected by its orientation (and the viewer's) relative to other items in the scene and gravity? The paper describes a series of visual search tasks, using recognizable targets (e.g., a cat) positioned within a natural scene. Reaction times and accuracy at determining whether the target was present or absent, trial-to-trial, were measured as the target's orientation, that of the context, and of the viewer themselves (via rotation in a flight simulator) were manipulated. The paper concludes that search is substantially affected by these manipulations, primarily by the reference frame of gravity, then visual context, followed by the egocentric reference frame.
  
  Strengths:
  
  This work is on an interesting topic, and benefits from using natural stimuli in VR / flight simulator to change participants' POV and body position.
  
  Weaknesses:
  
  There are several areas of weakness that I feel should be addressed.
  
  (1) The literature review/introduction seems to be lacking in some areas. The authors, when contemplating the behavioral consequences of searching for a 'rotated' target, immediately frame the problem as one of rotation, per se (i.e., contrasting only rotation-based explanations; "what rotates and in which 'reference frame[s]' in order to allow for successful search?"). For a reader not already committed to this framing, many natural questions arise that are worth addressing.
  
  1a) Why do we need to appeal to rotation at all as opposed to, say, familiarity? A rotated cat is less familiar than a typically oriented one. This is a long-standing literature (e.g., Wang, Cavanagh, and Green (1994)), of course, with a lot to unpack.
  
  1b) What are the triggers for the 'corrective' rotation that presumably brings reference frames back into alignment? What if the rotation had not been so obvious (e.g., for a target that may not have a typical orientation, like a hand, a ball, or a learned nonsense object), or the background had not had such a clear orientation (e.g., a cluttered non-naturalistic background; a naturalistic backdrop viewed from an unfamiliar POV, such as from above; or a naturalistic background in which not all of the elements were rotated)? What, ultimately, is rotated? The entire visual field? Does that mean that searching for multiple targets at different angles of rotation would interfere with one another?
  
  1c) Relatedly, what is the process by which the visual system comes to know the 'correct' rotation? (Or, alternatively, is 'triggered to realize' that there is a rotation in play?) Is this something that needs to be learned? Is it only learned developmentally, through exposure to gravity? Could it be learned in the context of an experiment that starts with unfamiliar stimuli?
  
  1d) Why the appeal to natural images? I appreciate any time a study can be moved from potentially too stripped-down laboratory conditions to more naturalistic ones, but is this necessary in the present case? Would the pattern of results have been different if these were typical laboratory 'visual search' displays of disconnected object arrays?
  
  1e) How should we reconcile rotation-based theories of 'rotated-object' search with visual search results from zero gravity environments (e.g., for a review, see Leone (1998))?
  
  1f) How should we reconcile the current manipulations with other viewpoint-perspective manipulations (e.g., Zhang & Pan (2022))?
  
  (2) The presentation/interpretation of results would benefit from more elaboration and justification.
  
  2a) All of the current interpretations rely on just the RT data. First, the RT results should also be presented in natural units (i.e., seconds/ms), not normalized. In addition, results should be shown as violin plots or something similar that captures the distribution; a lot of important information is lost when just presenting one 'average' dot across participants. More fundamentally, I think we need to have a better accounting for performance (percent correct or d') to help contextualize the RT results. We should at least be offered some visualization (Heitz, 2014) of the speed-accuracy trade-off (SAT) for each of the conditions. Following this, the authors should more critically evaluate how any substantial SAT trends could affect the interpretation of results.
  
  2b) Unless I am missing something, the interpretation of the pattern of results (both qualitatively and quantitatively in their 'relative weight' analysis) relies on how they draw their contrasts. For instance, the authors contrast the two 'gravitational' conditions (target 0 deg versus target 90 deg) as if this were a change in a single variable/factor. But there are other ways to understand these manipulations that would affect contrasts. For instance, if one considers whether the target was 'consistent' (i.e., typically oriented) with respect to the context, egocentric, and gravitational frames, then the 'gravitational 0 deg' condition is consistent with context, egocentric view, but inconsistent with gravity. And, the 'gravitational 90 deg' condition, then, is inconsistent with context, egocentric view, but consistent with gravity. Seen this way, this is not a change in one variable, but three. The same is true of the baseline 0 deg versus baseline 90 deg condition, where again we have a change in all three target-consistency variables. The 'one variable' manipulations then would be: 1) baseline 0 versus visual context 0 (i.e., a change only in the context variable); 2) baseline 0 versus egocentric 0 (a change only in the egocentric variable); and 3) baseline 0 versus gravitational 0 (a change only in the gravitational variable). Other contrasts (e.g., gravitational 90 versus context 90) would showcase a change in two variables (in this case, a change in both context and gravity). My larger point is, again, unless I am really missing something, that the choice of how to contrast the manipulations will affect the 'pattern' of results and thereby the interpretation. If the authors agree, this needs to be acknowledged, plausible alternative schemes discussed, and the ultimate choice of scheme defended as the most valid.
  
  2c) Even with this 'relative weight' interpretation, there are still some patterns of results that seem hard to account for. Primarily, the egocentric condition seems hard to account for under any scheme, and the authors need to spend more time discussing/reconciling those results.
  
  2d) Some results are just deeply counterintuitive, and so the reader will crave further discussion. Most saliently for me, based on the results of Experiment 2 (specifically, the fact that gravitational 90 had better performance than gravitational 0), designers of cockpits should have all gauges/displays rotate counter to the airplane so that they are always consistent with gravity, not the pilot. Is this indeed a fair implication of the results?
  
  2e) I really craved some 'control conditions' here to help frame the current results. In keeping with the rhetorical questions posed above in 1a/b/c/d, if/when the authors engage with revisions to this paper, I would encourage the inclusion of at least some new empirical results. For me, the most critical would be to repeat some core conditions with a symmetric target (e.g., a ball), since that would seem to be the only way (given the current design) to tease out nuisance confounding factors such as the general effect of performing search while sideways. Put another way, the authors would have to assume here that search (non-normalized RTs and search performance) for a ball target in the baseline condition would be identical to that in the gravitational condition.
  
  Review 2
4. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  The study tested how people search for objects in natural scenes using virtual reality. Participants had to find targets among other objects, shown upright or tilted. The main results showed that upright objects were found faster and more accurately. When the scene or body was rotated, performance changed, showing that people use cues from the environment and gravity to guide search.
  
  The manuscript is clearly written and well designed, but there are some aspects related to methods and analyses that would benefit from stronger support.
  
  First, the sample size is not justified with a power analysis, nor is it explained how it was determined. This is an important point to ensure robustness and replicability.
  
  Second, the reaction time data were processed using different procedures, such as the use of the median to exclude outliers and an ad hoc cut-off of 50 ms. These choices are not sufficiently supported by a theoretical rationale, and could appear as post-hoc decisions.
  
  Third, the mixed-model analyses are overall well-conducted; however, the specification of the random structure deserves further consideration. The authors included random intercepts for participants and object categories, which is appropriate. However, they did not include random slopes (e.g., for orientation or set size), meaning that variability in these effects across participants was not modelled. This simplification can make the models more stable, but it departs from the maximal random structure recommended by Barr et al. (2013). The authors do not explicitly justify this choice, and a reviewer may question why participant-specific variability in orientation effects, for example, was not allowed. Given the modest sample sizes (20 in Experiment 1 and 10 in Experiment 2), convergence problems with more complex models are likely. Nonetheless, ignoring random slopes can, in principle, inflate Type I error rates, so this issue should at least be acknowledged and discussed.
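
  To make the random-slopes point concrete, the following is a sketch of what a model with a by-participant random slope for orientation could look like. The symbols are illustrative assumptions, not the authors' specification: i indexes participants, j indexes object categories, and Orientation is a coded predictor.

```latex
\begin{aligned}
\mathrm{RT}_{ij} &= (\beta_0 + u_{0i} + w_{0j})
                  + (\beta_1 + u_{1i})\,\mathrm{Orientation}_{ij}
                  + \varepsilon_{ij}, \\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
  &\sim \mathcal{N}\!\left(\mathbf{0},\,
     \begin{pmatrix} \tau_{0}^{2} & \rho\,\tau_{0}\tau_{1} \\
                     \rho\,\tau_{0}\tau_{1} & \tau_{1}^{2} \end{pmatrix}\right), \qquad
w_{0j} \sim \mathcal{N}(0, \omega^{2}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

  The intercept-only structure described above corresponds to fixing u_{1i} = 0 for every participant. Under that restriction, any real between-participant variability in the orientation effect is absorbed into the residual, which is the mechanism behind the Type I error concern.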
  
  Review 3

Tags

Summary

Review 3

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.14.648618v2
www.biorxiv.org www.biorxiv.org

Absence of Systematic Effects of Internalizing Psychopathology on Learning Under Uncertainty

3
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This study provides important results with regard to the ongoing debate of the relationship between internalizing psychopathology and learning under uncertainty. The methods and analyses are solid, and the results are backed by a large sample size, yet the study could still benefit from a more detailed discussion about the difference in experimental design and analysis compared to previous studies. If these concerns are addressed, this study would be of interest to researchers in clinical and computational psychiatry for the behavioral markers of psychopathological symptoms.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  The authors conducted a series of experiments using two established decision-making tasks to clarify the relationship between internalizing psychopathology (anxiety and depression) and adaptive learning in uncertain and volatile environments. While prior literature has reported links between internalizing symptoms - particularly trait anxiety - and maladaptive increases in learning rates or impaired adjustment of learning rates, findings have been inconsistent. To address this, the authors designed a comprehensive set of eight experiments that systematically varied task conditions. They also employed a bifactor analysis approach to more precisely capture the variance associated with internalizing symptoms across anxiety and depression. Across these experiments, they found no consistent relationship between internalizing symptoms and learning rates or task performance, concluding that this purported hallmark feature may be more subtle than previously assumed.
  
  Strengths:
  
  (1) A major strength of the paper lies in its impressive collection of eight experiments, which systematically manipulated task conditions such as outcome type, variability, volatility, and training. These were conducted both online and in laboratory settings. Given that trial conditions can drive or obscure observed effects, this careful, systematic approach enables a robust assessment of behavior. The consistency of findings across online and lab samples further strengthens the conclusions.
  
  (2) The analyses are impressively thorough, combining model-agnostic measures, extensive computational modeling (e.g., Bayesian, Rescorla-Wagner, Volatile Kalman Filter), and assessments of reliability. This rigor contributes meaningfully to broader methodological discussions in computational psychiatry, particularly concerning measurement reliability.
  
  (3) The study also employed two well-established, validated computational tasks: a game-based predictive inference task and a binary probabilistic reversal learning task. This choice ensures comparability with prior work and provides a valuable cross-paradigm perspective for examining learning processes.
  
  (4) I also appreciate the open availability of the analysis code that will contribute substantially to the field using similar tasks.
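
  For readers less familiar with the "learning rate" at the centre of these analyses, here is a minimal sketch of the standard Rescorla-Wagner (delta-rule) update. It is a generic textbook form, not the authors' fitted model; the function and parameter names are illustrative assumptions.

```python
def delta_rule(outcomes, learning_rate, initial_estimate=0.0):
    """Track a quantity with a fixed learning rate (Rescorla-Wagner delta rule).

    Returns the estimate held after each outcome has been observed.
    """
    estimate = initial_estimate
    trajectory = []
    for outcome in outcomes:
        prediction_error = outcome - estimate          # surprise on this trial
        estimate += learning_rate * prediction_error   # move a fraction of the way
        trajectory.append(estimate)
    return trajectory


def single_trial_learning_rate(update, prediction_error):
    """Model-agnostic learning rate: how much of the error was corrected."""
    if prediction_error == 0:
        raise ValueError("learning rate is undefined when the prediction error is zero")
    return update / prediction_error


if __name__ == "__main__":
    print(delta_rule([1, 1, 1, 0], learning_rate=0.5))
```

  A higher learning rate weights recent outcomes more heavily. This is adaptive when the environment changes often and maladaptive when outcomes are merely noisy.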
  
  Weaknesses:
  
  (1) While the overall sample size (N = 820 across eight experiments) is commendable, the number of participants per experiment is relatively modest, especially in light of the inherent variability in online testing and the typically small effect sizes in correlations with mental health traits (e.g., r = 0.1-0.2). The authors briefly acknowledge that any true effects are likely small; however, the rationale behind the sample sizes selected for each experiment is unclear. This is especially important given that previous studies using the predictive inference task (e.g., Seow & Gillan, 2020, N > 400; Loosen et al., 2024, N > 200) have reported non-significant associations between trait anxiety symptoms and learning rates.
  
  (2) The motivation for focusing on the predictive inference task is also somewhat puzzling, given that no cited study has reported associations between trait anxiety and parameters of this task. While this is mitigated by the inclusion of a probabilistic reversal learning task, which has a stronger track record in detecting such effects, the study misses an opportunity to examine whether individual differences in learning-related measures correlate across the two tasks, which could clarify whether they tap into shared constructs.
  
  (3) The parameterization of the tasks, particularly the use of high standard deviations (SDs) of 20 and 30 for outcome distributions and hazard rates of 0.1 and 0.16, warrants further justification. Are these hazard rates sufficiently distinct? Might the wide SDs reduce sensitivity to volatility changes? Prior studies of the circle version of this predictive inference task (e.g., Vaghi et al., 2019; Seow & Gillan, 2020; Marzuki et al., 2022; Loosen et al., 2024; Hoven et al., 2024) typically used SDs around 12. Indeed, the Supplementary Materials suggest that variability manipulations did not seem to substantially affect learning rates (Figure S5)-calling into question whether the task manipulations achieved their intended cognitive effects.
  
  (4) Relatedly, while the predictive inference task showed good reliability, the reversal learning task exhibited only "poor-to-moderate" reliability in its learning-rate estimates. Given that previous findings linking anxiety to learning rates have often relied on this task, these reliability issues raise concerns about the robustness and generalizability of conclusions drawn from it.
  
  (5) As the authors note, the study relies on a subclinical sample. This limits the generalizability of the findings to individuals with diagnosed disorders. A growing body of research suggests that relationships between cognition and symptomatology can differ meaningfully between general population samples and clinical groups. For example, Hoven et al. (2024) found differing results in the predictive inference task when comparing OCD patients, healthy controls, and high- vs. low-symptom subgroups.
  
  (6) Finally, the operationalization of internalizing symptoms in this study appears to focus on anxiety and depression. However, obsessive-compulsive disorder is also generally considered an internalizing disorder, which presents a gap in the current cited literature of the paper, particularly when there have been numerous studies with the predictive inference task and OCD/compulsivity (e.g., Vaghi et al., 2019; Seow & Gillan, 2020; Marzuki et al., 2022; Loosen et al., 2024; Hoven et al., 2024), rather than trait anxiety per se.
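
  As a rough aid to the question in point (3) of whether hazard rates of 0.1 and 0.16 are sufficiently distinct, the sketch below converts a constant per-trial hazard rate into the expected length of a stable period. It assumes change-points occur independently on each trial with probability equal to the hazard rate (a geometric process); the actual task may impose further constraints that are not modelled here.

```python
def expected_run_length(hazard):
    """Mean number of trials per stable period for a constant per-trial hazard."""
    if not 0 < hazard <= 1:
        raise ValueError("hazard must lie in (0, 1]")
    return 1.0 / hazard


def prob_run_at_least(hazard, k):
    """Probability that a stable period lasts at least k trials (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1.0 - hazard) ** (k - 1)


if __name__ == "__main__":
    for h in (0.1, 0.16):
        print(h, expected_run_length(h), prob_run_at_least(h, 10))
```

  Under this assumption, the two hazard rates correspond to stable periods of about 10 and about 6 trials on average. That difference may be hard to detect when outcome SDs are 20-30.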
  
  Overall:
  
  Despite the named limitations, the authors have done very impressive work in rigorously examining the relationship between anxiety/internalizing symptoms and learning rates in commonly used decision-making tasks under uncertainty. Their conclusion is well supported by the consistency of their null findings across diverse task conditions, though its generalizability may be limited by some features of the task design and its sample. This study provides strong evidence that will guide future research, whether by shifting the focus of examining dysfunctions of larger effect sizes or by extending investigations to clinical populations.
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this work, the authors recruited a large sample of participants to complete two well-established paradigms: the predictive inference task and the volatile reversal learning task. With this dataset, they not only replicated several classical findings on uncertainty-based learning from previous research but also demonstrated that individual differences in learning behavior are not systematically associated with internalizing psychopathology. These results provide valuable large-scale evidence for this line of research.
  
  Strengths:
  
  (1) Use of two different tasks.
  
  (2) Recruitment of a large sample of participants.
  
  (3) Inclusion of multiple experiments with different conditions, demonstrating strong scientific rigor.
  
  Weaknesses:
  
  Below are questions rather than 'weaknesses':
  
  (1) This study uses a large human sample, which is a clear strength. However, was the study preregistered? It would also be useful to report a power analysis to justify the sample size.
  
  (2) Previous studies have tested two core hypotheses: (a) that internalizing psychopathology is associated with overall higher learning rates, and (b) that it is associated with learning rate adaptation. In the first experiment, the findings seem to disconfirm only the first hypothesis. I found it unclear how, in the predator task, participants were expected to adjust their learning rate to adapt to volatility. Could the authors clarify this point?
  
  (3) According to the Supplementary Information, Model 13 showed the best fit, yet the authors selected Model 12 due to the larger parameter variance in Model 13. What would the results of Model 13 look like? Furthermore, do Models 12 and 13 correspond to the optimal models identified by Gagne et al. (2020)? Please clarify.
  
  (4) In the Discussion, the authors addressed both task reliability and parameter reliability. However, the term reliability seems to be used differently in these two contexts. For example, good parameter recovery indicates strong reliability in one sense, but can we then directly equate this with parameter reliability? It would be helpful to define more precisely what is meant by reliability in each case.
  
  (5) The Discussion also raises the possibility that limited reliability may represent a broader challenge facing the interdisciplinary field of computational psychiatry. What, in the authors' view, are the key future directions for the field to mitigate this issue?
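
  To illustrate the kind of power analysis requested in point (1), the sketch below approximates the sample size needed to detect a Pearson correlation using the Fisher z transformation. The defaults (two-sided alpha of 0.05, 80% power) are conventional assumptions, not values taken from the manuscript, and the example effect sizes are those mentioned by Reviewer #1 (r = 0.1-0.2).

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-sided test, Fisher z method)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    fisher_z = math.atanh(abs(r))                  # effect size on the z scale
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


if __name__ == "__main__":
    for r in (0.1, 0.2):
        print(r, n_for_correlation(r))
```

  Under these assumptions, the approximation calls for roughly 780 participants at r = 0.1 and roughly 190 at r = 0.2. These figures give a sense of scale for per-experiment sample sizes when the true effect is small.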
  
  Review 2

Tags

Summary

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.05.12.653409v1
www.biorxiv.org www.biorxiv.org

MerQuaCo: a computational tool for quality control in image-based spatial transcriptomics

4
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This valuable study describes MerQuaCo, a computational and automatic quality control tool for spatial transcriptomics datasets. The authors have collected a remarkable number of tissues to construct the main algorithm. The compelling strength of the evidence is demonstrated through a combination of empirical observations, automated computational approaches, and validation against existing software packages. MerQuaCo will interest researchers who routinely perform spatial transcriptomic imaging (especially MERSCOPE), as it provides an imperfection detector and quality control measures for reliable and reproducible downstream analysis.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.
  
  Strengths:
  
  (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.
  
  (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.
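
  To give a sense of what a z-plane metric such as the p6/p0 ratio might involve, here is a hypothetical sketch. It assumes the ratio is the number of transcripts detected in imaging plane 6 divided by the number detected in plane 0, and that each detected transcript carries an integer z-plane index. Neither assumption is confirmed by the review text, and this is not MerQuaCo's actual code or API.

```python
from collections import Counter


def plane_ratio(z_planes, top=6, bottom=0):
    """Ratio of transcript counts in two imaging planes (hypothetical p6/p0)."""
    counts = Counter(z_planes)  # z-plane index -> number of detected transcripts
    if counts[bottom] == 0:
        raise ValueError("no transcripts detected in the reference plane")
    return counts[top] / counts[bottom]


if __name__ == "__main__":
    # Toy input: one z-plane index per detected transcript.
    example = [0, 0, 0, 0, 3, 6, 6]
    print(plane_ratio(example))
```

  On this reading, a ratio well below 1 would indicate that detection efficiency falls off between the two planes.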
  
  Weaknesses:
  
  (1) The study focuses on cell-type label changes as the main downstream impact of imperfections. Broadening the scope to explore expression response changes of downstream analyses would offer a more complete picture of the biological consequences of these imperfections and enhance the utility of the tool.
  
  (2) While the manuscript identifies and quantifies imperfections effectively, it does not propose post-imaging data processing solutions to correct these issues, aside from the exclusion of problematic sections or transcript species. While this is understandable given the study is aimed at the highest quality atlas effort, many researchers don't need that level of quality to compare groups. It would be important to include discussion points as to how those cut-offs should be decided for a specific study.
  
  (3) Although the authors demonstrate the applicability of MerQuaCo on a large MERFISH dataset and a limited number of sections from other platforms, it would be helpful to describe the limits of its generalizability.
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors present MerQuaCo, a computational tool for quality control in image-based spatial transcriptomics, especially MERSCOPE. They assessed MerQuaCo on 641 slides produced in their institute in terms of the ratio of imperfection, transcript density, and variations of quality by different planes (x-axis).
  
  Strengths:
  
  This looks to be a valuable work that can serve as a good guideline for quality control in future spatial transcriptomics studies. A well-controlled spatial transcriptomics dataset is also important for downstream analysis.
  
  Weaknesses:
  
  The results section needs to be more structured.
  
  Review 2
4. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.
  
  Strengths:
  
  The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.
  
  Weaknesses:
  
  While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation require expansion, particularly for non-MERSCOPE platforms and real-world biological impact.
  
  Review 3

Tags

Summary

Review 3

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.04.626766v1
www.biorxiv.org www.biorxiv.org

MerQuaCo: a computational tool for quality control in image-based spatial transcriptomics

4
1. Public_Reviews 13 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This study provides a valuable contribution to spatial transcriptomics by introducing MerQuaCo, a computational tool for standardizing quality control in image-based spatial transcriptomics datasets. The tool addresses the lack of consensus in the field and provides robust metrics to identify and quantify common imperfections in datasets. The work is supported by an impressive dataset and compelling analyses, and will be of significant interest to researchers focused on data reproducibility and downstream analysis reliability in spatial transcriptomics.
  
  Summary
2. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.
  
  Strengths
  
  (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.
  
  (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.
  
  Comments on revisions:
  
  All previous concerns have been fully addressed. The revised manuscript presents a robust, well-documented, and user-friendly tool for quality control in image-based spatial transcriptomics, a rapidly advancing area where objective assessment tools are urgently needed.
  
  Review 1
3. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.
  
  Strengths:
  
  The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.
  
  Weaknesses:
  
  While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation are currently limited by the availability of sufficient datasets from non-MERSCOPE platforms and non-brain tissues. The evaluation of data imperfections' impact on downstream analyses beyond cell typing (e.g., differential expression, spatial statistics, and cell-cell interactions) is also constrained by space and scope. However, these represent valuable directions for future work as more datasets become available.
  
  Review 2
4. Public_Reviews 13 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.
  
  Strengths:
  
  (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.
  
  (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.
  
  Weaknesses:
  
  (1) The study focuses on cell-type label changes as the main downstream impact of imperfections. Broadening the scope to explore expression response changes of downstream analyses would offer a more complete picture of the biological consequences of these imperfections and enhance the utility of the tool.
  
  Here, we focused on the consequences of imperfections on cell-type labels, one common use for spatial transcriptomics datasets. Spatial datasets are used for so many other purposes that there are almost endless ways in which imperfections could impact downstream analyses. It is difficult to see how we might broaden the scope to include more downstream effects, while providing enough analysis to derive meaningful conclusions, all within the scope of a single paper. Existing studies bring some insight into the impact of imperfections and we expect future studies will extend our understanding of consequences in other biological contexts.
  
  (2) While the manuscript identifies and quantifies imperfections effectively, it does not propose post-imaging data processing solutions to correct these issues, aside from the exclusion of problematic sections or transcript species. While this is understandable given the study is aimed at the highest quality atlas effort, many researchers don't need that level of quality to compare groups. It would be important to include discussion points as to how those cut-offs should be decided for a specific study.
  
  Studies differ greatly in their aims and, as a result, the impact of imperfections in the underlying data will differ also, preventing us from offering meaningful guidance on how cut-offs might best be identified. Rather, our aim with MerQuaCo was to provide researchers with tools to generate information on their spatial datasets, to facilitate downstream decisions on data inclusion and cut-offs.
  
  (3) Although the authors demonstrate the applicability of MerQuaCo on a large MERFISH dataset and a limited number of sections from other platforms, it would be helpful to describe the limits of its generalizability.
  
  In figure 9, we addressed the limitations and generalizability of MerQuaCo as best we could with the available datasets. Gaining deep insight into the limitations and generalizability of MerQuaCo would require application to multiple large datasets and, to the best of our knowledge, these datasets are not available.
  
  Reviewer #2 (Public review):
  
  The authors present MerQuaCo, a computational tool for quality control in image-based spatial transcriptomics, especially MERSCOPE. They assessed MerQuaCo on 641 slides produced in their institute in terms of the ratio of imperfection, transcript density, and variations of quality by different planes (x-axis).
  
  Strengths:
  
  This looks to be a valuable work that can serve as a good guideline for quality control in future spatial transcriptomics studies. A well-controlled spatial transcriptomics dataset is also important for downstream analysis.
  
  Weaknesses:
  
  The results section needs to be more structured.
  
  We have split the ‘Transcript density’ subsection of the results into 3 new subsections.
  
  Reviewer #3 (Public review):
  
  MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.
  
  Strengths:
  
  The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.
  
  Weaknesses:
  
  While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation require expansion, particularly for non-MERSCOPE platforms and real-world biological impact.
  
  We agree that there is value in expanding our analyses to non-MERSCOPE platforms, to tissues other than brain, and to analyses other than cell typing. The limiting factor in all these directions is the availability of large enough datasets to probe the limits of MerQuaCo. We look forward to a future in which more datasets are available and it’s possible to extend our analyses.
  
  Reviewer #1 (Recommendation for the Author):
  
  (1) To better capture the downstream impacts of imperfections, consider extending the analysis to additional metrics, such as specificity variation across cell types, gene co-expression, or spatial gene patterning. This would deepen insights into how these imperfections shape biological interpretations and further demonstrate the versatility of MerQuaCo.
  
  These are compelling ideas, but we are unable to study so many possible downstream impacts in sufficient depth in a single study. Insights into these topics will likely come from future studies.
  
  (2) In the Figure 7 legend, panel label (D) is repeated, so panels E-F are mislabelled.
  
  We have corrected this error.
  
  (3) Ensure that the image quality is high for the figures.
  
  We will upload Illustrator files, ensuring that images are at full resolution.
  
  Reviewer #2 (Recommendation for the Author):
  
  (1) A result subsection "Transcript density" looks too long. Please provide a subsection heading for each figure.
  
  We have split this section into 3 with new subheadings.
  
  (2) The result subsection title "Transcript density" sounds ambiguous. Please provide a detailed title describing what information this subsection contains.
  
  We have renamed this section ‘Differences in transcript density between MERSCOPE experiments’.
  
  Minor:
  
  (1) There is no explanation of the black and grey bars in Figure 2A.
  
  We have added information to the figure legend, identifying the datasets underlying the grey and black bars.
  
  (2) In the abstract, the phrase "High-dimension" should be "High-dimensional".
  
  We have changed ‘high-dimension’ to ‘high-dimensional’.
  
  (3) In the abstract, "Spatial results" is an unclear expression. What does it stand for?
  
  We have replaced the term ‘spatial results’ with ‘the outputs of spatial transcriptomics platforms’.
  
  Reviewer #3 (Recommendation for the Author):
  
  (1) While the tool claims broad applicability, validation is heavily centered on MERSCOPE data, with limited testing on other platforms. The authors should expand validation to include more diverse platforms and add a small analysis of non-brain tissue. If broader validation isn't feasible, modify the title and abstract to reflect the focus on the mouse brain explicitly.
  
  We agree that expansion to other platforms is desirable, but to the best of our knowledge sufficient datasets from other platforms are not available. In the abstract, we state that ‘… we describe imperfections in a dataset of 641 fresh-frozen adult mouse brain sections collected using the Vizgen MERSCOPE.’
  
  (2) The impact of data imperfections on downstream analysis needs a more comprehensive evaluation. The authors should expand beyond cluster label changes to include a) differential expression analysis with simulated imperfections, b) impact on spatial statistics and pattern detection, and c) effects on cell-cell interactions.
  
  Each of these ideas could support a substantial study. We are unable to do them justice in the limited space available as an addition to the current study.
  
  (3) The pixel classification workflow and validation process need more detailed documentation.
  
  The methods and results together describe the workflow and validation in depth. We are unclear what details are missing.
  
  (4) The manuscript lacks comparison to existing QC pipelines such as Squidpy and Giotto. The authors should benchmark MerQuaCo against them and provide integration options with popular spatial analysis tools with clear documentation.
  
  To the best of our knowledge, Squidpy and Giotto lack QC benchmarks, certainly of the parameters characterized by MerQuaCo. Direct comparison isn’t possible.
  
  AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.04.626766v2
www.biorxiv.org www.biorxiv.org

Ribosomal RNA synthesis by RNA polymerase I is subject to premature termination of transcription

5
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This manuscript characterizes a mutated clone of RNA polymerase I in yeast, referred to as SuperPol, to understand the mechanisms of RNA polymerase I elongation and termination. The authors present convincing evidence that demonstrates the existence of premature termination in Pol I transcription. Overall, the characterization of this RNA pol I offers important insights into the regulation of ribosomal RNA transcription and its potential application in cancer pharmacology.
  
  [Editors' note: this paper was reviewed by Review Commons.]
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The study characterises an RNA polymerase (Pol) I mutant (RPA135-F301S) named SuperPol. This mutant was previously shown to increase yeast ribosomal RNA (rRNA) production by Transcription Run-On (TRO). In this work, the authors confirm this mutation increases rRNA transcription using a slight variation of the TRO method, Transcriptional Monitoring Assay (TMA), which also allows the analysis of partially degraded RNA molecules. The authors show a reduction of abortive rRNA transcription in cells expressing the SuperPol mutant and a modest occupancy decrease at the 5' region of the rRNA genes compared to WT Pol I. These results suggest that the SuperPol mutant displays a lower frequency of premature termination. Using in vitro assays, the authors found that the mutation induces an enhanced elongation speed and a lower cleavage activity on mismatched nucleotides at the 3' end of the RNA. Finally, SuperPol mutant was found to be less sensitive to BMH-21, a DNA intercalating agent that blocks Pol I transcription and triggers the degradation of the Pol I subunit, Rpa190. Compared to WT Pol I, short BMH-21 treatment has little effect on SuperPol transcription activity, and consequently, SuperPol mutation decreases cell sensitivity to BMH-21.
  
  Significance:
  
  The work further characterises a single amino acid mutation of one of the largest yeast Pol I subunits (RPA135-F301S). While this mutation was previously shown to increase rRNA synthesis, the current work expands the SuperPol mutant characterisation, providing details of how RPA135-F301S modifies the enzymatic properties of yeast Pol I. In addition, their findings suggest that yeast Pol I transcription can be subjected to premature termination in vivo. The molecular basis and potential regulatory functions of this phenomenon could be explored in additional studies.
  
  Our understanding of rRNA transcription is limited, and the findings of this work may be interesting to the transcription community. Moreover, targeting Pol I activity is an open strategy for cancer treatment. Thus, the resistance of SuperPol mutant to BMH-21 might also be of interest to a broader community, although these findings are yet to be confirmed in human Pol I and with more specific Pol I inhibitors in future.
  
  Comments on revision:
  
  The authors' response addressed all the points I raised adequately.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This article presents a study on a mutant form of RNA polymerase I (RNAPI) in yeast, referred to as SuperPol, which demonstrates increased rRNA production compared to the wild-type enzyme. While rRNA production levels are elevated in the mutant, RNAPI occupancy as detected by CRAC is reduced at the 5' end of rDNA transcription units. The authors interpret these findings by proposing that the wild-type RNAPI pauses in the external transcribed spacer (ETS), leading to premature transcription termination (PTT) and degradation of truncated rRNAs by the RNA exosome (Rrp6). They further show that SuperPol's enhanced activity is linked to a lower frequency of PTT events, likely due to altered elongation dynamics and reduced RNA cleavage activity, as supported by both in vivo and in vitro data.
  
  The study also examines the impact of BMH-21, a drug known to inhibit Pol I elongation, and shows that SuperPol is less sensitive to this drug, as demonstrated through genetic, biochemical, and in vivo approaches. The authors show that BMH-21 treatment induces premature termination in wild-type Pol I, but only to a lesser extent in SuperPol. They suggest that BMH-21 promotes termination by targeting paused Pol I complexes and propose that PTT is an important regulatory mechanism for rRNA production in yeast.
  
  The data presented are of high quality and support the notion that 1) premature transcription termination occurs at the 5' end of rDNA transcription units; 2) SuperPol has an increased elongation rate with reduced premature termination; and 3) BMH-21 promotes both pausing and termination. The authors employ several complementary methods, including in vitro transcription assays. These results are significant and of interest for a broad audience.
  
  Adding experiments in different growth conditions to support the claim of regulation by PTT (as the authors propose) will also be an important addition. The revisions further support the claim, in particular the notion that the increased elongation rate of SuperPol comes at the expense of fidelity.
  
  Significance:
  
  These results are significant and of interest for a basic research audience.
  
  Review 2
4. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  In the manuscript "Ribosomal RNA synthesis by RNA polymerase I is regulated by premature termination of transcription", Azouzi and co-authors investigate the regulatory mechanisms of ribosomal RNA (rRNA) transcription by RNA Polymerase I (RNAPI) in the budding yeast S. cerevisiae. They follow up on exploring the molecular basis of a mutant allele of the second-largest subunit of RNAPI, RPA135-F301S, also dubbed SuperPol, that they had previously reported (Darrière et al, 2019), and which was shown to rescue Rpa49-linked growth defects, possibly by increasing rRNA production.
  
  Through a combination of genomic and in vitro approaches, the authors test the hypothesis that RNAPI activity could be subjected to a premature transcription termination (PTT) mechanism, akin to what is observed for RNA Polymerase II (RNAPII). The authors demonstrate that SuperPol's increased processivity "desensitizes" RNAPI to abortive transcription cycles at the expense of decreased fidelity. In agreement, SuperPol is shown to be resistant to BMH-21, a drug previously shown to impair RNAPI elongation.
  
  Overall, this work expands the mechanistic understanding of the early dynamics of RNAPI transcription. The presented results are of interest for researchers studying transcription regulation, particularly those interested in RNAPI's transcription mechanisms and fidelity.
  
  Strengths:
  
  Overall, the experiments are performed with rigor and include the appropriate controls and statistical analyses. Conclusions are drawn from appropriate experiments. Both the figures and the text present the data clearly. The Materials and Methods section is detailed enough.
  
  Weaknesses:
  
  The biological significance of this phenomenon remains unaddressed and thus unclear. The lack of experiments to test a specific regulatory function (such as a UTP-A loading checkpoint or other mechanisms) limits these termination events to possibly abortive actions of unclear significance.
  
  Comments on revised version:
  
  I appreciated the additional experiments and the other changes made by the authors in the revised version.
  
  Review 3
5. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  General Statements:
  
  In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts - CRAC) with in vitro analysis, we could show that SuperPol reduced premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug inhibiting Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.
  
  We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.
  
  All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two out of three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:
  
  “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.
  
  Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, by limiting contamination with mature rRNA, CRAC remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.
  
  We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.
  
  Description of the planned revisions:
  
  Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of increased processivity in the SuperPol mutant on the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between Pol I and Pol II catalytic rate-limiting steps (see David Schneider's review on this topic for further details).
  
  Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might be at the cost of a decreased fidelity.
  
  Regulatory aspect of the process:
  
  To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.
  
  We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.
  
  Description of the revisions that have already been incorporated in the transferred manuscript:
  
  Some very important modifications have now been incorporated:
  
  Statistical Analyses and CRAC Replicates:
  
  Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the fact that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1) to confirm consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.
  
  CRAC vs. Net-seq:
  
  Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′ phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along rDNA, providing a quantitative dataset for the entire rDNA gene.
  
  CRAC vs. Other Methods:
  
  Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and co-transcriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. CRAC avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.
  
  Pol I ChIP Signal Comparison:
  
  The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.
  
  BMH-21 Effects:
  
  As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.
  
  We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.
  
  Minor points:
  
  Reviewer #1:
  
  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example:
  
  On page 14: "SuperPol accumulation is decreased (compared to Pol I)".
  
  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph."
  
  We clarified and improved the overall writing style in response to the reviewer's comment.
  
  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.
  
  This was corrected in this version of the manuscript.
  
  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA?
  
  Data are effectively normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.
  
  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling.
  
  The presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we really wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be shown in a supplementary figure if desired.
  
  The y-axis in Figure 7D is now correctly labelled.
  
  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result?
  
  The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.
  
  Reviewer #2:
  
  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates"
  
  As the reviewer suggested, the statement has been simplified in this version of the manuscript.
  
  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors.
  
  We agree that this is a very important discussion of our results. Transcription is a very dynamic process in which paused polymerase is easily detected using the CRAC assay. Elongated polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.
  
  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility?
  
  The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We thank Reviewer #2 for raising this, as in our opinion it is an important point that should be added to the manuscript. We have now included new data (panels 5G, 5H and 5I) in the manuscript showing that SuperPol in vitro exhibits an increased error rate compared to the WT enzyme. From these results obtained in vitro, we concluded that SuperPol shows reduced nascent transcript cleavage, associated with more efficient transcript elongation, but to the detriment of transcriptional fidelity.
  
  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research.
  
  Following the reviewer's suggestion, we referenced these studies.
  
  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement.
  
  This represents some unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that resistance between both alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is a very important point that we could provide upon request. This was added to the statement.
  
  p. 23: "The released of the premature transcript" - this phrase contains a typo
  
  This is now corrected.
  
  Reviewer #3:
  
  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6Δ cells? Some RNAs could be degraded even in such a short period of time, as even stated by the authors, so maybe an exosome-depleted background could provide a more complete picture. Could the authors also explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method, it could be good to add already characterized mutations or other drugs to show that the technique can also readily detect well-known and expected changes.
  
  The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.
  
  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly.
  
  We corrected Figure 1A.
  
  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture.
  
  We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.
  
  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C.
  
  Thanks for pointing out this mistake. It has been corrected.
  
  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region.
  
  We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.
  
  Figure 4: should probably be provided as supplementary material.
  
  As mentioned earlier (see comments), the presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. This important control should be shown, but can indeed be shown in a supplementary figure if desired.
  
  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected.
  
  We agree with this comment and we modified the text accordingly.
  
  Figure 7D: the legend of the y-axis is missing as well as the title of the plot.
  
  Legend of the y-axis and title of the plot are now present.
  
  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.
  
  This was discussed previously. See comment above.
  
  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b.
  
  The bibliography is now fixed.
  
  Description of analyses that authors prefer not to carry out:
  
  Does the SuperPol mutant produce more functional rRNAs?
  
  As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre-)rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre-)rRNAs are unable to accumulate in the absence of ribosomal proteins and/or assembly factors (AFs). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.
  
  Is premature termination coupled with rRNA processing?
  
  We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.
  
  AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.27.568781v3
arxiv.org arxiv.org

Theory of active self-organization of dense nematic structures in the actin cytoskeleton

5
1. Public_Reviews 10 Oct 2025
 
 in eLife (unscoped)
 
 eLife Assessment
 
 In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organisation in the actin cortex. The theoretical work is solid and provides a rigorous theoretical framework to study active self-organisation in actomyosin systems, including qualitative comparison with experimental observations.
 
 Summary
2. Public_Reviews 09 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #1 (Public review):
 
 Summary:
 
 In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.
 
 Strengths:
 
 The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.
 
 Weaknesses:
 
 This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative. It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination. The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns. Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim. Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase. Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.
 
 Review 1
3. Public_Reviews 09 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The article by Waleed et al discusses the self-organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self-organized structures can emerge.
 
 Strengths:
 
 (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network
 
 Weaknesses:
 
 Not placed in the context of the literature on active nematics.
 
 Comments on revised version:
 
 The authors have satisfactorily responded to the comments.
 
 Review 2
4. Public_Reviews 09 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #3 (Public review):
 
 The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the most crucial assumptions underlying continuum simulations.
 
 The paper is well written, figures are mostly clear, and the theoretical analysis presented in both the main text and the supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization, and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not explicitly stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones.
 
 The diversity of patterning processes experimentally observed and theoretically described is nicely elaborated on in the introduction of the paper. The theory development and discussion of the continuum model itself is also well-embedded in a review of the relevant broad literature on active liquid crystals and active nematics, which includes plenty of previous results by the authors themselves. Interestingly, several of the patterns identified in the present work, such as 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019) have been observed previously in different, but related, active isotropic fluid models. In light of this crowded literature, the authors do a good job in delineating key results obtained in the present manuscript from existing work.
 
 The results of numerical simulations are well-presented. The discussion of numerical observations is comprehensive, but also at many times qualitative. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system, which is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (Nejad et al, Nat Comm 2024). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.
 
 The authors must be complimented for trying to gain further mechanistic insights into their conclusions using microscopic filament simulations that were diligently performed. It is rightfully stated that these simulations only provide plausibility tests about key assumptions underlying the hydrodynamic theory. Within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 microscopically, in which the continuum theory does also predict the formation of stripe patterns? How does the spatially inhomogeneous organization the continuum theory predicts fit in the presented microscopic picture, and vice versa? The authors clearly explain the scope and limitations of the microscopic model, which suggests that questions like these will be interesting directions of future investigations.
 
 Overall, the paper represents a valuable contribution to the field of active matter that should provide a fruitful basis to develop new hypotheses about the dynamic self-organisation and mechanics of dense filamentous bundles in biological systems.
 
 Review 3
5. Public_Reviews 09 Oct 2025
 
 in eLife (unscoped)
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 eLife assessment
 
 In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.
 
 To address the weaknesses identified in this assessment, we have expanded the motivation and description of the theoretical model, specifically emphasizing the experimental evidence supporting its rationale and assumptions. These changes in the revised manuscript are implemented in the first two paragraphs of Section “Theoretical model” and in a more detailed description and justification of the different mathematical terms that appear in that section. We have made an effort to map the different terms in our narrative to mechanistic processes in the actomyosin network. Even if the nature of the manuscript is inevitably theoretical, we think that the revised manuscript will be more accessible to a broader spectrum of readers.
 
 Public Reviews:
 
 Reviewer #1 (Public Review):
 
 Summary:
 
 In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.
 
 Strengths:
 
 The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.
 
 We thank the referee for these comments.
 
 Weaknesses:
 
 (A) This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.
 
 We understand the point of the referee. While it is unavoidable to present the continuum hydrodynamic theory behind our results, we have made an effort in the revised manuscript to (1) motivate the essential features required from a theoretical model of the actomyosin cytoskeleton capable of describing its nematic self-organization (first two paragraphs of Section “Theoretical model”), and to (2) explicitly explain the physical meaning of each of the mathematical terms in the theory, and when appropriate, relate them to molecular mechanisms in the cytoskeleton. We hope that the revised manuscript addresses the concern of the referee.
 
 Regarding the comparison with experiments, they are indeed qualitative because the main point of the paper is to establish a physical basis for the self-organization of dense nematic structures in actomyosin gels. Somewhat surprisingly, we argue that a compelling mechanism explaining the tendency of actomyosin gels to form patterns of dense nematic bundles has been lacking. As we review in the introduction, these patterns are qualitatively diverse across cell types and organisms in terms of geometry and dynamics, and for this reason, our goal is to show that the same material in different parameter regimes can exhibit such qualitative diversity. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured and are expected to vary wildly between cell types. In fact, estimates in the literature often rely on comparison with hydrodynamic models such as ours. For this reason, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. Second, the patterns of nematic bundles found across cell types depend on the interaction between (1) the intrinsic tendency of actomyosin gels to form such structures studied here and (2) other elements of the cellular context. For instance, polymerization and retrograde flow from the lamellipodium, the physical barrier of the nucleus, and the interaction with the focal adhesion machinery are essential to understand the emergence of stress fibers in adherent cells. Cell shape and curvature anisotropy control the orientation of actin bundles in parallel patterns in the wings and trachea of insects. Nuclear positions guide the actin bundles organizing the cellularization of Sphaeroforma arctica [11]. Here, we focus on establishing that actomyosin gels have an intrinsic ability to self-organize into dense nematic bundles, and leave how this property enables the morphogenesis of specific structures for future work. We have emphasized this point in the revised conclusions section.
 
 (B) It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.
 
 We thank the referee for this comment. Our theory is applicable to actomyosin gels originating from living cells. To our knowledge, the ability of reconstituted actomyosin gels from purified proteins to sustain the kind of contractile dynamical steady-states observed in living cells is very limited. In the revised manuscript, we cite a very recent preprint presenting very exciting but partial results in this direction [49]. Instead, reconstituted in vitro systems encapsulating actomyosin cell extracts robustly recapitulate contractile steady-states. This point has been clarified in the first paragraph of Section “Theoretical model”.
 
 (C) The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.
 
 We agree with the referee and in the revised manuscript we have avoided the term “sarcomeric” because it refers to very specific organizations in cells. What we previously called “sarcomeric patterns”, where bands of high density exhibit nematic order perpendicular to the axis of the bands, is not a structure observed to our knowledge in cells. It is introduced to delimit the relevant region in parameter space. In the revised manuscript, we refer to this pattern as “banded pattern with perpendicular nematic organization” or “banded pattern” in short.
 
 (D) Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.
 
 We thank the referee for raising this point, which was not sufficiently clarified in the original manuscript. We first note that in incompressible active nematic models, active tension is deviatoric (traceless and anisotropic) because an isotropic component would simply get absorbed by the pressure field enforcing incompressibility. Being compressible, our model admits an active tension tensor with deviatoric and isotropic components. We always consider a contractile (positive) isotropic component of active tension, but the deviatoric component can be either contractile (𝜅 > 0) or extensile (𝜅 < 0), where we follow the common terminology according to which in contractile/extensile active nematics the active stress is proportional to q with a positive/negative proportionality constant [see e.g. https://doi.org/10.1038/s41467-018-05666-8]. Furthermore, as clarified in the revised manuscript, total active stresses accounting for the deviatoric and isotropic components are always contractile (positive) in all directions, as enforced by the condition |𝜅| < 1.
 
 For fibrillar patterns, we need 𝜅 < 0, and therefore active stresses are larger perpendicular to the nematic direction. This means that the anisotropic component of the active tension is extensile, although, accounting for the isotropic component, total active tension is contractile (see Fig. 1c). This is now clarified in the text following Eq. 7 and in Fig. 1.
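 
 As a small numerical illustration of these sign conventions, the sketch below assumes a hypothetical form for the active tension, lam * (I + kappa * N) with N = 2 n n^T - I, where n is the nematic direction in 2D. This form is an assumption made here for illustration (Eq. 7 of the manuscript is not reproduced in this response), and the names `lam`, `principal_active_tensions` and `describe_regime` are likewise illustrative. Under that assumption, the principal tensions are lam * (1 + kappa) along n and lam * (1 - kappa) perpendicular to it, so both stay positive exactly when |kappa| < 1.

```python
def principal_active_tensions(lam, kappa):
    """Principal values of the ASSUMED active tension lam * (I + kappa * N).

    N = 2 n n^T - I has eigenvalue +1 along the nematic direction n and -1
    perpendicular to it, so the principal tensions follow directly.
    """
    parallel = lam * (1.0 + kappa)       # along the nematic direction
    perpendicular = lam * (1.0 - kappa)  # perpendicular to it
    return parallel, perpendicular


def describe_regime(lam, kappa):
    """Summarize the sign of the deviatoric part and of the total tension."""
    parallel, perpendicular = principal_active_tensions(lam, kappa)
    if kappa > 0:
        deviatoric = "contractile"
    elif kappa < 0:
        deviatoric = "extensile"
    else:
        deviatoric = "none"
    return {
        "parallel": parallel,
        "perpendicular": perpendicular,
        "deviatoric": deviatoric,
        # True when the total active tension is contractile in every direction
        "positive_in_all_directions": parallel > 0 and perpendicular > 0,
    }


if __name__ == "__main__":
    # Illustrative values only; they are not parameters from the paper.
    for kappa in (-0.5, 0.5, -1.5):
        print(kappa, describe_regime(1.0, kappa))
```

 With kappa = -0.5, the deviatoric part is extensile and the perpendicular tension exceeds the parallel one, yet both remain positive, which matches the regime described for fibrillar patterns.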
 
 However, following fibrillar pattern formation and as a result of the interplay between active and viscous stresses, the total stress can be larger along the emergent dense nematic structures (“contractile structures”) or perpendicular to them (“extensile structures”). To clarify this point, in the revised Fig. 4 and the text referring to it, we have expanded our explanation and plotted the difference between the total stress component parallel to the nematic direction (𝜎∥) and the component perpendicular to the nematic direction (𝜎⊥), with contractile structures satisfying 𝜎∥ − 𝜎⊥ > 0 and extensile structures satisfying 𝜎∥ − 𝜎⊥ < 0. See lines 280 to 303. This is consistent with the common notion of contractile/extensile systems in incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8].
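 
 The criterion stated above, contractile if 𝜎∥ − 𝜎⊥ > 0 and extensile if 𝜎∥ − 𝜎⊥ < 0, can be sketched for a single point of a 2D field as follows. This is a minimal illustration, not the authors' post-processing code; the function names and the representation of the stress as a nested tuple are assumptions.

```python
import math


def stress_anisotropy(sigma, theta):
    """Return sigma_parallel - sigma_perp for a 2D stress tensor.

    sigma: ((sxx, sxy), (syx, syy)), the total stress at one point.
    theta: angle (radians) of the nematic direction n = (cos, sin).
    """
    (sxx, sxy), (syx, syy) = sigma
    c, s = math.cos(theta), math.sin(theta)
    # Projection along n: n . sigma . n
    sigma_par = c * c * sxx + c * s * (sxy + syx) + s * s * syy
    # Projection along the perpendicular direction m = (-sin, cos)
    sigma_perp = s * s * sxx - c * s * (sxy + syx) + c * c * syy
    return sigma_par - sigma_perp


def classify_structure(sigma, theta, tol=1e-12):
    """Label a point as contractile, extensile, or isotropic (within tol)."""
    diff = stress_anisotropy(sigma, theta)
    if diff > tol:
        return "contractile"
    if diff < -tol:
        return "extensile"
    return "isotropic"


if __name__ == "__main__":
    # Illustrative stress with larger tension along x than along y.
    sigma = ((2.0, 0.0), (0.0, 1.0))
    print(classify_structure(sigma, 0.0))          # director along x
    print(classify_structure(sigma, math.pi / 2))  # director along y
```

 The same stress state is labeled differently depending on the local nematic direction, which is why the classification is made relative to the emergent bundles and not from the sign of 𝜅 alone.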
 
 (E) Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.
 
 In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, as discussed in our response to comment (A) by this referee. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, possibly interacting with the adhesion machinery, and we find dynamical interactions such as those suggested by the referee. As an example, we show a video of a simulation where, at the edge of the circular domain, there is an actin influx modeling the lamellipodium, and in four small regions friction is higher, simulating focal adhesions. Under these boundary conditions, the model presented in the paper exhibits the kind of dynamical reorganizations alluded to by the referee.
 
 Author response video 1.
 
 We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish.
 
 (F) Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.
 
 We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.
 
 We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. To our knowledge, these models have not been able to reproduce the heterogeneous nonequilibrium contractile states involving sustained self-reinforcing flows underlying the pattern formation mechanism studied in our work. The scope of the discrete network simulations has been clarified in lines 340 to 349 in the revised manuscript.
 
 While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate some aspects of our continuum modeling with discrete simulations. We have emphasized the complementarity of the two approaches in the conclusions.
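 
 To make concrete what "nematic order" of a small homogeneous sample means in a discrete setting, the sketch below computes a 2D nematic order parameter from a list of filament orientation angles. It is a generic textbook estimate, not the Cytosim customization referred to in the response; the normalization Q = <2 n n^T - I> (so that perfect alignment gives S = 1) and the function name are assumptions.

```python
import math


def nematic_order_2d(angles):
    """Return (S, director_angle) for filament orientations in radians.

    Uses the 2D tensor Q = <2 n n^T - I>, whose independent components are
    Qxx = <cos 2*theta> and Qxy = <sin 2*theta>. S is its positive
    eigenvalue, and the director is the corresponding eigenvector angle.
    Angles theta and theta + pi are equivalent, as required for nematics.
    """
    if not angles:
        raise ValueError("need at least one filament angle")
    qxx = sum(math.cos(2.0 * a) for a in angles) / len(angles)
    qxy = sum(math.sin(2.0 * a) for a in angles) / len(angles)
    order = math.hypot(qxx, qxy)
    director = 0.5 * math.atan2(qxy, qxx)
    return order, director


if __name__ == "__main__":
    # Illustrative samples: perfectly aligned vs. two perpendicular families.
    print(nematic_order_2d([0.3, 0.3, 0.3 + math.pi]))
    print(nematic_order_2d([0.0, math.pi / 2]))
```

 In a workflow of the kind described above, a quantity like S would be evaluated on the homogeneous sample alongside the measured stress, so that the dependence of active stress on nematic order can be tabulated.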
 
 Reviewer #1 (Recommendations For The Authors):
 
 Questions on the theory:
 
 Does rho describe the density of actin or myosin? The authors say that they are modeling actomyosin material as a whole, but the actin and myosin should be modeled separately. Along similar lines, does Q define the ordering of actin or myosin?
 
 Active gel models of the actomyosin cytoskeleton have been formulated with independent densities for actin and for myosin or using a single density field, implicitly assuming a fixed stoichiometry. Super-resolution imaging of the actomyosin cytoskeleton also suggests that in principle it makes sense to consider different nematic fields for actin and for myosin filaments. In the revised manuscript, we now explicitly mention that our density and nematic field are effective descriptions of the entire actomyosin gel (lines 82-84).
 
 A more detailed model would entail additional material parameters, not available experimentally, which may help reproduce specific experiments but that would make the systematic study of the different behaviors much more difficult. Our approach has been to keep the model minimal meeting the fundamental requirements outlined in the first paragraphs of Section “Theoretical model”.
 
 Should the active stress depend on material density? It seems strange (from Eq. 3) that active stress could be non-zero even where density is zero, since sigma_act does not depend on rho.
 
 Yes, active stress is assumed to be proportional to density. Eq. 3 in the original manuscript was misleading (it was multiplied by rho in Eq. 2). In the revised manuscript, we have explained the theoretical model in a bit more detail, clarifying this point.
 
 The authors should clearly explain their rationale for retaining certain types of nonlinear terms while ignoring others in theory. For instance, the nonlinearities in the equations of motion are sometimes quadratic in the fields, while there are also some cubic terms. Please remark up to what order in the fields the various interactions are modeled.
 
 We thank the referee for raising this point. The nonlinearities in the theory are easily explained on the basis of a small number of choices. We have added a new paragraph towards the end of Section “Theoretical model” (lines 145 to 152) providing a rationale for the origin and underlying assumptions leading to different nonlinearities.
 
 To connect with experiments and the biological context, please explain the biological origin of various terms in the model: (1) L-dependent terms in Eq. 2 and 4, (2) Flow-alignment of nematic order and experimental evidence in support of it, (3) density-dependent susceptibility terms in Eq. 4.
 
 (1) Unfortunately, the L-dependent terms are very bulky, but are very standard in nematic theories. The best way to understand their physical significance is through the expression of the nematic free-energy, which is now given and explained in the revised manuscript (Eq. 3). The resulting complicated expressions for the molecular field and the nematic stress (Eqs. 4 and 5) are mathematical consequences of the choice of nematic free energy. In the revised manuscript, we also attempt to provide a basis for these terms in the context of the actin cytoskeleton. (2) To our knowledge, the best reference supporting this term from experiments is Reymann et al, eLife (2016). In the revised manuscript, we have provided a physical interpretation. (3) We have expanded the motivation and plausible microscopic justification of this term.
 
 There are different 'activity' terms in the model. Their biophysical origin is not made clear. For example, the authors should make clear if these activities arise from filament or motor activity. Relatedly, the authors should provide a comprehensive discussion of the signs of the different active parameters and their physical interpretations.
 
 In an active gel model, activity parameters are phenomenological and how they map to molecular mechanisms is not precisely known, although conventionally contractile active tension is ascribed to the mechanical transduction of chemical power by myosin motors. The fact is that, besides myosin activity, there are many nonequilibrium processes in the actomyosin cytoskeleton that may lead to active stresses including (de)polymerization of filaments or (un)binding of crosslinkers. In the revised manuscript, we have added sentences illustrating how different terms may result from microscopic mechanisms, but providing a precise mapping between our model and nonequilibrium dynamics of proteins is beyond the scope of our work, although our discrete network simulations address this issue to a certain degree.
 
 Following the suggestion of the referee, our description of the theory now discusses much more extensively the signs of activity parameters and their physical interpretations, e.g. the text following Eq. 7.
 
 Throughout the paper, various activity terms are varied independently of each other. Is that a reasonable assumption given that activities should depend on ATP and are thus not independent of one another?
 
 We agree that, ultimately, all active processes depend on the conversion of chemical energy into mechanical energy. However, recent work has highlighted how active tension also depends on the microscopic architecture of the network controlled by multiple regulators of the actomyosin cytoskeleton (e.g. Chugh et al, Nat Cell Biol, 2017). It is reasonable to expect that, for a given rate of ATP consumption, chemical power will be converted into mechanical power in different ways depending on the micro-architecture of the cytoskeleton, e.g. the stoichiometry of filaments, crosslinkers, myosins, or the length distribution of filaments (very long filaments crosslinked by myosins may be difficult to reorient but may contract efficiently).
 
 We have added a paragraph in Section “Theoretical model” with a discussion, lines 153 to 156.
 
 Sarcomeres are muscle fibers that exhibit alternating polarity pattern. Such patterning is not evident in what the authors call 'sarcomeres' in Fig. 2. I believe the authors should revise their terminology and not loosely interpret existing classifications in the field.
 
 We thank the referee for raising this point. We have changed the terminology.
 
 Fig 2a: Is the cartoon for filament alignment incorrect for kappa>0?
 
 The cartoon is correct. In the revised manuscript we have explained more clearly the physical meaning of kappa in the text following Eq. 7. In the caption of Fig. 1 and of Fig. 2a, we have also clarified that when the absolute value of kappa is <1, then active tension is positive in all directions.
 
 Within the section "Requirements for fibrillar and banded patterns", it will be useful to show the figures for varying the different active parameters in the main figures.
 
 We have followed the referee’s suggestion and moved Supp. Fig. 1 of the original manuscript to the main figures.
 
 How do the authors decide if bundles are contractile or extensile? Why are contractile bundles under tension while extensile bundles are under compression? I would expect the opposite.
 
 We agree that this point deserves a more detailed explanation. In the revised manuscript and in the new Figure 4, we further develop this point. The fibrillar pattern forms when kappa<0. We further assume that -1<kappa<0, so that active tension is positive in all directions. In this regime, the deviatoric (anisotropic) part of active tension is extensile. However, following pattern formation and because of the interplay between active and viscous stresses, the total stress in the emerging bundles may become extensile or contractile, depending on whether the largest component of stress is perpendicular or along the bundle axis. This is now presented in the updated figure, with new panels presenting maps of the total tension. The text discussing this point has been rewritten and we hope that the new version is much clearer (lines 280 to 303).
 
 A contractile bundle tends to shorten, but it cannot do so because of boundary conditions or the interaction with other bundles. As a result, it is under tension. Conversely, an extensile bundle tries to elongate, but being constrained, it becomes compressed. As an analogy, consider the cortex of a suspended cell. The cortex is contractile, but it cannot contract because of volume regulation in the cell, which is typically pressurized. As a result, tension in the cortex is positive, as shown by Laplace’s law [10.1016/j.tcb.2020.03.005]. We have tried to clarify this point in the revised manuscript.
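 
 To make the analogy concrete, the relation can be written out under an idealizing assumption that is ours and is not taken from the manuscript: a thin spherical cortex of radius R carrying a uniform surface tension T, with internal pressure exceeding external pressure by ΔP. The symbols R, T and ΔP are introduced here only for illustration.

```latex
\Delta P \;=\; P_{\mathrm{in}} - P_{\mathrm{out}} \;=\; \frac{2T}{R}
\quad\Longrightarrow\quad
T \;=\; \frac{\Delta P \, R}{2} \;>\; 0
\quad \text{whenever } \Delta P > 0 .
```

 In this idealized picture a pressurized cell necessarily carries positive cortical tension, even though the cortex itself would contract if it were free to do so.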
 
 Can the authors reproduce alternating density patterns using the cytosim simulations? This is an important step in establishing the correspondence between the continuum theory and the agent-based model.
 
 We have addressed this point in our response to public comment (F) of this referee.
 
 The authors do not provide code or data.
 
 The finite element code, with an input file required to run a representative simulation in the paper, is now made available, see Ref. [74].
 
 The customizations of Cytosim needed to account for nematic order in our discrete network simulations are available, see Ref. [98].
 
 Reviewer #2 (Public Review):
 
 Summary:
 
 The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.
 
 We thank the referee for these comments. In the revised manuscript, we have highlighted the novelty, particularly in the last paragraph of the introduction, the first two paragraphs of Section “Theoretical model”, and in the conclusions. Despite a very large literature on theoretical models of stress fibers, actin rings, and active nematics, we argue that the active self-organization of dense nematic structures from an isotropic and low-density gel has not been compellingly explained so far. Many models assume from the outset the presence of actin bundles, or explain their formation using localized activity gradients. The literature of active nematics has extensively studied symmetry breaking and the self-organization. However, most of the works assume initial orientational order. Only a few works study the emergence of nematic order from a uniform isotropic state, but consider dry systems lacking hydrodynamic interactions or incompressible and density-independent systems [37,38]. Yet, pattern formation in actomyosin gels is characterized by large density variations, and by highly compressible flows, which coordinate in a mechanism relying on an advective instability and self-reinforcing flows.
 
 Our theoretical model is not particularly novel, and as we mention in the manuscript, it can be particularized to different models used in the literature. However, we argue that it has the right minimal features to capture nematic self-organization in actomyosin gels. To our knowledge, no previous study explains the emergence of dense and nematic structures from a low-density isotropic gel as a result of activity and involving the advective instability typical of symmetry-breaking and patterning in the actomyosin cytoskeleton. These are important qualitative features of our results that resonate with a large experimental record, and as such, we believe that our work provides a new and compelling mechanism relying on self-organization to explain the prominence and diversity of patterns involving dense nematic bundles in the actomyosin cytoskeleton across species.
 
 Strengths:
 
 (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network
 
 Weaknesses:
 
 Not placed in the context or literature on active nematics.
 
 We agree with the referee that this was a weakness of the original manuscript. In the revised manuscript, within reasonable space constraints given the size and dynamism of the field of active nematics, we have placed our work in the context of this field (end of introduction and first two paragraphs of Section “Theoretical model”). The published version of our companion manuscript [45] also contributes to providing a clear context to our theoretical model within the field.
 
 Reviewer #2 (Recommendations For The Authors):
 
 The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article. I explain my questions and comments below.
 
 We have responded to this comment above.
 
 (i) Active nematics including density variations have been dealt quite extensively in the literature. For example, the works of Sriram Ramaswami have dealt with this system including linear stability analysis, simulations etc. In what way is the present work different from the system that they have considered?
 
 (ii) Active flows leading to self organization has been a topic of discussion in many works. For example: (i) Annual Review of Fluid Mechanics, Vol. 43:637-659, 2010, https://doi.org/10.1146/annurev-fluid-121108-145434 (ii) S Santhosh, MR Nejad, A Doostmohammadi, JM Yeomans, SP Thampi, Journal of Statistical Physics 180, 699-709 (iii) M. G. Giordano, F. Bonelli, L. N. Carenza, G. Gonnella and G. Negro, Europhysics Letters, Volume 133, Number 5. In what way is this work different from any of these?
 
 (iii) I am confused about the models used in the paper. There is significant literature from Prof. Mike Cates group, Prof. Julia Yeomans group, Prof. Marchetti's group who all use similar governing equations. In the present paper, I find it hard to understand whether the model used is similar to the existing ones in literature or are there significant differences. It should be clarified.
 
 Response to (i), (ii) and (iii).
 
 We completely agree with this referee (and also the previous referee), that the contextualization of our work in the field of active nematics was very insufficient. In the revised manuscript, the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model” now address this point. In short, previous active nematic models predicting patterns with density variations have been either for dry active matter (disregarding hydrodynamic interactions), or for suspensions of active particles moving in an incompressible flow. None of these previous works predict nematic pattern formation as a result of activity relying on the advective instability and self-reinforcing compressible flows, leading to high density and high order bundles surrounded by an isotropic low density phase. Yet, these are fundamental features observed in actomyosin gels. Many works deal with symmetry-breaking of a system with pre-existing order, but very few address how order emerges actively from an isotropic state. We thank the referee for pointing us to the paper by Santhosh et al, which nicely makes this argument and is now cited. Our mechanism is fundamentally different from that in Santhosh et al, whose model is incompressible and ignores density variations.
 
 We hope that the revised manuscript addresses this important concern.
 
 (iv) Below Eqn 6, it starts by saying that the “...origin..is clear...” It's not. I don't understand the physical origin of the instability, and this should be clarified, maybe with some illustrations.
 
 We apologize for this unfortunate sentence, which we have rewritten in the revised manuscript (lines 181 to 185).
 
 Reviewer #3 (Public Review):
 
 The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripelike patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.
 
 The paper is well written, figures are mostly clear and the theoretical analysis presented in both, main text and supplement, is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both, isotropic and anisotropic active stresses, are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).
 
 We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel, as mentioned in the revised manuscript. We have emphasized this point in different places in the revised manuscript. We also thank the referee for the suggestions to better connect with existing literature.
 
 To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.
 
 We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. The companion publication has now been accepted in the New Journal of Physics, with significant changes to better connect the work to the field of active nematics. A preprint reflecting those changes is available in Ref. [64], but we hope to reference the published paper that will come out soon.
 
 In the revised manuscript, we have significantly rewritten the Section “Theoretical model” to frame the continuum model in the context of the field of active nematics. While our model and results have commonalities with previous work, there are also important differences. We have highlighted the novelty of the present work along with the relation with previous studies and theoretical models in the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model”. Furthermore, as suggested by the referee, we have made an effort to connect our results with previous work by Kumar, Mietke, Doostmohammadi and others.
 
 Regarding the last point alluded to by the referee (“extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis”), the picture raised by the referee would be nuanced for our compressible system as compared to the incompressible systems discussed in that reference. As we have elaborated in our response to point (D) of Referee #1, our systems are overall contractile (with positive active tension in all directions), but the deviatoric component of the active tension can be either extensile or contractile. In our “extensile” models (left in Fig. 2c), material is drawn in laterally to the nematic axis but it is not expelled along this axis. Instead, it is “expelled” by turnover. In the revised manuscript, we have added a comment about this.
 
 The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.
 
 We thank the referee for these comments. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. In the revised manuscript, we have expanded and clarified the characterization of emergent contractile/extensile networks by reporting the relative magnitude of stress along and perpendicular to the nematic direction. Our revised manuscript clearly shows that even though all of our simulations describe locally contractile systems with extensile anisotropic active tension, the emergent meso-structures can be either extensile or contractile, with the extensile ones exhibiting the usual bend-type instability (a secondary instability in our system) described classically for extensile active nematic systems. We have rewritten the text discussing this (lines 280 to 303), where we have placed these results in the context of recent work reporting the nontrivial relation between the contractility/extensibility of the local units vs the nematic pattern.
 
 I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 (which is dropped ad-hoc from Fig. 3 onward) microscopically, in which the continuum theory does also predict the formation of stripe patterns - besides the short comment at the very end? How does the spatial inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa?
 
 We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa may not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction whereas there is little room for additional lateral contraction by additional bundling. As discussed in our response to referee #1, we believe that studying the formation of patterns using the discrete network simulations is far beyond the scope of our work. We discuss in lines 332 to 341, as well as in the last paragraph of the conclusions, the scope and limitations of our discrete network simulations.
 
 Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypothesis about the dynamic self-organisation of dense filamentous bundles in biological systems.
 
 Reviewer #3 (Recommendations For The Authors):
 
 The statement "the porous actin cytoskeleton is not a nematic liquid-crystal because it can adopt extended isotropic/low-order phases" is difficult to understand and should be clarified, as the next paragraph starts formulating a nematic active liquid crystal theory. Do the authors mean a crystal that "Tends to be in a disordered phase?", according to its equilibrium properties? It would still be a "nematic liquid crystal", only its ground state is not a nematic phase.
 
 We agree with the referee, and we hope that changes in the introduction and in Section “Theoretical model” address this comment.
 
 I could not find what Frank energy is precisely used, that would be helpful information.
 
 In the revised manuscript, we have provided the expression for the nematic free energy in Eq. 3.
 
 The significance of the green/purple arrows in the Fig. 2a sketch is unclear. Green arrows also appear in b,c; do they represent the same quantity? From the simulation images it is overall very difficult to see how the flows are oriented near the high-density regions (i.e. if they are towards / away from the strip).
 
 We thank the referee for bringing this up. The colorcodings of the sketches were confusing. The modified figures (Fig. 1(c) and Fig. 2(a)) present now a clearer and unified representation of anisotropic tension. The green arrows in Fig. 2(c) represent the out-of-equilibrium flows in the steady state. We agree that the zoom is insufficient to resolve the flow structure. For this reason, in the revised Fig. 2, we have added additional panels showing the flow with higher resolution.
 
 It is currently unclear how the linear stability results - beyond identification of the parameter \delta - inform any of the remaining manuscript. Quantitative comparisons of the various length scales seen in simulated patterns (e.g. Fig. 2b, 3c etc) with linear predictions and known characteristic length scales would be instructive mechanistically, would make the overall presentation more compelling and probes limitations of linear results.
 
 In the revised manuscript, we have provided further information so that the readers can appreciate the predictions and limitations of the linear stability results. We have added a sentence and a Figure to show that, in addition to the critical activity, the linear theory provides a good prediction of the wavelength of the pattern. See lines 199 to 201.
 
 It is not clear what is meant by "[bundle-formation] requires that active tension perpendicular to nematic orientation is larger than along this direction", and therefore also not why that would be "counter-intuitive". If interpreted naively, I would say that a large tension brings in more filaments into the bundle, so that may well be an obviously helpful feature for bundle formation and maintenance. In any case, it would be helpful if clarity is improved throughout when arguments about "directions of tensions" are made.
 
 We have significantly rewritten the first paragraphs of section “Microscopic origin…” to clarify this point (lines 330 to 339). This paragraph, along with other changes in the manuscript such as the explanation of Eq. 7 or the discussion about the stress anisotropy in the new version of Fig. 4 (see lines 280 to 303), provide a better explanation of this important point.
 
 All density color bars: Shouldn't they rather be labelled \rho/\rho_0?
 
 Yes! We have corrected this typo.
 
 Scalar product missing in caption definition of order parameter Fig. 2
 
 We have corrected this typo.
 
 Fig. 3a: I suggest to put the expression for q0 in the caption
 
 We have replaced q_0 with S_0 and clarified its meaning in the caption of what now is Fig 4.
 
 Paragraph on bottom right of page 6 should several times probably refer to Fig. 3c(...), instead of Fig. 3b
 
 We have corrected this typo.
 
 AuthorResponse

Tags

Review 2

Review 3

AuthorResponse

Summary

Review 1

Annotators

Public_Reviews

URL

arxiv.org/abs/2306.15352v4
www.biorxiv.org www.biorxiv.org

A network regularized linear model to infer spatial expression pattern for single cell

5
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  The study is useful for advancing spatial transcriptomics through its novel regression-based linear model (glmSMA) that integrates single-cell RNA-seq with spatial reference atlases, and its methodological framework is convincing. The approach demonstrates notable utility by enabling higher-resolution cell mapping across multiple biological systems and spatial platforms compared to existing tools.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model. It offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.
  
  The study presents a clear methodological framework that balances sparsity and smoothness, with parameter guidelines for different tissue contexts. It is commendable for its application to multiple spatial omics platforms, including both sequencing-based and imaging-based data, with results that can be generalized across both structured and less-structured tissues. After revision, there is a more transparent discussion of assumptions, including the correlation between expression and physical distance, and how performance may vary by tissue heterogeneity.
  
  Limitations are modest - the spatial communication application is mentioned but not fully developed, and resolution reporting is primarily qualitative, which may limit direct comparability between datasets. The imaging-based validation is currently limited to simulated or lower-plex data, and expansion to high-plex datasets would further support platform versatility, although this is not essential to the core claims.
  
  Overall, the manuscript delivers on its main objective, which is to present and validate a practical, flexible, and accurate framework for spatial mapping. The methods are clearly described, and the resource will be useful for researchers seeking to integrate single-cell and spatial datasets in diverse biological contexts.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.
  
  Strengths:
  
  The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.
  
  Comments on revised version:
  
  The authors have sufficiently addressed all of my comments.
  
  Review 2
4. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors have provided a thorough and constructive response to the comments. They effectively addressed concerns regarding the dependence on marker gene selection by detailing the incorporation of multiple feature selection strategies, such as highly variable genes and spatially informative markers (e.g., via Moran's I), which enhance glmSMA's robustness even when using gene-limited reference atlases.
  
  Furthermore, the authors thoughtfully acknowledged the assumption underlying glmSMA-that transcriptionally similar cells are spatially proximal-and discussed both its limitations and empirical robustness in heterogeneous tissues such as human PDAC. Their use of real-world, heterogeneous datasets to validate this assumption demonstrates the method's practical utility and adaptability.
  
  Overall, the response appropriately contextualizes the limitations while reinforcing the generalizability and performance of glmSMA. The authors' clarifications and experimental justifications strengthen the manuscript and address the reviewer's concerns in a scientifically sound and transparent manner.
  
  Review 3
5. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.
  
  Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.
  
  The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.
  
  We are thankful for the positive comments and have made changes following the reviewer's advice, as detailed below.
  
  (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, with regularization parameters, they can be tuned to balance sparsity and smoothness adequately but may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects - a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.
  
  Thank you for this comment. We acknowledge that non-linear relationships can be present in complex tissues and may not be fully captured by a linear model.
  
  Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo. There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns (Fig. 3c). In tissues where non-linear relationships are more prevalent—such as the human PDAC sample—our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary.
  
  Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults. Supplementary Figure 7 illustrates the regularization path for cell assignment in the fly embryo.
  
  The choice of L1 and L2 regularization parameters is crucial for balancing sparsity and smoothness in spatial mapping.
  
  For Structured Tissues (brain):
  
  Moderate L1 to ensure cells are localized.
  
  Small to moderate L2 to maintain local smoothness without blurring distinct regions.
  
  For Less Structured (PDAC):
  
  Slightly lower L1 to allow cells to be associated with multiple regions if boundaries are ambiguous.
  
  Higher L2 to stabilize mappings in noisy or mixed regions.
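  
  The guidelines above can be read against a generic network-regularized least-squares objective. The following is a minimal sketch under stated assumptions, not the glmSMA implementation: it assumes that the coefficients index reference spots, that the L2 term is a quadratic form in the Laplacian of a spot neighbourhood graph, and that a plain proximal-gradient (ISTA) solver is acceptable. All function names (`fit_network_regularized`, `chain_laplacian`, and so on) and the parameters `lam1` and `lam2` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, L, w, lam1, lam2):
    """0.5 * ||y - X w||^2 + lam1 * ||w||_1 + lam2 * w^T L w."""
    r = y - X @ w
    return 0.5 * r @ r + lam1 * np.abs(w).sum() + lam2 * w @ L @ w

def fit_network_regularized(X, y, L, lam1, lam2, n_iter=500):
    """Hypothetical helper: ISTA solver for the objective above.

    X: (genes x spots) reference expression (assumed layout),
    y: (genes,) expression of one cell,
    L: (spots x spots) symmetric graph Laplacian of the spot graph.
    """
    w = np.zeros(X.shape[1])
    # Step size 1 / Lipschitz constant of the smooth part, i.e. the
    # largest eigenvalue of X^T X + 2 * lam2 * L.
    hessian = X.T @ X + 2.0 * lam2 * L
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + 2.0 * lam2 * (L @ w)
        # Larger lam1 zeroes more coefficients (sharper localization);
        # larger lam2 pulls neighbouring spots toward similar weights.
        w = soft_threshold(w - step * grad, step * lam1)
    return w

def chain_laplacian(n):
    """Laplacian of a path graph with n nodes (toy spatial graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A
```

  In this sketch, raising `lam1` corresponds to the "moderate L1" advice for structured tissue, while raising `lam2` corresponds to the "higher L2" advice for noisy or mixed regions; suitable values would still need to be tuned per dataset.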
  
  (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.
  
  Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for prediction. To select marker genes, we applied the following optional strategies:
  
  (1) Identifying highly variable genes (HVGs).
  
  (2) Calculating Moran’s I scores for all genes to assess spatial autocorrelation.
  
  (3) Generating anchor genes based on the integration of the reference atlas and scRNA-seq data using Seurat.
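  
  As an illustration of strategy (2), Moran's I for a single gene can be computed from its expression across spots and a spatial weight matrix. The sketch below uses the standard definition of global Moran's I; the function name `morans_i` and the choice of a binary adjacency matrix as weights are assumptions for illustration and are not taken from the glmSMA code.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one gene.

    values:  (n_spots,) expression of the gene at each spot.
    weights: (n_spots x n_spots) spatial weights, e.g. a binary
             neighbour adjacency matrix with a zero diagonal.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    denom = (z ** 2).sum()
    if denom == 0 or w.sum() == 0:
        return float("nan")          # constant gene or empty graph
    # I = (N / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (x.size / w.sum()) * (z @ w @ z) / denom
```

  Values near +1 indicate that neighbouring spots carry similar expression (a spatially informative gene), values near 0 indicate no spatial structure, and negative values indicate alternation between neighbours. Genes could then be ranked by this score and the top ones kept as markers.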
  
  We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used. For fair comparison across different methods, the same set of marker genes was used. Under this condition, our method outperformed the others based on KL divergence (Fig. 2b, Fig. 5g).
  
  To assess the effect of marker gene quantity, we randomly selected subsets of 2,000, 1,500, 1,000, 700, 500, and 200 markers from the original set. As the number of markers decreases, mapping performance declines, which is expected due to the reduction in available spatial information. This result underscores the general dependence of spatial mapping accuracy on both the number and quality of informative marker genes (Supplementary Fig. 10).
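A subsampling experiment of this kind could be set up roughly as follows. This is a sketch only: the gene names, the size of the original marker set, and the fixed seed are placeholders, and the mapping step that would be rerun on each subset is omitted.

```python
import random


def marker_subsets(markers, sizes, seed=0):
    """Draw one random subset of markers for each requested size."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return {k: rng.sample(markers, k) for k in sizes if k <= len(markers)}


# Placeholder marker set; the real set would come from feature selection.
markers = [f"gene_{i}" for i in range(3000)]
sizes = [2000, 1500, 1000, 700, 500, 200]
subsets = marker_subsets(markers, sizes)

for k in sizes:
    print(k, len(subsets[k]))
```

Each subset is drawn independently here; drawing nested subsets instead would be an equally reasonable design choice.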
  
  We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. We incorporated a high-resolution Slide-seq dataset from the mouse hippocampus to evaluate the influence of cell type composition on the algorithm’s performance [Stickels et al., 2020]. Most cell types within the CA1, CA2, CA3, and DG regions were accurately mapped to their original anatomical locations (Fig. 5e, f, g).
  
  (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.
  
  Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. Our current model focuses on inferring spatial relationships; we may add cell–cell communication analyses in future work.
  
  (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.
  
  Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. In our framework, each cell is mapped to one or more spatial spots with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain (Supplementary Fig. 5 & 7). For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. The users can also see the entire regularization path by varying the penalty terms.
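The effect of the L1 penalty on how concentrated a cell's assignment becomes can be illustrated with the soft-thresholding operator, which is the proximal step associated with the L1 norm. The sketch below is not the glmSMA solver: the per-spot scores, the spot coordinates, and the non-negativity clipping are all assumptions chosen to show how a stronger penalty collapses a probability vector onto fewer spots.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def assignment_probabilities(scores, lam):
    """Shrink hypothetical per-spot scores, clip at zero, normalise to sum to 1."""
    shrunk = [max(soft_threshold(s, lam), 0.0) for s in scores]
    total = sum(shrunk)
    if total == 0:
        return [0.0] * len(scores)  # penalty removed every spot
    return [s / total for s in shrunk]


def expected_location(probs, coords):
    """Probability-weighted average of the spot coordinates."""
    x = sum(p * c[0] for p, c in zip(probs, coords))
    y = sum(p * c[1] for p, c in zip(probs, coords))
    return (x, y)


scores = [0.9, 0.5, 0.2, 0.05]             # hypothetical scores for four spots
coords = [(0, 0), (1, 0), (0, 1), (5, 5)]  # hypothetical spot coordinates

weak = assignment_probabilities(scores, lam=0.01)   # spread over all four spots
strong = assignment_probabilities(scores, lam=0.6)  # collapses onto one spot

print(weak, expected_location(weak, coords))
print(strong, expected_location(strong, coords))
```

Sweeping lam over a grid and recording the resulting vectors gives a simple picture of a regularization path.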
  
  Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3
  
  Wei, R. et al. (2022) ‘Spatial charting of single-cell transcriptomes in tissues’, Nature Biotechnology, 40(8), pp. 1190–1199. doi:10.1038/s41587-022-01233-1.
  
  Stickels, R.R. et al. (2020) ‘Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-SEQV2’, Nature Biotechnology, 39(3), pp. 313–319. doi:10.1038/s41587-020-0739-1.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.
  
  Strengths:
  
  The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.
  
  Weakness:
  
  (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.
  
  Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data (Fig. 4c). However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.
  
  (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.
  
  Thank you for the comment. To clarify how ground truth is defined across different tissues, we provided the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:
  
  10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.
  
  Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.
  
  Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.
  
  These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.
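The KL-divergence comparison described for the 10x Visium data can be sketched as follows. The cell type labels, the single-region setup, and the pseudocount used to avoid division by zero are assumptions for illustration; the response does not specify how zero counts were handled.

```python
import math
from collections import Counter


def type_distribution(labels, cell_types, pseudocount=1e-6):
    """Fraction of cells of each type, with a small pseudocount for absent types."""
    counts = Counter(labels)
    raw = [counts.get(t, 0) + pseudocount for t in cell_types]
    total = sum(raw)
    return [r / total for r in raw]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same types."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


cell_types = ["L2/3", "L4", "L5"]

# Hypothetical labels within one region: reference (from ST data) vs. predicted.
reference = ["L2/3"] * 5 + ["L4"] * 3 + ["L5"] * 2
good_pred = ["L2/3"] * 5 + ["L4"] * 3 + ["L5"] * 2
poor_pred = ["L2/3"] * 1 + ["L4"] * 2 + ["L5"] * 7

p = type_distribution(reference, cell_types)
print(kl_divergence(p, type_distribution(good_pred, cell_types)))  # identical distributions
print(kl_divergence(p, type_distribution(poor_pred, cell_types)))  # larger value, worse match
```

Lower values indicate that the predicted cell type composition is closer to the reference composition.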
  
  (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.
  
  Thank you for the comment. In the intestinal dataset, only six large domains were defined. As a result, the task for this dataset is relatively simple—each cell only needs to be assigned to one of the six domains. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.
  
  (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.
  
  Thank you for the comment. To improve visualization, we included anatomical structures alongside the mapping results in the next version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Major cell type assignments for the PDAC samples, along with anatomical structures, are shown in Supplementary Figure 9. Most of these cell types were correctly mapped to their corresponding anatomical regions.
  
  (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.
  
  Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we computed the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset. We identified a higher-quality dataset for the mouse hippocampus and used it to evaluate our algorithm. Additionally, we employed KL divergence as an alternative strategy to validate and benchmark our results (Fig. 5e, f, g). Most CA cells, including CA1, CA2, and CA3 principal cells, were correctly assigned back to the CA region. Dentate principal cells were accurately mapped to the DG region (Fig. 5e, f).
  
  (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. In other tissues, such as the liver, anatomical regions are not distinctly separated. Further evaluation of such tissues would help determine the method's broader applicability.
  
  Thank you for the insightful comment. We agree that many spatial datasets used in our study are from tissues with well-defined anatomical regions. To address the applicability of glmSMA in tissues without clearly separated anatomical structures, we applied glmSMA to the Drosophila embryo, which represents a tissue with relatively continuous spatial patterns and lacks well-demarcated anatomical boundaries compared to organs like the brain or intestinal villus.
  
  Despite this less structured spatial organization, glmSMA demonstrated robust performance in the fly embryo, accurately mapping cells to their correct spatial spots based on gene expression profiles. This result indicates that glmSMA is not strictly limited to highly structured tissues and can generalize to tissues with more continuous or gradient-like spatial architectures. These results suggest that glmSMA has broader applicability beyond highly compartmentalized tissues.
  
  Lein, E., Hawrylycz, M., Ao, N. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007). https://doi.org/10.1038/nature05453
  
  Reviewer #3 (Public review):
  
  The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.
  
  Strengths:
  
  (1) Comprehensive Benchmarking:
  
  Compared against CellTrek and novoSpaRc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.
  
  Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.
  
  (2) Experimental Validation with Multiple Real-World Datasets:
  
  The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.
  
  Validation through correlation analyses, Pearson's coefficient, and KL divergence supports the accuracy of glmSMA's predictions.
  
  We thank reviewer #3 for their positive feedback and thoughtful recommendations.
  
  Weaknesses:
  
  (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.
  
  We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.
  
  (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.
  
  Thank you for raising this important point. We agree that glmSMA operates under the assumption that cells with similar gene expression profiles tend to be spatially proximal, and this assumption may not strictly hold in highly heterogeneous tissues where spatial organization is less coupled to transcriptional similarity.
  
  To address this concern, we specifically tested glmSMA on human PDAC samples, which represent moderately heterogeneous environments characterized by complex tumor microenvironments, including a mixture of ductal cells, cancer cells, stromal cells, and other components. Despite this heterogeneity, glmSMA successfully mapped major cell types to their expected anatomical regions, demonstrating that the method is robust even in the presence of substantial cellular diversity and spatial complexity.
  
  This result suggests that while glmSMA relies on the assumption of spatial-transcriptomic correlation, the method can tolerate a reasonable degree of spatial heterogeneity without a significant loss of performance. Nevertheless, we acknowledge that in extremely disorganized or highly mixed tissues where transcriptional similarity is decoupled from spatial proximity, the performance may be affected.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2024.11.20.624541v2
www.biorxiv.org www.biorxiv.org

Single-cell profiling of trabecular meshwork identifies mitochondrial dysfunction in a glaucoma model that is protected by vitamin B3 treatment

4
1. Public_Reviews 10 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This study provides a fundamental advancement in our understanding of trabecular meshwork cell diversity and its role in eye pressure regulation and glaucoma using multimodal single-cell analysis, spatial validation, and functional testing that go beyond the current state-of-the-art. The study demonstrates that mitochondrial dysfunction, specifically in one of three distinct cell subtypes (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. While the identification of TM3 cells as metabolically specialized is compelling, there is somewhat limited evidence linking mitochondrial dysfunction to the Lmx1b mutation in TM3 cells.
 
 Summary
2. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction specifically in one subtype (TM3) contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.
 
 Strengths:
 
 The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.
 
 Weaknesses:
 
 Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped; for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.
 
 Overall, this is a compelling and carefully executed study that offers significant advances in our understanding of TM cell biology and its role in glaucoma. The integration of multimodal data, disease modeling, and therapeutic testing represents a valuable contribution to the field. With additional mechanistic depth, the study has the potential to become a foundational resource for future research into IOP regulation and glaucoma treatment.
 
 Review 1
3. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue. This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.
 
 Strengths:
 
 (1) Comprehensive dataset with high single-cell resolution
 
 (2) Use of multiple bioinformatic and cross-comparative approaches
 
 (3) Integration of 3D imaging of TM and SC for anatomical context
 
 (4) Convincing identification and validation of three TM subtypes using molecular markers.
 
 Weaknesses:
 
 (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Although authors have responded to this, the manuscript is not sufficiently altered to address these points. I would like to suggest that authors tone down mitochondrial connection with Lmx1b from the title and abstract, and clearly discuss that these events are associated, and future work is needed to dissect the role of mitochondria in this pathway. Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.
 
 (2) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.
 
 Review 2
4. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction, specifically in one subtype (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.
 
 Strengths:
 
 The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.
 
 We thank the reviewers for their thorough reading of our manuscript and helpful comments.
 
 Weaknesses:
 
 (1) Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link.
 
 We show that the Lmx1b mutation induces mitochondrial dysfunction with mitochondrial gene expression changes but agree with the referee in that we do not show direct regulation of mitochondrial genes by LMX1B. Emerging data suggest that LMX1B regulates the expression of mitochondrial genes in other cell types [1, 2] making the direct link reasonable. Future work that is beyond the scope of the current paper will focus on sequencing cells at earlier timepoints to help distinguish gene expression changes associated with the V265D mutation from those secondary to ongoing disease and elevated IOP. Additional studies, including ATAC seq at more ages, ChIP-seq and/or Cut and Run/Tag (in TM cells) will be necessary to directly investigate LMX1B target genes.
 
 As we studied adult mice, mitochondrial gene expression changes could be secondary to other disease-induced stresses. Because we did not intend to say we have shown a direct link, we have now added a sentence to the discussion to ensure clarity.
 
 Lines 932-934: “Although our studies show a clear effect of the Lmx1b mutation on mitochondria, future studies are needed to determine if LMX1B directly modulates mitochondrial genes in V265D mutant TM cells”
 
 (2) The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped - for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured.
 
 We agree that further experiments towards a fuller mechanistic understanding of vitamin B3’s therapeutic effects are needed. Such experiments are planned but are beyond the scope of this paper, which is already very large (7 Figures and 16 Supplemental Figures).
 
 (3) While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.
 
 We appreciate the reviewer’s suggestion and agree that additional quantitative analyses will further strengthen the translational relevance of TM3 cells. It is not yet clear if humans have a direct TM3 counterpart or if TM cell roles are compartmentalized differently between human cell types. We are currently limited in our ability to perform these comparative analyses. Specifically, we were unable to obtain permission to use the underlying dataset from Patel et al., and our access to the Van Zyl et al. dataset was through the Single Cell Portal, which does not support more complex analyses (e.g., cell identity mapping or gene signature scoring). Differences between human studies themselves also affect these comparisons. Future work aimed at resolving differences and standardizing human TM cell annotations, as well as cross-species comparisons, is needed (working groups exist and this ongoing effort supports 3 human TM cell subtypes as also reported by Van Zyl). This is beyond what we are currently able to do for this paper. We present a comprehensive assessment using readily available published resources.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This elegant study by Tolman and colleagues provides fundamental findings that substantially advance our knowledge of the major cell types within the limbus of the mouse eye, focusing on the aqueous humor outflow pathway. The authors used single-cell and single-nuclei RNAseq to very clearly identify 3 subtypes of the trabecular meshwork (TM) cells in the mouse eye, with each subtype having unique markers and proposed functions. The U. Columbia results are strengthened by an independent replication in a different mouse strain at a separate laboratory (Duke). Bioinformatics analyses of these expression data were used to identify cellular compartments, molecular functions, and biological processes. Although there were some common pathways among the 3 subtypes of TM cells (e.g., ECM metabolism), there also were distinct functions. For example:
 
 TM1 cell expression supports heavy engagement in ECM metabolism and structure, as well as TGFb2 signaling.
 
 TM2 cells were enriched in laminin and pathways involved in phagocytosis, lysosomal function, and antigen expression, as well as End3/VEGF/angiopoietin signaling.
 
 TM3 cells were enriched in actin binding and mitochondrial metabolism.
 
 They used high-resolution immunostaining and in situ hybridization to show that these 3 TM subtypes express distinct markers and occupy distinct locations within the TM tissue. The authors compared their expression data with other published scRNAseq studies of the mouse as well as the human aqueous outflow pathway. They used ATAC-seq to map open chromatin regions in order to predict transcription factor binding sites. Their results were also evaluated in the context of human IOP and glaucoma risk alleles from published GWAS data, with interesting and meaningful correlations. Although not discussed in their manuscript, their expression data support other signaling pathways/ proteins/ genes that have been implicated in glaucoma, including: TGFb2, BMP signaling (including involvement of ID proteins), MYOC, actin cytoskeleton (CLANs), WNT signaling, etc.
 
 In addition to these very impressive data, the authors used scRNAseq to examine changes in TM cell gene expression in the mouse glaucoma model of mutant Lmx1b-induced ocular hypertension. In man, LMX1B is associated with Nail-Patella syndrome, which can include the development of glaucoma, demonstrating the clinical relevance of this mouse model. Among the gene expression changes detected, TM3 cells had altered expression of genes associated with mitochondrial metabolism. The authors used their previous experience using nicotinamide to metabolically protect DBA/2J mice from glaucomatous damage, and they hypothesized that nicotinamide supplementation of mutant Lmx1b mice would help restore normal mitochondrial metabolism in the TM and prevent Lmx1b-mediated ocular hypertension. Adding nicotinamide to the drinking water significantly prevented Lmx1b mutant mice from developing high intraocular pressure. This is a laudable example of dissecting the molecular pathogenic mechanisms responsible for a disease (glaucoma) and then discovering and testing a potential therapy that directly intervenes in the disease process and thereby protects from the disease.
 
 Strengths:
 
 There are numerous strengths in this comprehensive study including:
 
 Deep scRNA sequencing that was confirmed by an independent dataset in another mouse strain at another university.
 
 Identification and validation of molecular markers for each mouse TM cell subset along with localization of these subsets within the mouse aqueous outflow pathway.
 
 Rigorous bioinformatics analysis of these data as well as comparison of the current data with previously published mouse and human scRNAseq data.
 
 Correlating their current data with GWAS glaucoma and IOP "hits".
 
 Discovering gene expression changes in the 3 TM subgroups in the mouse mutant Lmx1b model of glaucoma.
 
 Further pursuing the indication of dysfunctional mitochondrial metabolism in TM3 cells from Lmx1b mutant mice to test the efficacy of dietary supplementation with nicotinamide. The authors nicely demonstrate the disease modifying efficacy of nicotinamide in preventing IOP elevation in these Lmx1b mutant mice, preventing the development of glaucoma. These results have clinical implications for new glaucoma therapies.
 
 We thank the reviewer for these generous and thoughtful comments on the strengths of this study.
 
 Weaknesses:
 
 (1) Occasional over-interpretation of data. The authors have used changes in gene expression (RNAseq) to implicate functions and signaling pathways. For example: they have not directly measured "changes in metabolism", "mitochondrial dysfunction" or "activity of Lmx1b".
 
 We thank the reviewer for this feedback. We agree and did not intend to overstate our findings. Our gene expression changes support, but do not by themselves prove, metabolic disturbances. We had felt that this was obvious and did not want to clutter the text. We have revised the manuscript to clarify that our conclusions about metabolic changes and LMX1B activity are based on gene expression patterns rather than direct functional assays and have added EM data (see below under “Recommendations for the authors”).
 
 We have also added the following to the results:
 
 Lines 715-721: “Although the documented gene expression changes strongly suggest metabolic and mitochondrial dysfunction, they do not directly prove it. Using electron microscopy to directly evaluate mitochondria in the TM, we found a reduction in total mitochondria number per cell in mutants (P = 0.015, Figure 6G). In addition, mitochondria in mutants had increased area and reduced cristae (inner membrane folds) in mutants consistent with mitochondrial swelling and metabolic dysfunction (all P < 0.001 compared to WT, Figure 6G-H).”
 
 More detailed EM and metabolic studies are underway but are beyond the scope of this paper.
 
 (2) In their very thorough data set, there is enrichment of, or changes in, gene expression that support other pathways previously reported to be associated with glaucoma (such as TGFb2, BMP signaling, actin cytoskeletal organization (CLANs), WNT signaling, ossification, etc.). Not discussing these appears to be a lost opportunity to further enhance the significance of this work.
 
 We appreciate the reviewer’s suggestions for enhancing the relevance of our work. We had not initially discussed these pathways due to length concerns. We have now incorporated some of this information into the manuscript (see below under “Recommendations for the authors”).
 
 Reviewer #3 (Public review):
 
 Summary: In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.
 
 This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.
 
 Strengths:
 
 (1) Comprehensive dataset with high single-cell resolution
 
 (2) Use of multiple bioinformatic and cross-comparative approaches
 
 (3) Integration of 3D imaging of TM and SC for anatomical context
 
 (4) Convincing identification and validation of three TM subtypes using molecular markers.
 
 We thank the reviewer for their comments on the strengths of this study.
 
 Weaknesses:
 
 (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Additional evidence is needed to clarify whether Lmx1b directly regulates mitochondrial genes (e.g., via ChIP-seq, motif analysis, or ATAC-seq), or whether mitochondrial changes are downstream effects.
 
 We agree and refer the reviewer to our responses to the other referees including Reviewer 1, Comment 1 and Reviewer 2 comments 1 and 17. As noted there, these mechanistic questions are the focus of ongoing and future studies. We have revised the text where appropriate to ensure it accurately reflects the scope of our current data.
 
 (2) Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.
 
 We again refer the reviewer to our other responses, including Reviewer 1, Comment 1 and Reviewer 2, Comments 1 and 17.
 
 (3) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.
 
 We refer the reviewer to our response to Reviewer 1, Comment 2.
 
 (4) Lack of direct evidence that LMX1B regulates mitochondrial genes: While transcriptomic and motif accessibility analyses suggest that LMX1B is enriched in TM3 cells and may influence mitochondrial function, no mechanistic data are provided to demonstrate direct regulation of mitochondrial genes. Including ChIP-seq data, motif enrichment at mitochondrial gene loci, or perturbation studies (e.g., Lmx1b knockout or overexpression in TM3 cells) would greatly strengthen this central claim.
 
 We refer the reviewer to our response to Reviewer 1, Comment 1.
 
 (5) Focus on LMX1B in Fig. 5F lacks broader context: Figure 5F shows that several transcription factors (TFs), including Tcf21, Foxs1, Arid3b, Myc, Gli2, Patz1, Plag1, Npas2, Nr1h4, and Nfatc2, exhibit stronger positive correlations or motif accessibility changes than LMX1B. Yet the manuscript focuses almost exclusively on LMX1B. The rationale for this focus should be clarified, especially given LMX1B's relatively lower ranking in the correlation analysis. Were the functions of these other highly ranked TFs examined or considered in the context of TM biology or glaucoma? Discussing their potential roles would enhance the interpretation of the transcriptional regulatory landscape and demonstrate the broader relevance of the findings.
 
 Our analysis (Figure 5F) indicates that Lmx1b is the transcription factor most strongly associated with its predicted target gene expression across all TM cells, as reflected by its highest value along the X-axis. While other transcription factors exhibit greater motif accessibility (Y-axis), this likely reflects their broader expression across TM subtypes. In contrast, Lmx1b is minimally expressed in TM1 and TM2 cells, which may account for its lower motif accessibility overall (its motifs are not accessible in cells where Lmx1b is absent or minimally expressed).
 
 Our emphasis on LMX1B is further supported by its direct genetic association with glaucoma. In contrast, the other transcription factors lack clear links to glaucoma and are supported primarily by indirect evidence. Nonetheless, we agree that the transcription factors highlighted in our analysis are promising candidates for future investigation. However, to maintain focus on the central narrative of this study, we have chosen not to include an extended discussion of these additional genes.
 
 (6) In the abstract, the authors report 9,394 wild-type TM cell transcriptomes. The number of Lmx1bV265D/+ TM cell transcriptomes analyzed is not provided. This information is essential for evaluating the comparative analysis and should be clearly stated in the Abstract and again in the main text (e.g., lines 121-123). Including both wild-type and mutant cell counts will help readers assess the balance and robustness of the dataset.
 
 We thank the reviewer for noticing this oversight and have added this value to the abstract and results section.
 
 Lines 41 and 696: 2,491 mutant TM cells.
 
 (7) Did the authors monitor mouse weight or other health parameters to assess potential systemic effects of treatment? It is known that the taste of compounds in drinking water can alter fluid or food intake, which may influence general health. Also, do Lmx1bV265D/+ mice exhibit non-ocular phenotypes, and if so, does nicotinamide confer protection in those tissues as well? Additionally, given that nicotinamide dosing started at postnatal day 2, how long were the mice treated with water containing nicotinamide, after how many days or weeks was IOP reduced, and how long was the decrease in IOP sustained?
 
 Water intake was monitored in both treatment groups, and dosing was based on the average volume consumed by adult mice (lines 1017–1018). Young pups do not drink water, so the drug is largely delivered through the mother’s milk until weaning; we therefore do not know an accurate dose for young pups. Mouse health was assessed throughout the experiment through regular monitoring of body weight and general condition.
 
 Depending on genetic context, Lmx1b mutations can cause kidney disease and impact other systems. Non-ocular phenotypes were not the focus of this study and were not characterized.
 
 We added a comment to the Methods to clarify the NAM treatment timeline. NAM was administered continuously in the drinking water starting at P2 and maintained throughout the experiment. IOP was measured beginning at 2 months and then at monthly time points. NAM lessened IOP at 2 and 3 months. We terminated IOP assessment at 3 months.
 
 Lines 1028-1029: “Treatment was started at postnatal day 2 and continued throughout the experiment.”
 
 (8) While the IOP reduction observed in NAM-treated Lmx1bV265D/+ mice appears statistically significant, it is unclear whether this reflects meaningful biological protection. Several untreated mice exhibit very high IOP values, which may skew the analysis. The authors should report the mean values for IOP in both untreated and NAM-treated groups to clarify the magnitude and variability of the response.
 
 We have added supplemental table 7 with the statistical information. Regarding the high IOP values observed in a subset of untreated V265D mutant mice, we consistently detect individual mutant eyes with IOPs exceeding 30 mmHg across independent cohorts and time points [3-5]. It is important to note that IOP is subject to fluctuation and in disease states such as glaucoma, circadian rhythms can be disrupted with stochastic and episodic IOP spikes throughout the day. This may be occurring in those untreated mice. This is also why we strive to use sample sizes of 40 or more. Additionally, we observe that some mutant eyes with IOPs measured within the normal range have anterior chamber deepening (ACD) - a persistent anatomical change associated with sustained or recurrent high IOP that stretches the cornea and may posteriorly displace the lens. This suggests mutant mice experience transient IOP elevations that are not always captured at a single time point due to the stochastic nature of these fluctuations. To account for this, we include ACD as an additional readout alongside IOP measurements. The reduction in ACD observed in NAM-treated mice provides independent evidence supporting the biological relevance of NAM-mediated IOP reduction.
 
 (9) Additionally, since NAM has been shown to protect RGCs in other glaucoma models directly, the authors should assess whether RGCs are preserved in NAM-treated Lmx1b V265D/+ mice. Demonstrating RGC protection would support a synergistic effect of NAM through both IOP reduction and direct neuroprotection, strengthening the translational relevance of the treatment.
 
 We again thank the referee. We note the possibility of dual IOP protection and neuroprotection in the manuscript (lines 961–963). The goal of the present study, however, was to determine mechanisms underlying IOP elevation in patients with LMX1B variants. Therefore, we limited our focus to IOP elevation (LMX1B is expressed in the TM but not RGCs). Studies of the RGCs and optic nerve in V265D mutant mice treated with NAM take considerable effort but are underway. They will be reported in a subsequent manuscript. Initial data support protection, but that is a work in progress.
 
 Additionally, we recently reported a pattern of IOP protection with pyruvate similar to that reported here [4]. In those experiments we analyzed the optic nerve, because the focus of that study was the assessment of pyruvate as a resilience factor against high genetic risk of glaucoma. In that case, there was statistically significant protection from glaucomatous optic nerve damage, again arguing for translational relevance with a possible synergistic effect through both IOP reduction and direct neuroprotection.
 
 (10) Can the authors add any other functional validation studies to explore the pathways enriched in the TM1, TM2, and TM3 subtypes, in addition to the IHC/IF/RNAscope validation?
 
 We agree with the reviewer on the importance of further functional validation of pathways active in TM cell subtypes that influence IOP. However, comprehensive investigation of the pathways active in each subtype needs to be addressed in future studies. It is beyond the scope of this already large paper.
 
 (11) The authors should include a representative image of the limbal dissection. While Figure S1 provides a schematic, mouse eyes are very small, and dissecting unfixed limbal tissue is technically challenging. It is also difficult to reconcile the claim that the majority of cells in the limbal region are TM and endothelium. As shown in Figure S6, DAPI staining suggests a much higher abundance of scleral cells compared to TM cells within the limbal strip. Additional clarification or visual evidence would help validate the dissection strategy and cellular composition of the captured region.
 
 We appreciate the reviewer’s suggestion and have added additional images to Figure S1 to show our limbal strip dissection. However, we clarify that we do not intend to suggest that TM and endothelial cells are the most abundant populations in these dissected strips. When we say “are enriched for drainage tissues” we mean in comparison to dissecting the anterior segment as a whole. We have clarified this in the text. In fact, epithelial cells (primarily from the cornea) constituted the largest cluster in our dataset (Figure 1A). Additionally, to avoid misinterpretation, we generally refrain from drawing conclusions about the relative abundance of cell types based on sequencing data. Single-cell and single-nucleus RNA sequencing results are sensitive to technical factors that alter cell proportions depending on exact methodological details. In our study, TM cells comprised 24.4% of the single-cell dataset and 11.8% of the single-nucleus dataset, illustrating the impact of methodological variability.
 
 Lines 163-164: “Individual eyes were dissected to isolate a strip of limbal tissue, which is enriched for TM cells in comparison to dissecting the anterior segment as a whole.”
 
 Reviewer #1 (Recommendations for the authors):
 
 To enhance the reproducibility and transparency of the findings presented in this study, we strongly recommend that the authors make all analysis scripts and computational tools publicly available.
 
 We agree with the reviewer’s emphasis on transparency and are currently building a GitHub page to share our scripts. However, we did not develop any new tools for this study. All tools that we used are publicly available and provided in our methods section. All data will be available as raw data and through the Broad Institute’s Single Cell Portal.
 
 Reviewer #2 (Recommendations for the authors):
 
 The authors are to be commended for a well-written presentation of high-quality data, their comparisons of datasets (other mouse and human scRNAseq data), correlation with clinical glaucoma risk alleles, and curative therapy for the mouse model of Lmx1b glaucoma. There are several minor suggestions that the authors might consider to further improve their manuscript:
 
 (1) Lines 42-43: Although their data strongly support the role of mitochondrial dysfunction in Lmx1b glaucoma, they might want to soften their conclusion "supports a primary role of mitochondrial dysfunction within TM3 cells initiating the IOP elevation that causes glaucoma".
 
 With the inclusion of EM data supporting mitochondrial dysfunction in Lmx1b mutant TM cells, we have revised this sentence to more accurately reflect our findings.
 
 Lines 42-44 (previously lines 42-43): “Mitochondria in TM cells of V265D/+ mice are swollen with a reduced cristae area, further supporting a role for mitochondrial dysfunction in the initiation of IOP elevation in these mice.”
 
 (2) Figure 1: Why is the shape of the "TM containing" cluster in 1A so different than the cluster shown in 1B?
 
 We isolated cells from the 'TM-containing' cluster and performed unbiased reclustering, which alters their positioning in UMAP space. The figure legend has been updated to clarify this point.
 
 Lines 143-144 “A separate UMAP representation of the trabecular meshwork (TM) containing cluster following subclustering.”
 
 (3) Line 160: change "data was" to "data were"
 
 Corrected
 
 (4) S4 Fig C: Please comment on why the Columbia and Duke heatmaps for TM3 are not as congruent as the heatmaps for TM1 and TM2.
 
 We cannot definitively determine the reason for this. However, differences in tissue processing techniques between the Columbia and Duke preparations may contribute. Such variations have been shown to affect cellular transcriptomes in certain contexts. It is possible that TM3 cells are more susceptible to these effects than others. We have added a statement addressing this point to the figure legend.
 
 Lines 238-240: “Because tissue processing techniques can alter gene expression [52], the heatmap variation between institutes likely reflects differences in processing techniques (Methods) and suggests that TM3 cells are more susceptible to these effects than other cell types.”
 
 (5) S9 Fig: It is very difficult to see any staining for TM1 CHIL1 (2nd panel), TM2 End3 (2nd panel), and TM3 Lypd1 (both panels)
 
 We apologize for the difficulty in visualizing these panels. To improve clarity, we have increased the brightness of all relevant marker signals, within standard bounds, to facilitate easier interpretation.
 
 (6) Line 380: "are significantly higher"; since statistical analysis was not reported, please do not use "significantly"
 
 Done
 
 (7) The authors should consider discussing several of their findings that agree with published literature. For example:
 
 Figure 3B: "Wnt protein binding" (PMID: 18274669), "TGFb binding" (numerous references), "integrin binding" (work of Donna Peters), "actin binding"/"actin filament binding"/"actin filament bundle" (CLANs references)
 
 S10 Fig c: "ossification" (work of Torretta Borres)
 
 S11 Fig A: ID2/ID3 (PMID: 33938911); (B) BMP4 (PMID: 17325163)
 
 S12 Fig A: MYOC in TM1 cells (numerous references)
 
 We appreciate the reviewer’s diligent review and comments regarding these pathways. We have added a comment to the discussion regarding the agreement of these pathways.
 
 Lines 855-858: In addition, the expression of genes that we document generally agrees with the literature. For example, the following genes and signaling molecules have been reported in TM cells, WNT signaling [78], TGF-β signaling [79-85], integrin binding [86-88], actin cytoskeletal networks [89], calcification genes [90, 91], and Myocilin [91-94].
 
 (8) Line 541: was confocal microscopy used to measure the "3D shapes" of nuclei or was this done with a single image to determine sphericity?
 
 This analysis was performed using confocal microscopy and 3D reconstructed models of the TM nuclei. We have added text to clarify this in the figure legend.
 
 Lines 553-556: “To rigorously assess whether TM1 nuclei are more spherical, we analyzed their reconstructed 3D shapes from whole mounts images by confocal microscopy, comparing them to TM3 nuclei using the ‘Sphericity’ tool in Imaris.”
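 For readers unfamiliar with the measure, sphericity is conventionally defined (following Wadell) as the surface area of a sphere with the same volume as the object divided by the object's actual surface area. The expression below is a sketch under the assumption that the Imaris tool follows this standard definition, which the text does not confirm; V and A denote the volume and surface area of a reconstructed nucleus.

```latex
\Psi = \frac{\pi^{1/3}\,(6V)^{2/3}}{A}, \qquad 0 < \Psi \le 1
```

 A perfect sphere gives \Psi = 1, since V = \tfrac{4}{3}\pi r^{3} yields \pi^{1/3}(6V)^{2/3} = 4\pi r^{2} = A; flatter or more elongated nuclei give smaller values.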
 
 (9) Line 545: please add a closing parenthesis after "scoring 1"
 
 Done
 
 (10) S15 Fig: (A) There does not appear to be "good agreement" (line 653) between the datasets for TM1. (C) please provide a better explanation on how to interpret these "Confusion Matrix" results.
 
 We understand the referee's concern. The patterns likely appear different because of limited sampling in the snRNA-seq data. Based on our results, TM1 seems particularly susceptible, possibly because these cells do not tolerate the isolation process as well. Although we are confident, based on our experience, that TM1 shows good agreement between the two techniques, we have revised the language in the text to “generally” to reflect this nuance.
 
 Lines 633-635 (previously line 653): The generated clusters and their marker genes generally agreed with our scRNA-seq analyses (Fig 5A-B, S15A Fig).
 
 We have also added additional clarification for how to interpret the Confusion Matrix.
 
 Lines 669-672: “Colors indicate the fraction of cells identified in each ATAC cluster (row) which are also identified in each RNA cell type (columns), where darker colors represent stronger correspondence between RNA and ATAC clusters.”
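 The reading of the confusion matrix described above can be illustrated with a minimal sketch. It assumes each cell carries one ATAC cluster label and one RNA cell-type label; the function name, the toy labels, and the use of NumPy are hypothetical choices for illustration and are not taken from the authors' analysis code.

```python
import numpy as np

def row_normalized_confusion(atac_labels, rna_labels):
    """Return the fraction of cells in each ATAC cluster (row)
    that fall into each RNA cell type (column)."""
    atac_names = sorted(set(atac_labels))
    rna_names = sorted(set(rna_labels))
    row = {name: i for i, name in enumerate(atac_names)}
    col = {name: j for j, name in enumerate(rna_names)}

    # Count cells for every (ATAC cluster, RNA cell type) pair.
    counts = np.zeros((len(atac_names), len(rna_names)))
    for a, r in zip(atac_labels, rna_labels):
        counts[row[a], col[r]] += 1

    # Divide each row by its total so that every row sums to 1.
    fractions = counts / counts.sum(axis=1, keepdims=True)
    return atac_names, rna_names, fractions

# Hypothetical toy labels for six cells (not real data).
atac = ["A1", "A1", "A1", "A2", "A2", "A3"]
rna = ["TM1", "TM1", "TM2", "TM2", "TM2", "TM3"]
atac_names, rna_names, fractions = row_normalized_confusion(atac, rna)
```

 In this toy case two of the three cells in cluster A1 are labeled TM1, so that entry is about 0.67 and would be drawn in a darker color than the A1/TM2 entry.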
 
 (11) Line 676: The transition from discussing the sc/snRNAseq data to the work in Lmx1b mutant mice is quite abrupt and could use a better transition to introduce this metabolism work.
 
 We have revised this transition for improved flow but prefer to keep all transitions brief due to the paper's length.
 
 Lines 691-694 (previously line 676): To evaluate the utility of our new TM cell atlas, we used it to examine how Lmx1b mutations affect the TM cell transcriptome and to identify potential mechanisms underlying IOP elevation. We selected LMX1B because it causes IOP elevation and glaucoma in humans and was identified as a highly active transcription factor in our TM cell dataset.
 
 (12) Lines 696-697: It appears counter-intuitive that upregulation of ubiquitin pathways would lead to proteostasis (proteasomal protein degradation requires ubiquitination).
 
 We have clarified that the protein tagging pathway was significantly upregulated. However, polyubiquitin precursor itself was downregulated. In general, the statistical significance of the protein tagging pathway suggests perturbation of the system tagging proteins for degradation. We have clarified this in the text.
 
 Lines 711-714 (previously lines 696-697): “In addition, mutant TM3 cells showed an upregulation of protein tagging genes. However, there is a downregulation of the polyubiquitin precursor gene (Ubb, P = 4.5E-30), indicating a general dysregulation of pathways that tag proteins for degradation.”
 
 (13) Line 715: Please justify why "perturbed metabolism" was chosen to pursue vs the other differentially expressed pathways
 
 We chose to narrow our focus to TM3 cells because of their enrichment for Lmx1b expression. Most pathways identified in our analysis of TM3 cells implicate mitochondrial metabolism. Therefore, we chose to further explore this avenue. We clarified in the text that perturbed metabolism was the strongest gene expression signature.
 
 Lines 753-754 (previously line 715): “Our findings most strongly implicate perturbed metabolism within TM3 cells as responsible for IOP elevation in an Lmx1b glaucoma model.”
 
 (14) Line 759: The authors clearly demonstrate that Lmx1b is most expressed in TM3 cells; however, they did not demonstrate that "Lmx1b was most active"
 
 ATAC analysis showed that Lmx1b was most active in TM cells overall. We inferred its activity in TM3 because Lmx1b is most enriched in that subtype. This has been clarified in the text.
 
 Lines 799-800 (previously line 759): “More specifically, we demonstrate that Lmx1b is the most active TM cell TF and is enriched in TM3 cells,…”
 
 (15) Lines 830-835: Please include references documenting increased TGFβ2 concentrations in POAG aqueous humor and TM, effects of TGFβ2 on TM ECM deposition, and TGFβ2 induced ocular hypertension ex vivo and in vivo.
 
 Done.
 
 (16) Line 875: The authors provide no direct evidence for enhanced "oxidative stress" in Lmx1b TM3 cells
 
 The mitochondrial abnormalities and changed pathways support oxidative stress, but we have not directly tested this. Experiments are currently underway to evaluate its role, but these additional analyses are beyond the scope of this paper. We removed oxidative stress from the sentence.
 
 Lines 920-922 (previously line 875): “Importantly, in heterozygous mutant V265D/+ mice, TM3 cells had pronounced gene expression changes that implicate mitochondrial dysfunction, but that were absent or much lower in other cells including TM1 and TM2.”
 
 (17) Line 880: Similarly, the authors have not directly assessed effects on metabolism in TM3 cells; they only have shown changes in the expression of mitochondrial genes that may affect metabolism
 
 We have no way of specifically isolating TM3 cells to test this. Future work is underway to test this more broadly in isolated TM cells but is beyond the scope of this already large paper. Considering our gene expression data and the addition of supporting EM data, we have qualified the text.
 
 Lines 930-931 (previously 880): “Our data extend these published findings by showing that inheritance of a single dominant mutation in Lmx1b similarly affects mitochondria in TM cells.”
 
 (18) Line 892: What markers were used to detect "cell stress"?
 
 We have revised the text. Although our RNA data show stress gene changes, characterization of these markers is beyond the scope of the current study and will be included in a subsequent paper.
 
 Lines 945-948 (previously line 892): “However, these processes were not limited to TM3 cells or even to cell types that express detectable Lmx1b, suggesting that they are secondary damaging processes that are subsequent to the initiating, Lmx1b-induced perturbations in TM3 cells.”
 
 Additional author-driven change
 
 While revising and reviewing our data, we identified a coding error that resulted in the WT and V265D mutant group labels being switched in Figure 6. Importantly, the significance of the differentially expressed genes (DEGs), the implicated biological pathways, and the interpretation of pathway directionality in the manuscript remain accurate. The only issue was the incorrect labeling in the figure. We have corrected the labels in Figure 6 to accurately reflect the data. As noted above, all data and code will be made available to ensure full reproducibility of our results.
 
 References
 
 (1) Doucet-Beaupre H, Gilbert C, Profes MS, Chabrat A, Pacelli C, Giguere N, et al. Lmx1a and Lmx1b regulate mitochondrial functions and survival of adult midbrain dopaminergic neurons. Proc Natl Acad Sci U S A. 2016;113(30):E4387-96. Epub 2016/07/14. doi: 10.1073/pnas.1520387113. PubMed PMID: 27407143; PubMed Central PMCID: PMCPMC4968767.
 
 (2) Jimenez-Moreno N, Kollareddy M, Stathakos P, Moss JJ, Anton Z, Shoemark DK, et al. ATG8-dependent LMX1B-autophagy crosstalk shapes human midbrain dopaminergic neuronal resilience. J Cell Biol. 2023;222(5). Epub 2023/04/05. doi: 10.1083/jcb.201910133. PubMed PMID: 37014324; PubMed Central PMCID: PMCPMC10075225.
 
 (3) Cross SH, Macalinao DG, McKie L, Rose L, Kearney AL, Rainger J, et al. A dominant-negative mutation of mouse Lmx1b causes glaucoma and is semi-lethal via LDB1-mediated dimerization [corrected]. PLoS Genet. 2014;10(5):e1004359. Epub 2014/05/09. doi: 10.1371/journal.pgen.1004359. PubMed PMID: 24809698; PubMed Central PMCID: PMCPMC4014447.
 
 (4) Li K, Tolman N, Segre AV, Stuart KV, Zeleznik OA, Vallabh NA, et al. Pyruvate and related energetic metabolites modulate resilience against high genetic risk for glaucoma. Elife. 2025;14. Epub 2025/04/24. doi: 10.7554/eLife.105576. PubMed PMID: 40272416; PubMed Central PMCID: PMCPMC12021409.
 
 (5) Tolman NG, Balasubramanian R, Macalinao DG, Kearney AL, MacNicoll KH, Montgomery CL, et al. Genetic background modifies vulnerability to glaucoma-related phenotypes in Lmx1b mutant mice. Dis Model Mech. 2021;14(2). Epub 2021/01/20. doi: 10.1242/dmm.046953. PubMed PMID: 33462143; PubMed Central PMCID: PMCPMC7903917.
 
 AuthorResponse
URL

biorxiv.org/content/10.1101/2024.11.01.621152v2
www.biorxiv.org www.biorxiv.org

Organelle membrane-associated proteins recruit cGAS via phase separation to facilitate its membrane localization

5
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This useful study investigates how intrinsically disordered domains can interact to dictate the sub-cellular localization of a major innate immune sensor termed cGAS. The data from various cellular and biochemical assays are mostly solid, but the main conclusions from these experiments need to be validated further. This paper is relevant to immunologists, especially those interested in cytosolic DNA-sensing pathways.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript by the Yin group presents interesting findings that organelle-tethered intrinsically disordered "MEMCA" scaffolds, as exemplified by ZDHHC18 at the Golgi and MARCH8 at endosomes, enhance the engagement of cGAS with organelle-proximal condensates, thereby sequestering cGAS from cytosolic DNA sensing and negatively regulating innate immunity.
  
  Strengths:
  
  These findings suggest a previously unrecognized mechanism by which Golgi/endosomal IDR scaffolds modulate cGAS activity, with implications for antiviral defense and tumor immunology. The study is conceptually intriguing and potentially impactful.
  
  Weaknesses:
  
  While the manuscript addresses a novel aspect of cGAS regulation, additional mechanistic insights and targeted validations are needed to ensure robustness:
  
  (1) How do ZDHHC18/MARCH8 enhance cGAS engagement? Do they act as bridges to form a ternary, membrane-tethered cGAS-DNA-MEMCA complex, or alter cGAS condensate properties allosterically?
  
  (2) Is organelle cGAS capture selective? For instance, can other palmitoyltransferases/E3 ligases be substituted for ZDHHC18/MARCH8?
  
  (3) Why does membrane association suppress cGAS enzymatic activity, given that dsDNA still resides in cGAS condensates?
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors found that cGAS, a DNA sensor, relocalizes to organelle membranes (ER, Golgi, endosomes) upon DNA stimulation, revealing spatial regulation of its activity. ZDHHC18 and MARCH8 recruit cGAS to Golgi/endosomes via intrinsically disordered regions (IDRs), driving phase-separated condensates. This sequestration of cGAS-dsDNA complexes suppresses innate immune signaling, uncovering a novel regulatory mechanism.
  
  Strengths:
  
  The work overall is very interesting. The authors provided molecular and biochemical evidence.
  
  Weaknesses:
  
  Overall, the work is very interesting. However, the quality of some of the data does need to be improved, and more experiments need to be performed.
  
  The following points need to be addressed:
  
  (1) In Figure S7, no direct binding between cGAS and MARCH8 or ZD18 IDR is observed, and the interaction only occurs after DNA stimulation. However, Figure 5 shows cGAS recruitment to ZD18 or MARCH8 IDR droplets, suggesting direct interactions. This apparent discrepancy should be clarified.
  
  (2) The authors propose that recruiting cGAS to organelle membranes reduces its activity, as demonstrated by the FKBP experiment. However, ZD18 and MARCH8 also post-translationally modify cGAS. Do both mechanisms contribute to this effect, and can the authors test this?
  
  (3) To demonstrate the functional importance of MEMCA, the authors should test IFN production or STING activation in cells.
  
  (4) Does the IDR of MARCH8 or ZD18 influence the interaction between cGAS and DNA?
  
  (5) Which region of cGAS does the IDR of MARCH8 or ZD18 interact with: the cGAS-CD or the cGAS-N-terminus?
  
  (6) The in vitro LLPS experiments with cGAS, DNA, and ZD18/MARCH8 should be conducted under physiological conditions.
  
  Review 2
4. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this study by Shi et al., the authors evaluate if cGAS is recruited to the membranes of intracellular organelles. Using a combination of biochemical fractionation and imaging techniques, the authors propose that upon recognition of DNA, cGAS translocates to various subcellular locations, including the Golgi, endoplasmic reticulum, and endosomes. Mechanistically, the authors propose that upon localizing to the Golgi or endosome, cGAS binding to MARCH8 and ZDHHC18 prevents cGAS activity by incorporating cGAS and dsDNA into biomolecular condensates. However, in its current form, the study does not directly address this question.
  
  Strengths:
  
  The question of evaluating cGAS sub-cellular localization as a mechanism for controlling activity is interesting, and there is some evidence that cGAS is localized to sub-cellular organelle membranes.
  
  Weaknesses:
  
  (1) The well-established nuclear localization of cGAS is not adequately addressed in the cell lines used and is inconsistent with the findings.
  
  (2) Previous studies have shown that ZDHHC18 and MARCH8 control cGAS activity, which detracts somewhat from the novelty.
  
  (3) There is a lot of inconsistency in the cell lines and artificial expression systems used across the study.
  
  (4) A key element missing is showing that in the absence of ZDHHC18 or MARCH8, the loss of endogenous cGAS localization to the various sub-cellular organelles increases cGAMP synthesis and downstream STING activation in primary cells. There is an over-reliance on artificial expression systems. An important experiment to validate the hypothesis would be to evaluate endogenous cGAS localization in MARCH8- and ZDHHC18-deficient primary cells. Further, there should be evaluation of endogenous STING responses in MARCH8- and ZDHHC18-deficient primary cells in tandem with the localization studies.
  
  (5) There are a large number of grammatical errors throughout the manuscript which should be addressed.
  
  Review 3
5. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Author response:
  
  Below we outline our provisional responses to the major points raised in the public reviews, and our planned revisions:
  
  (1) Mechanistic model of how ZDHHC18/MARCH8 engage the cGAS–DNA condensate (Reviewers #1 & #2)
  
  We will add a dedicated subsection and a working-model figure describing our current view: IDRs of ZDHHC18 (Golgi) and MARCH8 (endosomes) engage pre-formed cGAS–DNA condensates at organelle membranes, and thereby tune cGAS activity through PTMs. We will explicitly discuss bridge-like versus allosteric modes by performing additional LLPS experiments (e.g., FRAP assays) to detect any IDR-driven changes in condensate properties, and explain how these scenarios fit our data.
  
  (2) Selectivity beyond ZDHHC18/MARCH8 (Reviewer #1)
  
  We will expand the text to explain existing evidence indicating that, in addition to ZDHHC18 or MARCH8, other post-translational modification (PTM) enzymes and/or membrane-associated scaffolds may also modulate cGAS. We will summarize our current datasets that support this possibility and outline how this selectivity relates to organelle identity.
  
  (3) Why membrane association suppresses cGAS activity (Reviewer #1)
  
  We will provide a concise mechanistic rationale—integrating our published work—to explain how membrane-proximal sequestration can limit cGAS catalysis despite cGAS–DNA coexistence within condensates. Specifically, we will discuss (i) IDR-dependent changes in condensate properties, and (ii) PTMs by ZDHHC18/MARCH8 that allosterically reduce catalytic efficiency; we will clearly cross-reference our prior publications that bear on these points.
  
  (4) Reconciling Fig. S7 (DNA-dependent binding) with Fig. 5 (recruitment to IDR droplets) (Reviewer #2)
  
  We will add text to clarify experimental context and readouts to prove that there is no real contradiction between Fig. S7 and Fig. 5. In the experiment shown in Fig. 5, PEG (a macromolecular crowding agent) was added to the system, which facilitates the formation of IDR phase-separated droplets. Under these conditions, cGAS partitions into the IDR condensates, leading to the observed recruitment. In contrast, Fig. S7 examines the direct physical interaction between cGAS and the IDRs using biochemical pull-down assays and shows that no direct interaction occurs in the absence of DNA. These two results reflect different experimental contexts and are therefore not mutually exclusive.
  
  (5) Planned additional tests to address specificity and mechanism (Reviewer #2)
  
  DNA pull-down: to test whether IDRs alter cGAS–DNA affinity, we will compare cGAS binding to DNA with/without MEMCA IDRs (and with charged-residue mutants).
  
  Domain mapping: to determine which region of cGAS engages MEMCA IDRs, we will map binding using cGAS N-terminus/core-domain truncations and key surface mutants.
  
  Physiological in vitro LLPS: we will repeat cGAS–DNA–IDR LLPS assays under physiological buffer conditions and report partition coefficients, FRAP, and phase diagrams to ensure physiological relevance.
  
  (6) Image clarity and data presentation (Reviewer #2):
  
  We will improve image resolution, add zoomed-in insets with organelle markers, and provide a stronger Cy5-ISD signal.
  
  (7) Nuclear localization of cGAS and system considerations (Reviewer #3)
  
  We will explicitly document the nuclear signal of cGAS observed in our confocal experiments and detail the cell lines and expression systems used. We will also clarify cGAS nuclear localization in the cell lines used.
  
  (8) Endogenous validation and cell line consistency (Reviewer #3):
  
  We will perform experiments in primary cells (knockout macrophages) to address the concern of relying on overexpression.
  
  (9) Language and grammar (Reviewer #3):
  
  We will thoroughly revise the manuscript for grammar and clarity.
  
  Together, these planned revisions will strengthen the mechanistic basis of our findings and provide direct evidence for the physiological role of organelle-tethered IDRs in regulating cGAS activity.
  
  AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.01.668185v1
www.biorxiv.org

Dietary sulfur amino acid restriction elicits a cold-like transcriptional response in inguinal but not epididymal white adipose tissue of male mice

4
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  Ruppert et al. investigated how activation of thermogenesis by cold exposure (CE) and methionine restriction (MetR) impacts health and leads to weight loss in mice. The authors provided valuable datasets showing that the responses to MetR and CE are tissue-specific, while MetR and CE affect beige adipose similarly. Although the study is descriptive, the data analyses are solid, with well-supported conclusions drawn from the findings.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Activation of thermogenesis by cold exposure and dietary protein restriction are two lifestyle changes that impact health in humans and lead to weight loss in model organisms - here, in mice. How these affect liver and adipose tissues has not been thoroughly investigated side by side. In mice, the authors show that the responses to methionine restriction and cold exposure are tissue-specific, while the effects on beige adipose are somewhat similar.
  
  Strengths:
  
  The strength of the work is the comparative approach, using transcriptomics and bioinformatic analyses to investigate the tissue-specific impact. The work was performed in mouse models and is state-of-the-art. This represents an important resource for researchers in the field of protein restriction and thermogenesis.
  
  Weaknesses:
  
  The findings are descriptive, and the conclusions remain associative. The work is limited to mouse physiology, and the human implications have not been investigated yet.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study provides a library of RNA sequencing analysis from brown fat, liver, and white fat of mice treated with two stressors - cold challenge and methionine restriction - alone and in combination (interaction between diet and temperature). They characterize the physiologic response of the mice to the stressors, including effects on weight, food intake, and metabolism. This paper provides evidence that while both stressors increase energy expenditure, there are complex tissue-specific responses in gene expression, with additive, synergistic, and antagonistic responses seen in different tissues.
  
  Strengths:
  
  The study design and implementation are solid and well-controlled. Their writing is clear and concise. The authors do an admirable job of distilling the complex transcriptome data into digestible information for presentation in the paper. Most importantly, they do not overreach in their interpretation of their genomic data, keeping their conclusions appropriately tied to the data presented. The discussion is well thought out and addresses some interesting points raised by their results.
  
  Weaknesses:
  
  The major weakness of the paper is the almost complete reliance on RNA sequencing data, but it is presented as a transcriptomic resource.
  
  Review 2
4. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Ruppert et al. present a well-designed 2×2 factorial study directly comparing methionine restriction (MetR) and cold exposure (CE) across liver, iBAT, iWAT, and eWAT, integrating physiology with tissue-resolved RNA-seq. This approach allows a rigorous assessment of where dietary and environmental stimuli act additively, synergistically, or antagonistically. Physiologically, MetR progressively increases energy expenditure (EE) at 22 °C and lowers RER, indicating a lipid utilization bias. By contrast, a 24-hour 4 °C challenge elevates EE across all groups and eliminates MetR-Ctrl differences. Notably, changes in food intake and activity do not explain the MetR effect at room temperature.
  
  Strengths:
  
  The data convincingly support the central claim: MetR enhances EE and shifts fuel preference to lipids at thermoneutrality, while CE drives robust EE increases regardless of diet and attenuates MetR-driven differences. Transcriptomic analysis reveals tissue-specific responses, with additive signatures in iWAT and CE-dominant effects in iBAT. The inclusion of explicit diet×temperature interaction modeling and GSEA provides a valuable transcriptomic resource for the field.
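  
  As an illustration of what a diet×temperature interaction term captures in a 2×2 design, here is a minimal sketch on synthetic values. It is not the authors' pipeline (RNA-seq analyses normally use count-based models); the function name `fit_two_by_two`, the 0/1 coding, and the numbers are assumptions made for this example. Under this coding, an interaction coefficient near zero corresponds to an additive response, one with the same sign as the main effects to a synergistic response, and one with the opposite sign to an antagonistic response.

```python
import numpy as np

def fit_two_by_two(diet, temp, y):
    """Least-squares fit of y = b0 + b_diet*D + b_temp*T + b_int*D*T (D, T coded 0/1)."""
    diet = np.asarray(diet, dtype=float)
    temp = np.asarray(temp, dtype=float)
    # Design matrix: intercept, diet main effect, temperature main effect, interaction.
    X = np.column_stack([np.ones_like(diet), diet, temp, diet * temp])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["intercept", "diet", "temp", "interaction"], coef))

# Synthetic log-expression values for one hypothetical gene, two replicates per cell.
diet = np.array([0, 0, 1, 1, 0, 0, 1, 1])   # 0 = control diet, 1 = MetR (assumed coding)
temp = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # 0 = 22 °C, 1 = 4 °C (assumed coding)
y = 1.0 + 2.0 * diet + 3.0 * temp + 1.5 * diet * temp

fit = fit_two_by_two(diet, temp, y)
print(fit)
```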
  
  Weaknesses:
  
  Limitations include the short intervention windows (7 d MetR, 24 h CE), use of male-only cohorts, and reliance on transcriptomics without complementary proteomic, metabolomic, or functional validation. Greater mechanistic depth, especially at the level of WAT thermogenic function, would strengthen the conclusions.
  
  Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.06.669020v2
www.biorxiv.org

Explainable machine learning-assisted exploration of chromatin dynamics reveals chromosome-specific response to serum starvation

3
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This interesting study adapts machine learning tools to analyze movements of a chromatin locus in living cells in response to serum starvation. The machine learning approach developed is useful, the experiments are well controlled, and the data are solid. The study would be greatly strengthened by testing key predictions made using perturbation experiments. This work will be of interest to those studying chromosome biology and gene expression patterns.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Redchuk et al. explore the dynamic properties of chromatin upon serum starvation using machine learning approaches. They use CRISPR-tagging to visualize a region on chromosome 1 in human cells and show that in their system, chromosome 1, but not the previously reported chromosomes 10, 13, and X, undergoes a change in radial position upon serum starvation. Live cell imaging showed a position change towards the periphery after serum starvation. They then apply a machine learning algorithm for the analysis of the imaging data, which reveals changes in nuclear area during serum starvation and longer displacements of the chromosome 1 locus near the nuclear periphery. Differential behavior of homologues is also reported.
  
  Strengths:
  
  (1) The study of chromatin dynamics is an interesting and important area of research.
  
  (2) The use of machine learning approaches to analyze live cell imaging data is timely.
  
  (3) With serum starvation, the authors use a simple, well-controllable model system.
  
  Weaknesses:
  
  (1) This study only provides limited new insight into chromatin dynamics.
  
  (2) It was not immediately evident what the use of machine learning approaches added to this study. It appears that the main conclusions could have been reached by conventional analysis.
  
  (3) There are several specific technical points:
  
  a) It was not clear what the CRISPR-Sirius probes actually labelled. The chromosome 1 sgRNA sequence is provided, but I could not find information as to which region(s) of the chromosome are actually labelled (size, location, etc.).
  
  b) The authors visualize a relatively small region of chromosome 1 but make conclusions regarding the entire chromosome. Additional probes on the same chromosome should be used.
  
  Related to this point, the discussion of why the authors are unable to reproduce the prior findings of relocation of chromosomes 10, 13, and X is not satisfying. It would be worth comparing the FISH-based painting of entire chromosomes, which generated the results suggesting relocation of these chromosomes, with the point-labelling method used here.
  
  c) The study lacks controls. Since in their hands chromosomes 10, 13, and X do not change position, they should be used as a negative control in all experiments demonstrating a shift in the location of chromosome 1.
  
  d) I did not find information about the spatial or temporal resolution of the imaging modality. This is important to assess whether the observed changes in position, relative to time, are meaningful.
  
  e) The authors analyze surprisingly early timepoints (up to 40 minutes) of serum starvation. Would these results look different if longer serum starvation timepoints of several hours were analyzed?
  
  f) The authors can do a better job of explaining the biological meaning of the various parameters they measure (DistR, TDist, etc.).
  
  g) I did not understand the reasoning for the authors' conclusion of differential behavior of homologues. Please explain this better, or ideally use more direct labeling methods that identify the individual homologues.
  
  h) In many figures, statistical analysis of the data is missing, including, but not limited to, Figures 1B, C, G, Figures 4, 5, 6.
  
  i) No information is provided throughout the manuscript as to how many cells were analyzed in each experiment. This should be indicated in every figure legend.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The study demonstrates that CRISPR-Sirius provides a powerful approach to investigating chromosome dynamics in living cells during environmental stress. By focusing on serum starvation, the authors show that this process induces global nuclear changes, including a reduction in nuclear area and increased morphological dynamism, while at the same time driving specific reorganization of chromosome 1. Chromosome 1 relocates toward the nuclear periphery and displays distinctive patterns of motion, maintaining overall motility but punctuated by occasional long-distance displacements, particularly near the nuclear envelope. Importantly, the analysis reveals that homologous copies of chromosome 1 do not behave uniformly: peripheral loci become more mobile and responsive to starvation, whereas central homologs remain comparatively stable, often associated with nucleolar subcompartments. By integrating live imaging with machine learning and explainable AI analysis, the study highlights the complexity of nuclear organization and provides valuable insights into how chromosome-specific and locus-specific responses to stress are orchestrated within the three-dimensional nuclear landscape.
  
  Strengths:
  
  The study uses live-cell imaging to investigate the dynamics of loci during starvation. Live-cell tracking and data interpretation are carried out using machine learning and AI models, which is a major strength.
  
  Weaknesses:
  
  The manuscript is at times difficult to follow, partly because the methodological descriptions are highly specialized, especially for non-expert biologists. In addition, the observations are not tested for a mechanistic basis. Experiments that could provide deeper insights are missing, for example, why chromosome 1 moves, why the peripheral homologue dislocates, or why a "long jump" is observed at the periphery even though the speed of the loci does not change. It is also unclear whether a displacement of 0.5 μm is functionally meaningful.
  
  Review 2

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.08.669316v1
www.biorxiv.org

3D directional tuning in the orofacial sensorimotor cortex during natural feeding and drinking

4
1. Public_Reviews 10 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This study characterises motor and somatosensory cortex neural activity during naturalistic eating and drinking tongue movement in nonhuman primates. The data, which include electrophysiology, three-dimensional tracking of tongue movements, and nerve block manipulations, are valuable to neuroscientists and neural engineers interested in tongue use. Although the current analyses provide a solid description of single neuron activity in these areas, both the population level analyses and the characterisation of activity changes following nerve block could be improved.
 
 Summary
2. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar video-radiography of markers implanted in the tongue. Their findings indicate that many units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.
 
 Strengths:
 
 The most significant strength of this paper lies in its unique combination of experimental tools. The authors utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.
 
 Weaknesses:
 
 A substantial portion of the paper is dedicated to establishing directional tuning in individual neurons, followed by an analysis of how this tuning changes when sensory feedback is blocked. While such characterizations are valuable, particularly in less-studied motor cortical areas and behaviors, the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative. At the population level, both decoding analyses and state space trajectories from factor analysis indicate that movement direction (or spout location) is robustly represented. However, as with the single-cell findings, the nuanced differences in neural trajectories across reach directions and between baseline and sensory-block conditions remain largely descriptive. To move beyond this, model-based or hypothesis-driven approaches are needed to uncover mechanistic links between neural state space dynamics and behavior.
 
 Review 1
3. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.
 
 The study's main findings are that:
 
 • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).
 • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.
 • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during feeding. This potentially suggests behavioral context-dependent encoding.
 • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.
 
 Strengths:
 
 The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.
 
 Weaknesses:
 
 There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.
 
 One weakness of the current study is that there is substantial variability in some of the results between monkeys, including the tuning characteristics of primary somatosensory cortex neurons during drinking, and the effect of nerve block on tongue movements and the associated changes in single neuron tuning.
 
 This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75. A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.
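 
 For readers unfamiliar with the PD / cosine tuning model mentioned above, the following is a minimal sketch of how such a fit is commonly set up in 3D: the firing rate is modelled as a baseline plus the dot product of a tuning vector with the unit movement direction, so the preferred direction is the normalised tuning vector. The function name `fit_cosine_tuning` and the noiseless synthetic data are assumptions made for illustration; this is not the manuscript's analysis code.

```python
import numpy as np

def fit_cosine_tuning(directions, rates):
    """Fit rate = b0 + b . d by least squares; return (b0, preferred_direction, depth)."""
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)    # unit movement directions
    X = np.column_stack([np.ones(len(d)), d])           # design matrix [1, dx, dy, dz]
    coef, *_ = np.linalg.lstsq(X, np.asarray(rates, dtype=float), rcond=None)
    b0, b = coef[0], coef[1:]
    depth = np.linalg.norm(b)                            # modulation depth
    pd = b / depth if depth > 0 else np.zeros(3)         # preferred direction (unit vector)
    return b0, pd, depth

# Synthetic unit: baseline 10 spikes/s, depth 5 spikes/s, PD along +x (hypothetical values).
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
unit = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
rates = 10.0 + 5.0 * unit @ np.array([1.0, 0.0, 0.0])

b0, pd, depth = fit_cosine_tuning(dirs, rates)
print(b0, pd, depth)
```

 A model of this form is static by construction: it has a single preferred direction per unit and no time dependence, which is the limitation the reviewer is pointing to.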
 
 The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results.
 
 I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw (e.g., the reference to Laurence-Chasen et al. 2023 just shows that there is tongue information independent of jaw kinematics, not that jaw movements don't affect these neurons' activities). I also find the nerve block results inconsistent (more tuning in one monkey, less in the other?) and difficult to really learn something fundamental from, besides that neural activity and behavior both change - in various ways - after nerve block (not at all surprising but still good to see measurements of).
 
 The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". An alternative explanation could be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity). This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feeding is very counter-intuitive to this reviewer. In the revised manuscript the authors note these potential confounds and other limitations in the Discussion.
 
 Review 2
4. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary
 
 In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. They perform both single-unit and population level analyses during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties and population trajectories. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties and susceptibility to perturbed sensory input are different.
 
 Strengths
 
 The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data. In the revision, the single-unit analyses of tuning direction are robustly characterized. The differences in neural correlations across behaviors, regions and perturbations are robust. In addition to the substantial amount of largely descriptive analyses, this paper makes two convincing arguments: (1) the single-neuron correlates for feeding and licking in OSMCx are different and can't be simply explained by different kinematics, and (2) blocking sensory input alters the neural processing during orofacial behaviors. The evidence for these claims is solid.
 
 Weaknesses
 
 The main weakness of this paper is in providing an account of these differences that gives some insight into neural mechanisms. For example, while the authors show changes in neural tuning and different 'neural trajectory' shapes during feeding and drinking, their analyses of these differences are descriptive and provide limited insight into the underlying neural computations.
 
 Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.02.601741v3
www.biorxiv.org

Active regulation of the epidermal growth factor receptor by the membrane bilayer

3
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The important findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is convincing, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.
  
  Strengths:
  
  The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.
  
  Weaknesses:
  
  Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.
  
  Strengths:
  
  The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.
  
  Weaknesses:
  
  Due to the relative novelty of the approach, a number of concerns remain.
  
  (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?
  
  (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? What are the kinetics of protein production? As EGFR is in large excess, significant precipitation should be expected, at least in the early period of the reaction, because nanodiscs are limiting. What is the oligomeric state of the inserted EGFR? Have multiple insertions into one nanodisc been observed?
  
  (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.
  
  (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?
  
  Review 2

URL

biorxiv.org/content/10.1101/2025.08.14.670284v1
www.biorxiv.org www.biorxiv.org

Death receptor 6 does not regulate axon degeneration and Schwann cell injury responses during Wallerian degeneration

4
1. Public_Reviews 10 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 In this valuable study, through carefully executed and rigorously controlled experiments, the authors challenged a previously reported role of the Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). Using two DR6 knockout mouse lines and multiple WD assays, both in vitro and in vivo, the authors provided convincing evidence that loss of DR6 in mice does not protect peripheral axons from WD after injury. Questions remain about whether this conclusion is generalizable to CNS axonal degeneration in disease models such as ALS, AD, and prion diseases. In addition, the authors need to provide information about the sex, age, and genetic background of their animal studies to allow readers to better assess the basis for inconsistencies from previous reports on the protective effects of DR6.
 
 Summary
2. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors show that genetic deletion of the orphan tumor necrosis factor receptor DR6 in mice does not protect peripheral axons against degeneration after axotomy. Similarly, Schwann cells in DR6 mutant mice react to axotomy similarly to wild-type controls. These negative results are important because previous work has indicated that loss or inhibition of DR6 is protective in disease models and also against Wallerian degeneration of axons following injury. This carefully executed counterexample is important for the field to consider.
 
 Strengths:
 
 A strength of the paper is the use of two independent mouse strains that knock out DR6 in slightly different ways. The authors confirm that DR6 mRNA is absent in these models (western blots for DR6 protein are less convincingly null, but given the absence of mRNA, this is likely an issue of antibody specificity). One of the DR6 knockout strains used is the same strain used in a previous paper examining the effects of DR6 on Wallerian degeneration.
 
 The authors use a series of established assays to evaluate axon degeneration, including light and electron microscopy on nerve histological samples and cultured dorsal root ganglion neurons in which axons are mechanically severed and degeneration is scored in time-lapse microscopy. These assays consistently show a lack of effect of loss of DR6 on Wallerian degeneration in both mouse strains examined.
 
 Therefore, in the specific context of these experiments, the authors' data support their conclusion that loss of DR6 does not protect against Wallerian degeneration.
 
 Weaknesses:
 
 The major weaknesses of this paper include the tone of correcting previously erroneous results and the lack of reporting on important details around animal experiments that would help determine whether the results here really are discordant with previous studies, and if so, why.
 
 The authors do not report the genetic strain background of the mice used, the sex distributions of their experimental cohorts, or the age of the mice at the time the experiments were performed. All of these are important variables.
 
 The DR6 knockout strain reported in Gamage et al. (2017) was on a C57BL/6.129S segregating background. Gamage et al. reported that loss of DR6 protected axons from Wallerian degeneration for up to 4 weeks, but importantly, only in 38.5% (5 out of 13) of the mice they examined. In the present paper, the authors speculate on possible causes for differences between the lack of effect seen here and the effects reported in Gamage et al., including possible spontaneous background mutations, epigenetic changes, genetic modifiers, neuroinflammation, and environmental differences. A likely explanation of the incomplete penetrance reported by Gamage et al. is the segregating genetic background and the presence of modifier loci between C57BL/6 and 129S. The authors do not report the genetic background of the mice used in this study, other than to note that the knockout strain was provided by the group in Gamage et al. However, if, for example, that mutation has been made congenic on C57BL/6 in the intervening years, this would be important to know. One could also argue that the results presented here are consistent with the 8 out of 13 mice in Gamage et al. that showed no protection.
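
 The incomplete-penetrance argument above can be put into rough numbers. The sketch below is illustrative only: it treats the 5-of-13 protection rate quoted from Gamage et al. as a per-mouse probability, assumes that mice are independent draws, and uses a hypothetical cohort size (`n_mice`), since the cohort sizes of the present study are not restated in this review.

```python
from fractions import Fraction

# Figures quoted in the review: 5 of 13 DR6 knockout mice showed protection.
protected, examined = 5, 13
penetrance = Fraction(protected, examined)


def prob_no_protected(n_mice, p=penetrance):
    """Chance that none of n_mice shows protection.

    Assumes each mouse is protected independently with probability p.
    This is a simplification: with segregating modifier loci, littermates
    may not be independent.
    """
    return float((1 - p) ** n_mice)


if __name__ == "__main__":
    print(f"penetrance = {float(penetrance):.1%}")
    # n_mice values are hypothetical cohort sizes, not taken from the manuscript.
    for n_mice in (1, 3, 6, 10):
        print(n_mice, round(prob_no_protected(n_mice), 4))
```

 Under these assumptions, a small cohort could plausibly contain no protected animals by chance, whereas a larger cohort with no protection would be harder to reconcile with the earlier penetrance. This is why the cohort sizes and genetic background matter for the comparison.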
 
 Age is also an important variable. The protective effects of the spontaneous WldS mutation decrease with age, for example. It is unclear whether the possible protective effects of DR6 also change with age; perhaps this could explain the variable response seen in Gamage et al. and the lack of response seen here.
 
 It is unclear if sex is a factor, but this is part of why it should be reported.
 
 The authors also state that they do not see differences in the Schwann cell response to injury in the absence of DR6 that were reported in Gamage et al., but this is not an accurate comparison. In Gamage et al., they examined Schwann cells around axons that were protected from degeneration 2 and 4 weeks post-injury. Those axons had much thinner myelin, in contrast to axons protected by WldS or loss of Sarm1, where the myelin thickness remained relatively normal. Thus, Gamage et al. concluded that the protection of axons from degeneration and the preservation of Schwann cell myelin thickness are separate processes. Here, since no axon protection was seen, the same analysis cannot be done, and we can only say that when axons degenerate, the Schwann cells respond the same whether DR6 is expressed or not.
 
 The authors also take issue with Colombo et al. (2018), where it was reported that there is an increase in axon diameter and a change in the g-ratio (axon diameter to fiber diameter - the axon + myelin) in peripheral nerves in DR6 knockout mice. This change resulted in a small population of abnormally large axons that had thinner myelin than one would expect for their size. The change in g-ratio was specific to these axons and driven by the increased axon diameter, not decreased myelin thickness, although those two factors are normally loosely correlated. Here, the authors report no changes in axon size or g-ratio, but this could also be due to how the distribution of axon sizes was binned for analysis, and looking at individual data points in supplemental figure 3A, there are axons in the DR6 knockout mice that are larger than any axons in wild type. Thus, this discrepancy may be down to specifics and how statistics were performed or how histograms were binned, but it is unclear if the results presented here are dramatically at odds with the results in Colombo et al. (2018).
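
 A minimal sketch of the binning concern raised above, using made-up diameters (not data from either study). The function names and bin edges are assumptions chosen for illustration; the point is only that a coarse top bin can hide a small population of unusually large axons that finer bins, or the individual data points, would reveal.

```python
import numpy as np


def g_ratio(axon_diameter, fiber_diameter):
    """g-ratio = axon diameter / fiber diameter (axon plus myelin)."""
    axon = np.asarray(axon_diameter, dtype=float)
    fiber = np.asarray(fiber_diameter, dtype=float)
    if np.any(fiber <= 0) or np.any(axon > fiber):
        raise ValueError("fiber diameter must be positive and >= axon diameter")
    return axon / fiber


def binned_fractions(diameters, edges):
    """Fraction of axons per size bin; out-of-range values fall into the end bins."""
    clipped = np.clip(diameters, edges[0], edges[-1])
    counts, _ = np.histogram(clipped, bins=edges)
    return counts / counts.sum()


# Hypothetical axon diameters in micrometres; one outsized axon in the knockout.
wild_type = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
knockout = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 12.0])

coarse_edges = np.array([0.0, 5.0, 10.0])        # top bin absorbs everything >= 5
fine_edges = np.array([0.0, 5.0, 10.0, 15.0])    # separate bin for very large axons

if __name__ == "__main__":
    print("coarse:", binned_fractions(wild_type, coarse_edges),
          binned_fractions(knockout, coarse_edges))
    print("fine:  ", binned_fractions(wild_type, fine_edges),
          binned_fractions(knockout, fine_edges))
    print("g-ratio example:", g_ratio(6.0, 10.0))
```

 With the coarse edges the two genotypes produce identical histograms even though the largest knockout axon exceeds every wild-type axon. Reporting the bin edges, or comparing the full distributions, would make it easier to judge whether the present data really differ from Colombo et al. (2018).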
 
 Finally, it is important to note that previously reported effects of DR6 inhibition, such as protection of cultured cortical neurons from beta-amyloid toxicity, are not necessarily the same as Wallerian degeneration of axons distal to an injury studied here. The negative results presented here, showing that loss of DR6 is not protective against Wallerian degeneration induced by injury, are important given the interest in DR6 as a therapeutic target, but they are specific to these mice and this mechanism of induced axon degeneration. The extent to which these findings contradict previous work is difficult to assess due to the lack of detail in describing the mouse experiments, and care should be taken in attempting to extrapolate these results to other disease contexts, such as ALS or Alzheimer's disease.
 
 Review 1
3. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This manuscript by Beirowski, Huang, and Babetto revisits the proposed role of Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). A prior study (Gamage et al., 2017) suggested that DR6 deletion delays axon degeneration and alters Schwann cell responses following peripheral nerve injury. Here, the authors comprehensively test this claim using two DR6 knockout mouse models (the line used in the earlier report plus a CMV-Cre derived floxed knockout line) and multiple WD assays in vivo and in vitro, aligned with three positive controls: Sarm1, WldS, and Phr1/Mycbp2 mutants. Contrary to the prior findings, they find no evidence that DR6 deletion affects axon degeneration kinetics or Schwann cell dynamics (assessed by cJun expression or [intact+degenerating] myelin abundance after injury) during WD. Importantly, in DRG explant assays, neurites from DR6-deficient mice degenerated at rates indistinguishable from controls. The authors conclude that DR6 is dispensable for WD, and that previously reported protective effects may have been due to confounding factors such as genetic background or spontaneous mutations.
 
 Strengths:
 
 The authors employ two independently generated DR6 knockout models, one overlapping with the previously published study, and confirm loss of DR6 expression by qPCR and Western blotting. Multiple complementary readouts of WD are applied (structural, ultrastructural, molecular, and functional), providing a robust test of the hypothesis.
 
 Comparisons are drawn with established positive controls (WldS, SARM1, Phr1/Mycbp2 mutants), reinforcing the validity of the assays.
 
 By directly addressing an influential but inconsistent prior report, the manuscript clarifies the role of DR6 and prevents potential misdirection of therapeutic strategies aimed at modulating WD in the PNS. The discussion thoughtfully considers possible explanations for the earlier results, including colony-specific second-site mutations that could explain the incomplete penetrance of the earlier reported phenotype of only 36%.
 
 Weaknesses:
 
 (1) The study focuses on peripheral nerves. The manuscript frequently refers to CNS studies to argue for consistency with their findings. It would be more accurate to describe the CNS results as reminiscent of, rather than consistent with, the PNS findings (e.g., line 205ff in the Discussion).
 
 (2) The DRG explant assays are convincing, though the slight acceleration of degeneration in the DR6 floxed/Cre condition is intriguing (Figure 4E). Could the authors clarify whether this is statistically robust or biologically meaningful?
 
 (3) In the summary (line 43), the authors refer to Hu et al. (2013) (reference 5) as the study that previously reported AxD delay and SC response alteration after injury. However, this study did not investigate the PNS, and I believe the authors intended to reference Gamage et al. (2017) (reference 10) at this point.
 
 (4) In line 74ff of the results section, the authors claim that developmental myelination is not altered in DR6 mutants at postnatal day 1. However, the variability in Figure S2 appears substantial, and the group size seems underpowered to support this claim. Colombo et al. (2018) (reference 11) reported accelerated myelination at P1, but this study likewise appears underpowered. Possible reasons for these discrepancies and the large variability could be that only a defined cross-sectional area was quantified, rather than the entire nerve cross-section.
 
 (5) The authors stress the data of Gamage et al. (2017) on altered SC responses in DR6 mutants after injury. They employed cJun quantification to show that SC reprogramming after injury is not altered in DR6 mutants. This approach is valid and the conclusion trustworthy. Here, the addition of data showing the combined abundance of intact and degenerated myelin does not add much insight. However, Gamage et al. (2017) reported altered myelin thickness in a subset of axons at 14 days after injury, which is considerably later than the time points analyzed in the present study. While, in the Reviewer's view, the thin myelin observed by Gamage et al. in fact resembles remyelination, the authors may wish to highlight the difference in the time points analyzed.
 
 Review 2
4. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 The authors revisit the role of DR6 in axon degeneration following physical injury (Wallerian degeneration), examining both its effects on axons and its role in regulating the Schwann cell response to injury. Surprisingly, and in contrast to previous studies, they find that DR6 deletion does not delay the rate of axon degeneration after injury, suggesting that DR6 is not a mediator of this process.
 
 Overall, this is a valuable study. As the authors note, the current literature on DR6 is inconsistent, and these results provide useful new data and clarification. This work will help other researchers interpret their own data and re-evaluate studies related to DR6 and axon degeneration.
 
 Strengths:
 
 (1) The use of two independent DR6 knockout mouse models strengthens the conclusions, particularly when reporting the absence of a phenotype.
 
 (2) The focus on early time points after injury addresses a key limitation of previous studies. This approach reduces the risk of missing subtle protective phenotypes and avoids confounding results with regenerating axons at later time points after axotomy.
 
 Weaknesses:
 
 (1) The study would benefit from including an additional experimental paradigm in which DR6 deficiency is expected to have a protective effect, to increase confidence in the experimental models, and to better contextualize the findings within different pathways of axon degeneration. For example, DR6 deletion has been shown in more than one study to be partially axon protective in the NGF deprivation model in DRGs in vitro. Incorporating such an experiment could be straightforward and would strengthen the paper, especially if some of the neuroprotective effects previously reported are confirmed.
 
 (2) The quality of some figures could be improved, particularly the EM images in Figure 2. As presented, they make it difficult to discern subtle differences.
 
 Review 3

URL

biorxiv.org/content/10.1101/2025.07.21.665928v1
www.biorxiv.org www.biorxiv.org

Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations

3
1. Public_Reviews 10 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 In their study, Brown et al. provide an important advance in understanding the architecture of the mycobacterial outer membrane. Using all-atom simulations of model mycomembranes, the work reports compelling structural insights into how α-mycolic acids and outer leaflet lipids (PDIM and PAT) shape membrane organisation. The work revealed membrane heterogeneity with ordered inner leaflets and disordered outer leaflets that provide a molecular explanation for the resilience of the mycobacterial envelope.
 
 Summary
2. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Disclaimer:
 
 This reviewer is not an expert on MD simulations but has a basic understanding of the findings reported and is well-versed with mycobacterial lipids.
 
 Summary:
 
 In this manuscript titled "Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations", Brown et al. describe outcomes of all-atom simulation of a model outer membrane of mycobacteria. This compelling study provided three key insights: (1) The likely conformation of mycolic acids (the unusually long-chain alpha-branched beta-methoxy fatty acids) in the mycomembrane is the extended U or Z type rather than the compacted W type. (2) Outer leaflet lipids such as PDIM and PAT provide regional vertical heterogeneity and disorder in the mycomembrane that is otherwise prevented in a mycolic acid-only bilayer. (3) Removal of specific lipid classes from the symmetric membrane systems leads to significant changes in membrane thickness and resilience to high temperatures.
 
 Strengths:
 
 The authors take a step-wise approach in building the complexity of the membrane and highlight the limitations of each of the approaches. A case in point is the use of supraphysiological temperature of 333 K or even higher temperatures for some of the simulations. Overall, this is a very important piece of work for the mycobacterial field, and will help in the development of membrane-disrupting small molecules and provide important insights for lipid-lipid interactions in the mycomembrane.
 
 Weaknesses:
 
 (1) The authors used alpha-mycolic acids only for their models. The ratios of alpha, keto, and methoxy-mycolic acids are known in the literature, and it may be worth including these in their model. Future studies can be aimed at addressing changes in the dynamic behavior of the MOM by altering this ratio, but the inclusion of all three forms in the current model will be important and may alter the other major findings of the current study.
 
 (2) The findings from the 14 different symmetric membrane systems developed with the removal of one complex lipid at a time are very interesting but have not been analysed/discussed at length in the current manuscript. I find many interesting insights from Figures S3 and S5, which I find missing in the manuscript. These are as follows:
 
 a) Loss of PDIM resulted in reduced membrane thickness. This is a very important finding given that loss of PDIM can be a spontaneous phenomenon in Mtb cultures in vitro and that this is driven by increased nutrient uptake by PDIM-deficient bacilli (Domenech and Reed, 2009 Microbiology). While the latter is explained by the enhanced solute uptake by several PE/PPE transporter systems in the absence of PDIM (Wang et al, Science 2020), the findings presented by Brown et al could be very important in this context. A discussion on these aspects would be beneficial for the mycobacterial community.
 
 b) I find it interesting that loss of PAT or DAT does not change membrane thickness (Figure S3). While both PAT and PDIM can migrate to the interleaflet space, loss of PDIM and PAT has a different impact on membrane thickness. It is worth explaining what the likely interactions are that shape membrane thickness in the case of the modelled MOM.
 
 c) Figure S5: Is the presence of SGL driving PDIM and PAT to migrate to the inter-leaflet space? Again, a discussion on major lipid-lipid interactions driving these lipid migrations across the membrane thickness would be useful.
 
 Review 1
3. Public_Reviews 10 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.
 
 Strengths:
 
 The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound.
 
 The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also help other groups to work on all-atom models of the mycobacterial outer membrane for drug transport, etc.
 
 Weaknesses:
 
 Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.
 
 Major Points:
 
 (1) The simulation by Basu et al (Phys Chem Chem Phys 2024) studied drug transport through mycolic acid monolayers. Since the authors of the current study have all-atom models of the MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, this is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?
 
 (2) In line 277, the authors mention six simulations that mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.
 
 (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.
 
 (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position Figures for TMM and TDM to understand the difference. Is it really dependent on the chemical structure of the lipid moiety or the initial position of the lipid in the bilayer at the beginning of the simulation?
 
 Minor Point:
 
 In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.
 
 Review 2

URL

biorxiv.org/content/10.1101/2025.05.24.655956v1
www.biorxiv.org www.biorxiv.org

Overexpression of Ssd1 and calorie restriction extend yeast replicative lifespan by preventing deleterious age-dependent iron uptake

4
1. Public_Reviews 10 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important study uses innovative microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels in aging yeast cells. The evidence for the proposed role of Ssd1 and reduced nutrients for lifespan through limiting iron uptake is convincing, even though some mechanistic details remain unclear. This work will be of interest to cell biologists working on aging and iron metabolism.
  
  Summary
2. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Overexpression of the mRNA-binding protein Ssd1 was shown before to extend the replicative lifespan of yeast cells, whereas ssd1 deletion had the opposite effect. Here, the authors provide evidence that Ssd1 acts via sequestration of mRNAs of the Aft1/2-dependent iron regulon. This restricts activation of the regulon and limits accumulation of Fe2+ inside cells, thereby likely lowering oxidative damage. The effects of Ssd1 overexpression and calorie restriction on lifespan are epistatic, suggesting that they might act through the same pathway.
  
  Strengths:
  
  The study is well-designed and involves analysis of single yeast cells during replicative aging. The findings are well displayed and largely support the derived model, which also has implications for the lifespan of other organisms, including humans.
  
  Weaknesses:
  
  The model is largely supported by the findings; however, the findings remain largely correlative. Whether the knockout of ssd1 shortens lifespan by increasing intracellular Fe2+ levels has not been tested. The finding that increased Ssd1 levels form condensates in a cell-cycle-dependent manner is interesting, yet the role of the condensates in lifespan extension remains untested and unlinked.
  
  Review 1
3. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  This manuscript describes the use of a powerful technique called microfluidics to elucidate the mechanisms explaining how overexpression (OE) of Ssd1 and caloric restriction (CR) in yeast extend replicative lifespan (RLS). Microfluidics measures RLS by trapping cells in chambers mounted to a slide. The chambers hold the mother cell but allow daughters to escape. The slide, with many chambers, is recorded during the entire process, roughly 72 hours, with the video monitored afterwards to count how many daughters each of the trapped mothers produces. The power of the method is what can be done with it. For example, the entire process can be viewed by fluorescence so that GFP and mCherry-tagged proteins can be followed as cells age. The budding yeast is the only model where bona fide replicative aging can be measured, and microfluidics is the only system that allows protein localization and levels to be measured in a single cell while aging. The authors do a wonderful job of showing what this combination of tools can do.
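
  The counting step described above (one daughter count per trapped mother) translates directly into an RLS summary. The sketch below is only an illustration of that bookkeeping: the daughter counts are hypothetical, the function name is an assumption, and it is not the authors' analysis pipeline.

```python
from statistics import mean, median


def survival_curve(daughter_counts):
    """Fraction of mothers that produced at least g daughters, for g = 0..max.

    Each entry of daughter_counts is the replicative lifespan of one trapped
    mother cell, i.e. the number of daughters counted before it stopped dividing.
    """
    n = len(daughter_counts)
    if n == 0:
        raise ValueError("need at least one mother cell")
    max_generation = max(daughter_counts)
    return [
        sum(1 for count in daughter_counts if count >= g) / n
        for g in range(max_generation + 1)
    ]


# Hypothetical daughter counts scored from the recorded video, one per mother.
counts = [18, 22, 25, 25, 30, 31, 12, 27]

if __name__ == "__main__":
    print("median RLS:", median(counts))
    print("mean RLS:", mean(counts))
    curve = survival_curve(counts)
    print("fraction alive at generation 20:", curve[20])
```

  Comparing such curves between strains or conditions (for example WT against Ssd1 OE or CR) is the usual way an RLS extension is displayed. Mothers that escape a trap before dying would need to be handled as censored observations, which this sketch does not do.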
  
  The authors had previously shown that Ssd1, an mRNA-binding protein, extends RLS when overexpressed. This was attributed to Ssd1 sequestering away specific mRNAs under stress, likely leading to reduced ribosomal function. It remained completely unknown how Ssd1 OE extended RLS. The authors observed that overexpressed, but not normally expressed, Ssd1 formed cytoplasmic condensates during mitosis that are resolved by cytokinesis. When the condensates fail to be resolved at the end of mitosis, this signals death.
  
  It has become clear in the literature that iron accumulation increases with age within the cell. The transcriptional programs that activate the iron regulon also become elevated in aging cells. This is thought to be due to impaired mitochondrial function in aging cells, with increased iron accumulation as an attempt at restoring mitochondrial activity. The authors show that Ssd1 OE and CR both reduce the expression of the iron regulon. The data presented indicate that iron accumulation shortens RLS: deletion of iron regulon components extends RLS, and adding iron to WT cells decreases RLS, but not when Ssd1 is overexpressed or when cells are calorically restricted. Interestingly, iron chelation using BPS has no impact on WT RLS, but decreases the elevated RLS in CR cells and cells overexpressing Ssd1. It was not initially clear why iron chelation would inhibit the extended lifespan seen with CR and Ssd1 OE. This was addressed by an experiment where it was shown that the iron regulon is induced (FIT2 induction) when iron is chelated. Thus, the detrimental effects of induction of the iron regulon by BPS and iron accumulation on RLS cannot be tempered by Ssd1 OE and CR once turned on.
  
  I did not find any weaknesses to be addressed in this paper. The draft was well-written, and the extensive experimentation was well-designed, performed, and controlled. However, I did make minor comments that I recommend the authors address:
  
  (1) Why would BPS not reduce RLS in WT cells? The authors could test whether OE of FIT2 reduces RLS in WT cells.
  
  (2) The authors should add a brief explanation for why the GDP1 promoter was chosen for Ssd1 OE.
  
  (3) On page 12, growth to saturation was described as glucose starvation. This is more accurately described as nutrient deprivation. Referring to it as glucose starvation is akin to CR, which growing to saturation is not. Ssd1 OE formed condensates upon saturation but not in CR. Why do the authors think Ssd1 OE did not form condensates upon CR? Too mild a stress?
  
  (4) The authors conclude that the main mechanism for RLS extension in CR and Ssd1 OE is the inhibition of the iron regulon in aging cells. The data certainly supports this. However, this may be an overstatement as other mutations block CR, such as mutations that impair respiration. The authors do note that induction of the iron regulon in aging cells could be a response to impaired mitochondrial function. Thus, it seems that the main goal of CR and Ssd1 OE may be to restore mitochondrial function in aging cells, one way being inactivation of the iron regulon. A discussion of how other mutations impact CR would be of benefit.
  
  (5) The cell cycle regulation of Ssd1 OE condensates is very interesting. There does not appear to be literature linking Ssd1 with proteasome-dependent protein turnover. Many proteins involved in cell cycle regulation and genome stability are regulated through ubiquitination. It is not necessary to do anything here about it, but it would be interesting to address how Ssd1 condensates may be regulated with such precision.
  
  (6) While reading the draft, I kept asking myself what the relevance to human biology was. I was very impressed with the extensive literature review at the end of the discussion, going over how well conserved this strategy is in yeast with humans. I suggest referring to this earlier, perhaps even in the abstract. This would nail down how relevant this model is for understanding human longevity regulation.
  
  In conclusion, I enjoyed reading this manuscript, describing how Ssd1 OE and CR lead to RLS increases, using different mechanisms. However, since the 2 strategies appear to be using redundant mechanisms, I was surprised that synergism was not observed.
  
  Review 2
4. Public_Reviews 10 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  In this paper, the authors investigate how the RNA-binding protein Ssd1 and calorie restriction (CR) influence yeast replicative lifespan, with a particular focus on age-dependent iron uptake and activation of the iron regulon. For this, they use microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels across aging cells. They show that both Ssd1 overexpression and CR act through a shared pathway to prevent the nuclear translocation of the iron-regulon regulator Aft1 and the subsequent induction of high-affinity iron transporters. As a result, these interventions block the age-related accumulation of intracellular free iron, which otherwise shortens lifespan. Genetic and chemical epistasis experiments further demonstrate that suppression of iron regulon activation is the key mechanism by which Ssd1 and CR promote replicative longevity.
  
  Overall, the paper is technically rigorous, and the main conclusions are supported by a substantial body of experimental data. The microfluidics-based assays in particular provide compelling single-cell evidence for the dynamics of Ssd1 condensates and iron homeostasis.
  
  My main concern, however, is that the central reasoning of the paper, namely that Ssd1 overexpression and CR prevent the activation of the iron regulon, appears to be contradicted by previous findings, and the authors may actually be misrepresenting these studies, unless I am mistaken. In the manuscript, the authors state on two occasions:
  
  "Intriguingly, transcripts that had altered abundance in CR vs control media and in SSD1 vs ssd1∆ yeast included the FIT1, FIT2, FIT3, and ARN1 genes of the iron regulon (8)"
  
  "Ssd1 and CR both reduce the levels of mRNAs of genes within the iron regulon: FIT1, FIT2, FIT3 and ARN1 (8)"
  
  However, reference (8) by Kaeberlein et al. actually says the opposite:
  
  "Using RNA derived from three independent experiments, a total of 97 genes were observed to undergo a change in expression >1.5-fold in SSD1-V cells relative to ssd1-d cells (supplemental Table 1 at http://www.genetics.org/supplemental/). Of these 97 genes, only 6 underwent similar transcriptional changes in calorically restricted cells (Table 2). This is only slightly greater than the number of genes expected to overlap between the SSD1-V and CR datasets by chance and is in contrast to the highly significant overlap in transcriptional changes observed between CR and HAP4 overexpression (Lin et al. 2002) or between CR and high external osmolarity (Kaeberlein et al. 2002). Intriguingly, of the 6 genes that show similar transcriptional changes in calorically restricted cells and SSD1-V cells, 4 are involved in iron-siderochrome transport: FIT1, FIT2, FIT3, and ARN1 (supplemental Table 1 at http://www.genetics.org/supplemental/)."
  
  Although the phrasing might be ambiguous at first reading, this interpretation is confirmed upon reviewing Matt Kaeberlein's PhD thesis: https://dspace.mit.edu/handle/1721.1/8318 (page 264 and so on).
  
  Moreover, consistent with this, activation of the iron regulon during calorie restriction (or the diauxic shift) has also been observed in two other articles:
  
  https://doi.org/10.1016/S1016-8478(23)13999-9
  
  https://doi.org/10.1074/jbc.M307447200
  
  Taken together, these contradictory data might blur the proposed model and make it unclear how to reconcile the results.
  
  Review 3
Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.02.673772v1
arxiv.org

Fragmentation and aggregation of cyanobacterial colonies

4
1. Public_Reviews 10 Oct 2025
 
 in eLife (unscoped)
 
 eLife Assessment
 
 With the goal of investigating the assembly and fragmentation of cellular aggregates, this manuscript investigates cyanobacterial aggregates in a laboratory setting. This investigation of the conditions and mechanisms behind aggregation is an important contribution as it yields basic understanding of natural processes and offers potential strategies for control. The combination of computational and experimental investigations in this manuscript provides solid support for the role of shear on aggregation and fragmentation. However, the role of extracellular matrix, with possibly a strong effect on aggregation, is not adequately studied.
 
 Summary
2. Public_Reviews 10 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #1 (Public review):
 
 Sinzato et al. investigated how shear flow in a rheological chamber affects the assembly and fragmentation of cyanobacterial aggregates, with the goal of understanding how such aggregates might form naturally, and/or be destroyed industrially. The authors used a combination of experiments and models to show that cyanobacterial colonies can be difficult to fragment with fluid flows. Additionally, they provide biophysical support for the idea that such aggregates likely form primarily when cells stay together after cell division, rather than coming together from disparate paths.
 
 This work has significant relevance to the field, both practically and naturally. Combatting or preventing toxic cyanobacterial blooms is an active area of environmental research that offers a practical backbone for this manuscript's ideas. Additionally, the formation and behavior of cellular aggregates in general is of widespread interest in many fields, including marine and freshwater ecology, healthcare and antibiotic resistance research, biophysics, and microbial evolution. In this field, there are still outstanding questions regarding how microbial aggregates form into communities, including if and how they come together from separate places. Therefore, I believe that researchers from many distinct fields would find interest in the topic of this paper, and particularly Figure 5, in which a phase space that is meant to represent the different modes of aggregate formation and destruction is suggested, dependent on properties of the fluid flow and particle concentration.
 
 Altogether, the authors were successful in their investigation, and I find their claims to be justified. In particular, the authors achieve strong results from their experiments. Below, I outline key claims of the paper and indicate the level to which they were supported by their data.
 
 Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.
 
 The authors then claim that the fragmentation of aggregates due to fluid flows occurs primarily through erosion of small pieces from larger aggregates. Because their experimental setup does not allow them to directly observe this process (for example, by watching one aggregate break into pieces), they rely on indirect methods to support the claim. Overall, the experimental evidence is generally supportive, but the models leave some gaps. I describe this conclusion in more detail below.
 
 The strongest evidence for the erosion-dominated process comes from the authors' measurements of transfer of biomass between large and small size classes, as in Figure 2E and Figure 2D. The authors claim that only the erosion model can reproduce this kind of biomass transfer. However, it also seems that the idealized erosion model alone is not fully sufficient to capture the observed behavior. In Figure 2D, there remains a gap between their experiment and the prediction of the erosion model, which grows larger over time (Supplemental Figure S9). While the authors suggest that the erosion model is better than the equal-fragmentation model, it is also true that tracking the mean size (Figure 2B) or small size distribution (Figure S6) cannot distinguish between these models.
 
 Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal-fragmentation model at capturing the biomass size distributions? Finally, why does the idealized erosion model fail to capture the size distribution at late stages in Supplemental Figure S9? Would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.
 
 Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by and large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result, and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5, they show a much broader range of particle concentrations and energy dissipation rates that they leave untested. However, they refer to other literature that does test these regions of the phase map.
 
 The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful, and represent a significant conceptual advance. By combining their experiments with discussions of other experimental investigations of scum formation in cyanobacterial blooms, the authors have investigated the two most relevant zones of this map for the present study (Zones II and III), and have made a strong contribution to the literature with regard to artificial mixing to disrupt cyanobacterial blooms.
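The kind of comparison behind such a phase map can be illustrated with a textbook estimate. The sketch below is not the authors' Equation 8, which is not reproduced in this review. It assumes the Saffman-Turner collision kernel for equal-sized spheres in turbulence, a sticking efficiency `alpha`, a kinematic viscosity `nu` close to that of water, and a division rate `mu`; all function names and parameter values are illustrative assumptions.

```python
import math

def collision_rate(phi, epsilon, nu=1.0e-6, alpha=1.0):
    """Per-colony rate of sticking collisions [1/s] for equal-sized spheres.

    Assumption: Saffman-Turner kernel beta = 1.3 * d**3 * sqrt(epsilon / nu)
    for two spheres of diameter d, with number density n = 6 * phi / (pi * d**3).
    The diameter cancels, leaving a rate set by the volume fraction phi
    and the dissipation rate epsilon [m^2/s^3].
    """
    shear_rate = math.sqrt(epsilon / nu)  # characteristic turbulent shear [1/s]
    return alpha * 1.3 * (6.0 / math.pi) * phi * shear_rate

def aggregation_to_division_ratio(phi, epsilon, mu, nu=1.0e-6, alpha=1.0):
    """Ratio of the aggregation rate to the cell-division rate mu [1/s]."""
    return collision_rate(phi, epsilon, nu, alpha) / mu

# Hypothetical inputs: volume fraction 5e-4, dissipation rate 1e-2 m^2/s^3,
# a division rate of 0.3 per day, and a low sticking efficiency.
ratio = aggregation_to_division_ratio(
    phi=5e-4, epsilon=1e-2, mu=0.3 / 86400.0, alpha=1e-3
)
print(f"aggregation / division = {ratio:.3g}")
```

In this simplified form the ratio grows linearly with volume fraction and with the square root of the dissipation rate. It ignores fragmentation, which the manuscript reports as limiting aggregated colonies at high dissipation rates, so it should be read only as an order-of-magnitude illustration.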
 
 Other notes:
 
 The authors rely heavily on size distributions to make the claims of their paper. I was pleased to find the calibration histograms in Supplemental Figure S8, which provide information as to how and why they made corrections to the histograms they observed. From these calibration histograms, it seems that larger colonies are more accurately measured in the cone-and-plate shear setup, while smaller colonies can be missed, presumably due to resolution issues.
 
 Review 1
3. Public_Reviews 10 Oct 2025
 
 in eLife (unscoped)
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this work, the authors investigate the role of fluid flow in shaping the colony size of a freshwater cyanobacterium Microcystis. To do so, they have created a novel assay by combining a rheometer with a bright field microscope. This allows them to exert precise shear forces on cyanobacterial cultures and field samples, and then quantify the effect of these shear forces on the colony size distribution. Shear force can affect the colony size in two ways: reducing size by fragmentation and increasing size by aggregation. They find limited aggregation at low shear rates, but high shear forces can create erosion-type fragmentation: colonies do not break in large pieces, but many small colonies are sheared off the large colonies. Overall, bacterial colonies from field samples seem to be more inert to shear than laboratory cultures, which the authors explain in terms of enhanced intercellular adhesion mediated by secreted polysaccharides.
 
 Strengths:
 
 This study is timely, as cyanobacterial blooms are an increasing problem in freshwater lakes. They are expected to increase in frequency and severity because of rising temperatures, and it is worthwhile learning how these blooms are formed. More generally, how physical aspects such as flow and shear influence colony formation is often overlooked, at least in part because of experimental challenges. Therefore, the method developed by the authors is useful and innovative, and I expect applications beyond the presented system here.
 
 A strong feature of this paper is the highly quantitative approach, combining theory with experiments, and the combination of laboratory experiments and field samples.
 
 Weaknesses:
 
 The introduction in particular seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results, this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems to be that bacterial adhesion, not shear, is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. That said, the same method can be used to investigate systems where shear forces are biologically more relevant.
 
 Review 2
4. Public_Reviews 10 Oct 2025
 
 in eLife (unscoped)
 
 Author response:
 
 The following is the authors’ response to the original reviews
 
 Reviewer #1 (Public review):
 
 (1) Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.
 
 We thank the reviewer for this positive comment. We fully agree, as our fragmentation experiments on division-formed colonies clearly demonstrate their strong mechanical resistance in naturally occurring flows.
 
 (2) The authors then claim that the fragmentation of aggregates due to fluid flows occurs through erosion of small pieces. Because their experimental setup does not allow them to explicitly observe this process (for example, by watching one aggregate break into pieces), they implement an idealized model to show that the nature of the changes to the size histogram agrees with an erosion process. However, in Figure 2C there is a noticeable gap between their experiment and the prediction of their model. Additionally, in a similar experiment shown in Figure S6, the experiment cannot distinguish between an idealized erosion model and an alternative, an idealized binary fission model where aggregates split into equal halves. For these reasons, this claim is weakened.
 
 The two idealized models of colony fragmentation, namely erosion of single cells and fragmentation into equal sizes (or binary fission), lead to distinguishable final size distributions. We believe that our experiments for division-formed colonies support the hypothesis of the erosion mechanism. Specifically, Figure 2E shows that colony fragmentation resulted in a decrease of large colonies and a strong increase of single cells and dimers (two cells). In our view, the strong increase of single cells and dimers provides quite convincing (but indirect) evidence supporting the erosion mechanism. This is described on lines 112-121. To further address the reviewer’s concern, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between these two fragmentation models for large division-formed colonies fragmented at a high dissipation rate of ε = 5.8 m^2/s^3. Furthermore, we have included the new Supplementary Figure S9, which details the model predictions for the colony size distribution at various time points.
 
 The ideal equal fragments model (i.e., where every fracture event produces two identical fragments with half the original biovolume) does not capture the biovolume transfer from large colonies to single cells, as observed for the experimental results in panel D of Figure 2 and panel E of Figure S9. In contrast, the erosion model, in panel D of Figure 2 and panel D of Figure S9, provides a good prediction of the experimental results within the experimental uncertainty. The different fragmentation models are discussed in lines 226-228 of the revised manuscript and lines 865-873 of the SI.
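The qualitative difference between the two idealized models can be seen in a toy calculation. The following is a minimal sketch, not the authors' model: it assumes that every colony larger than one cell undergoes exactly one fracture event per round, counts biovolume in cells, and treats colonies of one or two cells as the "small" class. The starting size of 64 cells and the number of rounds are hypothetical.

```python
def erosion_step(n):
    """Idealized erosion: one single cell detaches from a colony of n cells."""
    return [n - 1, 1]

def equal_fragments_step(n):
    """Idealized equal fragmentation: the colony splits into two halves."""
    return [n // 2, n - n // 2]

def fragment(colonies, step, rounds):
    """Apply one fracture event per round to every colony larger than one cell."""
    for _ in range(rounds):
        next_colonies = []
        for n in colonies:
            next_colonies.extend(step(n) if n > 1 else [n])
        colonies = next_colonies
    return colonies

def small_fraction(colonies, max_small=2):
    """Fraction of total biovolume (in cells) held by singles and dimers."""
    total = sum(colonies)
    return sum(n for n in colonies if n <= max_small) / total

start = [64]  # hypothetical single large colony of 64 cells
eroded = fragment(start, erosion_step, rounds=3)
halved = fragment(start, equal_fragments_step, rounds=3)
print("erosion:", small_fraction(eroded))
print("equal fragments:", small_fraction(halved))
```

Under these assumptions, erosion moves biovolume into the small class from the first event onward, whereas equal fragmentation produces only intermediate sizes until colonies have halved many times. That is the sense in which an equal-fragments model does not transfer biovolume directly from large colonies to single cells.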
 
 (3) Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by and large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested.
 
 We thank the reviewer for this positive comment. We agree that our experimental results show clear evidence that aggregated colonies have a weaker structure in comparison to division-formed colonies, thus supporting the hypothesis that clonal expansion is the main mechanism for colony formation under most natural settings. The range of energy dissipation rates of our experimental setup covers almost entirely the region for which aggregated and division-formed colonies differ in their fragmentation behavior (Zone III of Figure 5). Within this zone, aggregated colonies are fragmented and only the division-formed colonies are able to withstand the hydrodynamic stresses. Furthermore, we show that this fragmentation behavior has a low sensitivity to the total biovolume fraction, as displayed in the Supplementary Figures S2 and S4 and discussed in lines 151-154 and 160-163. We agree that our cone-and-plate setup covers a limited parameter range, and we have added a detailed discussion of these limitations in the revised manuscript, under section Materials and Methods in lines 462-473.
 
 (4) The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful and represent a significant conceptual advance. However, there is a region of this phase map that the authors have left untested experimentally. The lowest energy dissipation rate that the authors tested in their experiment seemed to be \dot{\epsilon} ~ 1e-2 [m^2/s^3], and the highest particle concentration they tested was 5e-4, which means that the authors never tested Zone II of their phase map. Since this seems to be an important zone for toxic blooms (i.e. the "scum formation" zone), it seems the authors have missed an important opportunity to investigate this regime of high particle concentrations and relatively weak turbulent mixing.
 
 We agree with the reviewer that Zone (II) of Figure 5 is of great importance to dense bloom formation under wind mixing and that this parameter range was not covered by our experiments using a cone-and-plate shear flow. The measuring range of our device was motivated by engineering applications such as artificial mixing of eutrophic lakes using bubble plumes, as well as preliminary experiments which demonstrated that high levels of dissipation rate were required to achieve fragmentation. The range of dissipation rates that can be achieved by the cone-and-plate setup is limited at the lower end by the accumulation of colonies near the stagnation point at the conical tip and at the upper end by the spillage of fluid out of the chamber. We now discuss this measuring range in lines 462-473 of the revised manuscript.
 
 Although our setup does not cover Zone (II), we now refer to recent results in the literature for evidence of aggregation dominance in Zone (II). The experimental study of Wu et al. (2024) (reference number 64 of the revised manuscript) investigated the formation of Microcystis surface scum layers in wind-mixed mesocosms. Their study identified aggregation of colonies in the scum layer, resulting in increases of colony size at rates faster than cell division. These results agree with our model, and the parameter range investigated falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion elucidating the parameter range covered in our experiments and the findings of Wu et al. (2024).
 
 Other items that could use more clarity:
 
 (5) The authors rely heavily on size distributions to make the claims of their paper. Yet, how they generated those size distributions is not clearly shown in the text. Of primary concern, the authors used a correction function (Equation S1) to estimate the counts of different size classes in their image analysis pipeline. Yet, it is unclear how well this correction function actually performs, what kinds of errors it might produce, and how well it mapped to the calibration dataset the authors used to find the fit parameters.
 
 We agree with the reviewer that more details of the correction function should be included. We have included in the revised version of the Supporting Information, in lines 785-796, a more detailed explanation of the correction function. Furthermore, a direct comparison of raw and corrected histograms of the size distribution and its associated uncertainty is presented in the new Supplementary Figure S8.
 
 (6) Second, in their models they use a fractal dimension to estimate the number of cells in the group from the group radius, but the agreement between this fractal dimension fit and the data is not shown, so it is not clear how good an approximation this fractal dimension provides. This is especially important for their later derivation of the "aggregation-to-cell division" ratio (Equation 8).
 
 We agree with the reviewer that more details on the estimation of fractal dimension are needed. The revised version, under Materials and Methods in lines 508-515, now includes the detailed estimation procedure, the number of colonies analysed, and the associated uncertainty.
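For readers unfamiliar with this kind of estimate, the idea can be sketched with the common fractal scaling N = k (R/a)^D, where N is the number of cells, R the colony radius, a the cell radius, D the fractal dimension, and k a prefactor. This is an assumed generic form, not the manuscript's stated procedure, and the radii, cell radius, and dimension used below are synthetic values chosen only for illustration.

```python
import math

def cells_from_radius(radius, cell_radius, fractal_dim, prefactor=1.0):
    """Estimate the number of cells in a colony from its radius (assumed scaling)."""
    return prefactor * (radius / cell_radius) ** fractal_dim

def fit_fractal_dimension(radii, counts, cell_radius):
    """Least-squares fit of log(N) against log(R / a).

    Returns (fractal dimension, prefactor): the slope and the exponential
    of the intercept of the log-log regression line.
    """
    xs = [math.log(r / cell_radius) for r in radii]
    ys = [math.log(n) for n in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, math.exp(intercept)

# Synthetic, noise-free data generated with D = 2.5 and a = 2.0 (hypothetical).
radii = [4.0, 8.0, 16.0, 32.0]
counts = [cells_from_radius(r, 2.0, 2.5) for r in radii]
fitted_dim, fitted_prefactor = fit_fractal_dimension(radii, counts, 2.0)
print(fitted_dim, fitted_prefactor)
```

With real measurements, the scatter of the points around the fitted line is what indicates how good an approximation the fractal dimension provides, which is the information the reviewer asked to see.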
 
 Reviewer #1 (Recommendations For The Authors):
 
 In light of the weak evidence for claim #2 outlined above, I believe the paper would benefit from a more explicit comparison in Figure 2C of the two models - idealized erosion, and idealized binary fission. With such a comparison, the authors would have stronger footing to claim that one process is more important than the other.
 
 As mentioned in our answer above to comment #2 of public review, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between the erosion and equal fragments (binary fission) models for large division-formed colonies fragmented under ε = 5.8 m^2/s^3. The comparison is further detailed in the new Supplementary Figure S9 for representative time points. Only the erosion model can recover the biovolume transfer from large colonies to single cells, as observed for the experimental results in Figure 2D and further detailed in Figure S9D. We believe that the revised version of Figure 2 and the new Supplementary Figure S9 provide strong evidence in support of the erosion fragmentation model.
 
 Would the authors comment on their chosen range of experimental dissipation rates? For instance, was their goal more to investigate industrial/engineering applications where the goal is to disrupt the cyanobacteria, but not really typical natural conditions under which the groups might form?
 
 The choice of experimental dissipation rates in our experiment was such that it covers engineering applications such as artificial mixing of eutrophic lakes using bubble plumes. We have now clarified in the Introduction, on lines 37-39, that artificial mixing has been successfully applied in several lakes to suppress cyanobacterial blooms. Furthermore, we have now clarified in the caption of Figure 5 that the bars on the right side indicate typical values of dissipation rates induced by natural wind-mixing, bubble plumes in artificially mixed lakes, and laboratory-scale experiments such as cone-and-plate systems and stirred tanks. The dissipation rates induced by the bubble plumes in artificially mixed lakes could potentially fragment aggregated cyanobacterial colonies and thus disrupt bloom formation. However, our preliminary experiments demonstrated that high levels of dissipation rate were required to achieve fragmentation; therefore, we focused on the upper range of values (0.01 to 10 m^2/s^3).
 
 The dissipation rates generated by the cone-and-plate approach are indeed higher than the dissipation rates under typical natural conditions in lakes. We have now added a detailed discussion of the range of dissipation rates generated by the cone-and-plate approach in the revised manuscript, under section Materials and Methods in lines 462-473, where we also explain that these values are higher than the natural dissipation rates generated by wind action in lakes. However, the more generic insights obtained by our study, shown in Figure 5, are relevant for dissipation rates of natural lakes (e.g., Zone II). Therefore, in our discussion of Figure 5 we have now included the recent findings of Wu et al. (2024) (reference number [64] of the revised manuscript), who studied bloom formation of Microcystis in mesocosm experiments at dissipation rates representative of natural conditions; see also our reply to the next comment.
 
 The authors should consider testing the space of Zone II on their phase map, for instance at very high particle concentrations and even lower rotational speeds, in order to show that their derivations match experiments.
 
 Good point. As mentioned in our answer above to comment #4 of the public review, Zone II lies beyond the measuring range of our experimental setup. Instead, we refer to the recent study of Wu et al. (2024) (reference number [64] of the revised manuscript) which demonstrated that dense scum layers of Microcystis colonies are aggregation-dominated. These mesocosm experiments agree with our model predictions and their parameter range falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion where we elucidate the parameter range covered in our experiments and compare our predictions for Zone II with the recent findings of Wu et al. (2024).
 
 The authors should show their calibration data and fit for the correction function of equation S1. Additionally, you may consider showing "raw" and "corrected" histograms of the size distribution, to demonstrate exactly what corrections are made.
 
 As mentioned in our answer above to comment #5 of the public review, we have included in the revised version of the Supporting Information the new Supplementary Figure S8, which shows the raw and adjusted histograms of the size distribution, including the associated uncertainties. Furthermore, the correction function is now explained in detail in the new Supporting Information Text in lines 785-796.
 
 The authors might consider commenting on Figure S3 a bit more in the main text. Even at very high dissipation rates, the cyanobacterial groups don't plummet to size 1, but stay in an equilibrium around 10-20x the diameter of a single cell. What might this mean for industrial applications trying to break up the groups?
 
 We agree with the reviewer that further discussion of Figure S3, panels E and F, is warranted. In the revised version of the manuscript, under section Fragmentation of Microcystis colonies occurs through erosion in lines 133-137, we have now included a discussion of this figure. Figure S3F shows that more than 90% of the total biovolume ends up in the category “small colonies” (mostly single cells and dimers); hence, most of the initially large colonies do fragment to single cells or dimers. Only about 5-10% of the biovolume remains as “large colonies” of 10-20 cells. Although it is challenging to draw definitive conclusions about the behavior of these remaining large colonies, as they account for only a minor fraction of the suspension, one hypothesis is that variability in mechanical properties between colonies results in a subset of colonies exhibiting exceptional resistance even to very high dissipation rates (see lines 133-137).
 
 Minor comments:
 
 Typo Caption of Figure 2: Should read [m^2/s^3] for units
 
 Thanks for catching this typo. The units in the caption of Figure 2 have been corrected to [m^2/s^3].
 
 There is no Equation 10 in Materials and Methods as indicated in the rheology section.
 
 We thank the reviewer for pointing out the lack of clarity in this algebraic manipulation. In fact, the yield stress has to be substituted in the current Equation 11 (previously Eq.10), from which the critical dissipation rate must be substituted in Equation 3. The result is the critical colony size (l* = 2.8) mentioned in line 243 of the revised manuscript. The correct equation numbers and algebraic substitutions are now indicated in lines 241-243 of the revised version of the manuscript.
 
 Reviewer #2 (Public review):
 
 Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. As the importance of adhesion had been described elsewhere, it is not clear what this study reveals about cyanobacterial colonies that was not known before.
 
 We would like to emphasize several key findings that our study reveals about the impacts of fluid flow on cyanobacterial colonies:
 
 (I) Quantification of mechanical strength in cyanobacterial colonies: Our results demonstrate the high mechanical strength of cyanobacterial colonies, as evidenced by the requirement of high shear rates to achieve fragmentation. This is new knowledge, that was not known before for cyanobacterial colonies. To this end, our study highlights the resilience of these colonies against naturally occurring flows and bridges the gap between theoretical assumptions about colony strength and experimentally measured mechanical properties.
 
 (II) The discovery that the mechanical strength of colonies differs between colonies formed by cell division and colonies formed by aggregation. This is again new knowledge, that was not known before for cyanobacterial colonies.
 
 (III) Validation of a hypothesis regarding colony formation: Using a fluid-mechanical approach, we confirm the findings of recent genetic studies (references 25 and 67 of the revised version of the manuscript) which indicated that colony formation occurs predominantly via cell division rather than cell aggregation under natural conditions (except in very dense blooms).
 
 (IV) Practical guidelines for cyanobacterial bloom control: Our findings provide valuable insights into the design of artificial mixing systems applied in several lakes. Artificial mixing of lakes is based on fundamentals of fluid flow, aiming at preventing aggregation of buoyant cyanobacteria in scum layers at the water surface. Our results show that the dissipation rates generated by bubble plumes in artificially mixed lakes can fragment cyanobacterial colonies formed by aggregation, but are not intense enough to cause fragmentation of division-formed colonies (see Figure 5 and lines 348-360).
 
 The agreement between model and experiments is impressive, but the role of the fit parameters in achieving this agreement needs to be further clarified.
 
 The influence of the fit parameters (namely the stickiness α1 and the pairs of colony strength parameters S1,q1,S2,q2) is discussed in the sections Dynamical changes in colony size modelled by a two-category distribution in lines 247-253 and Materials and Methods in lines 559-565. We kept the discussion concise to maintain readability. However, we agree with the reviewer that additional details about the importance of the fit parameters and the sensitivity of the results to these parameters could be beneficial. In the revised version of the section Materials and Methods in lines 560-563, we have included a detailed discussion of the fit parameters.
 
 The article may not be very accessible for readers with a biology background. Overall, the presentation of the material can be improved by better describing their new method.
 
 We apologize for the limited readability of the description of the experimental setup and model used. In the revised version of the manuscript and the SI, we have detailed further the new methods presented here. The modifications include a detailed description of the operating range of the cone-and-plate shear setup (subsection Cone-and-plate shear of the section Materials and Methods, in lines 462-473). Furthermore, we think that incorporation of the recent experimental results of Wu et al. (2024), on lines 331-337 of the manuscript, will appeal to readers with a biology background. Their mesocosm experiments support our model prediction that aggregation is the dominant mechanism for colony formation in region (II) of Figure 5.
 
 Reviewer #2 (Recommendations For The Authors):
 
 (1) The authors seem too modest in claiming technological advance. They should describe the technological advance of combining microscopy with rheometry, in such a way that this invites others to apply this or similar approaches on biological samples. Even though I feel that the advancement of knowledge of this system by their method is relatively modest, there may be more advances in other systems.
 
 We appreciate the positive view of the reviewer towards the importance of this technology and we agree that its advantages should be advertised to researchers investigating similar systems. We have now given more attention to the technological advance of combining microscopic imaging with rheometry in the final paragraph of the Conclusions (lines 386-400), where we now also briefly discuss an interesting recent study of marine snow (Song et al. 2023, Song and Rau 2022, reference numbers 70 and 71 of the revised manuscript), which used a similar combination of microscopy and rheometry as in our study. Furthermore, in the Methods section, we now briefly explain how the rheometry can be adjusted to investigate other systems (lines 474-480).
 
 (2) It seems reasonable -also based on what we already know about these aggregates - to assume that the main difference in shear sensitivity between field samples and cultures lies in the production of extracellular polysaccharide substance (EPS). To go beyond what is already known, the study could try to provide more direct and quantitative evidence for EPS involvement. For example, using a chemical quantification of EPS levels, or perturbing EPS levels using digestive enzymes.
 
 We agree with the reviewer that further characterization of the EPS is highly relevant to understand the mechanical strength of colonies. However, we believe that chemical quantification and/or degradation of EPS lies beyond the scope of our article and should be addressed by future studies.
 
 (3) Assuming EPS is indeed the reason for the differences in shear resistance: the authors speculate the reason why the field samples have more EPS lies in chemical composition (Calcium/nitrogen levels). In addition, there could be grazing that is known to promote aggregation (possibly increasing EPS), or just inherent genetic differences between strains. I am not necessarily expecting the authors to explore this direction experimentally, but it seems certainly feasible and would make the final result less speculative.
 
 We agree with the reviewer that there are more biotic and abiotic factors that can influence EPS amount and composition. The influence of grazing and other relevant factors on cell adhesion is discussed in references [26-29], cited in our introduction in lines 50-53. As discussed in our answer to recommendation #2, we believe that a quantitative investigation of these various factors is beyond the scope of this work and should be addressed in future studies.
 
 (4) A cool finding seems to be the critical relative diameter (Fig 2E), a colony size that seems invariant under shear. I was slightly surprised that the authors seem to take little effort to understand this critical diameter mechanistically (for example by predicting it, or experimentally perturbing it). Again, not a necessary requirement, but this is where the study could harness its technological advantage to provide a more quantitative understanding of something that goes beyond the existing knowledge of the system.
 
 We apologize to the reviewer if our descriptions and discussions of Figure 2 were unclear. One of the key conclusions from our experiments is that the critical relative diameter depends on the dissipation rate, as shown in Figure 2F. This dependence is also incorporated into the model through the constitutive equation (2). Furthermore, we expect the mechanical resistance of colonies, quantified by the critical relative diameter, to be affected by other biotic and abiotic factors that influence EPS amount and composition.
 
 (5) The jump from 0.019 to 1.1 m²/s³ seems large. What was the reason for not exploring intermediate values? The authors should also define low, modest and intense dissipation rates more clearly. Currently, they seem somewhat arbitrarily defined, i.e. 0.019 m²/s³ is described as low (methods) and moderate (results). In Fig 2, the authors further talk about low dissipation rates without a quantitative description.
 
 We thank the reviewer for pointing out the lack of clarity in the choice of parameter range and the nomenclature. Regarding the former, the suspension of division-formed colonies of Microcystis strain V163 displayed negligible fragmentation for dissipation rates between 0.019 and 1.1 m²/s³, as seen in Figures S2A and S3A. Due to the low sensitivity of the fragmentation results in this region, we do not expect a change in behavior for intermediate values. Regarding the nomenclature, we have corrected the inconsistencies throughout the text. We have chosen to name the dissipation rate values as: low for values typical of wind mixing, moderate for values typical of the core of bubble plumes, and intense for values typical of propellers. Whenever mentioned in the text, the numerical value of dissipation rate is also included to avoid doubt.
 
 (6) The structure and narrative of the paper can be improved. The article first describes all lab culture experiments and then the model, while the first figure already shows model fits. Perhaps it would be better to first describe the aggregation experiments, to constrain the appropriate terms of the model, and then move to fragmentation.
 
 We appreciate the recommendation of the reviewer regarding the structure. We have chosen to describe first the fragmentation experiments (Fig. 2), as these can be understood without introducing the aggregation effects. In contrast, the steady state results in the aggregation experiments (Fig. 3) come from the balance between aggregation and fragmentation. Therefore, we judged the current order to be more appropriate. The model fits are combined with the experimental results in Figures 2 and 3 to have a concise display. We have ensured that all the concepts required to understand each figure panel are explained prior to their discussion.
 
 (7) The number of data points that go into the histogram needs to be indicated. The main reason is that the authors report the distribution in terms of the biovolume fraction, suggesting the numerical counts are converted into volume. This to me seems like the most sensible parameter, but I could not find how this conversion is calculated (my apologies if I missed it). This seems especially relevant because a single large colony can impact this histogram quite considerably.
 
 We apologize for the lack of clarity in the calibration and conversion steps of the size distribution. As discussed above in the answer to comment #5 of the reviewer #1, more details of the calibration process have been added to the revised version of the Supporting Information Text in lines 785-796. Furthermore, the new Supplementary Figure S8 presents examples of the raw and adjusted size distribution, including the total number of counted colonies per histogram and the associated uncertainties in the concentration and biovolume distributions.
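 The reviewer's point that a single large colony can dominate a biovolume-weighted histogram can be illustrated with a minimal sketch. This is not the authors' calibration or conversion procedure (that is described in their Supporting Information); it assumes, purely for illustration, that each colony is a sphere whose volume scales with the cube of its measured diameter. The function name, the bin edges, and the example diameters are all hypothetical.

```python
import math


def biovolume_fractions(diameters_um, bin_edges_um):
    """Return (count_fraction, biovolume_fraction) per size bin.

    Assumption (illustrative only): every colony is a sphere, so its
    volume is pi/6 * d**3. Bins are half-open intervals [lo, hi).
    """
    n_bins = len(bin_edges_um) - 1
    counts = [0] * n_bins
    volumes = [0.0] * n_bins
    for d in diameters_um:
        for i in range(n_bins):
            if bin_edges_um[i] <= d < bin_edges_um[i + 1]:
                counts[i] += 1
                volumes[i] += math.pi / 6.0 * d ** 3
                break
    total_n = sum(counts)
    total_v = sum(volumes)
    if total_n == 0:
        raise ValueError("no colonies fall inside the given bins")
    count_frac = [c / total_n for c in counts]
    vol_frac = [v / total_v for v in volumes]
    return count_frac, vol_frac


# Hypothetical sample: 99 small colonies (10 um) and one large colony (100 um).
diameters = [10.0] * 99 + [100.0]
count_frac, vol_frac = biovolume_fractions(diameters, [0.0, 50.0, 200.0])
print(count_frac, vol_frac)
```

 In this made-up sample the large colony is 1% of the counts but carries most of the biovolume, which is why reporting the number of colonies per histogram alongside the biovolume fractions is informative.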
 
 (8) Over the timescales measured here, colonies could start sinking (or floating), possibly in a size-dependent manner, that could lead to a bias due to boundary effects. Did the authors consider this potential artifact?
 
 The sinking or floating of colonies is a relevant process which was taken into account in the choice of our parameter range for the dissipation rate. The minimum dissipation rate used in our experiments ensures that the upward inertial velocity near stagnation is sufficient to counteract the sedimentation of colonies. A detailed discussion of the choice of the parameter range is now included in the revised version of the Materials and Methods in lines 462-473.
 
 (9) "On the one hand, sequencing of the genetic diversity within Microcystis colonies supports the hypothesis that colony formation under natural conditions is primarily driven by cell division [25]. On the other hand, cell aggregation can occur on a shorter time scale and may offer improved protection against high grazing pressure [26]." This appears somewhat constructed, as what is described as "on the other hand" is not evidence against the genetic diversity.
 
 We agree that the suggested dichotomy in this text appeared somewhat constructed, and we have now removed the wording “on the one hand” and “on the other hand”. The studies from reference [25] demonstrated that the genetic diversity between independent Microcystis colonies is much greater than the diversity within colonies. If cell aggregation were the dominant mechanism, a similar genetic diversity would be observed between and within colonies, which contrasts with the findings from reference [25]. We have adjusted the text in the revised manuscript, in lines 46-54, to clarify this point.
 
 (10) The phase diagram seems largely based on extrapolations that are made outside of the measurement regime (e.g. dark red bars indicating the dissipation rate, Fig 5 - by the way 1 this color scheme could use some better contrast, by the way 2 Fig S7 suggests a wider dissipation rate range than indicated in Fig 5, why?). Hence there seems to be the need to more clearly delineate experimental results, simulations, and extrapolations in the phase diagram.
 
 We agree with the reviewer that further clarifications should be given about the parameter range covered in our experiments and apologize for the lack of readability in the color scheme of Fig 5. In lines 329-337, 346-347, 353-355, we have highlighted the parameter range covered by our experiments as well as the range covered by previous studies of wind-mixed mesocosms (namely reference [64] of the revised manuscript). Regarding the color scheme of Figure 5, we have modified the legend of the figure to improve readability. The color contrast was increased and leader lines were added to connect the colored bars with the respective label.
 
 (11) Unfortunately, the manuscript did not contain line numbers.
 
 We apologize to the reviewer for the lack of line numbers in our initial version. The revised version of the manuscript now contains line numbers, both in the main text and the supporting information.
 
 (12) Fig 2D. Caption is too minimal. Y-axis could better be named "Fraction of colonies" as both small and large colonies are plotted.
 
 The caption for Figure 2D was extended to better describe the plot. We have kept the y-axis label as “Fraction of small colonies”, since this is the quantity displayed by the three curves in the plot.
 
 (13) An inset should have axis labels.
 
 All the insets in our plots display the same variables as their respective plots. In order to keep the plots light and preserve readability, we therefore prefer to present the axis labels only along the x-axis and y-axis of the main plots, which implies by convention that the same axis labels also apply to the insets. To the best of our knowledge, this is a common approach.
 
 (14) Page 5, first words. Likely Fig 3A, not 2A was meant.
 
 We thank the reviewer for pointing out this readability issue. We intend to compare both Figures 2A and 3A. The text of the revised manuscript, in lines 146-148, has been adjusted with the correct figure numbers.
 
 (15) Introduction, second last paragraph, third last line. "suspension leaded to a broad distribution" I assume you meant "... led to a ..."
 
 We thank the reviewer for pointing out this typo. It has been corrected (line 122).
 
 AuthorResponse

URL

arxiv.org/abs/2407.21115
www.biorxiv.org www.biorxiv.org

Birds migrate longitudinally in response to the resultant Asian monsoons of the Qinghai-Tibet Plateau uplift

3
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important and creative study finds that the uplift of the Qinghai-Tibet Plateau - via its resultant monsoon system rather than solely its high elevation - has shifted avian migratory directions from a latitudinal to a longitudinal orientation. The authors have expanded and clarified their lines of evidence (including an enlarged tracking set and explicit caveats on species-level eBird inference), such that the central claims are now solid. The conclusions - that monsoon dynamics, rather than elevation per se, are most consistent with observed longitudinal reorientation - illustrate how large, community-sourced and climate-model datasets can inform continent-scale shifts in migratory behavior over time that complement traditional approaches.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Joint Public Review:
  
  The study assesses how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites.
  
  This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well-described, and written in such a fashion that they are transparent and repeatable.
  
  Editorial note: These latest revisions are minor in the sense that they expand on the dataset but do not change the primary results.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.
  
  Again, we thank the reviewer for positive comments during review.
  
  Reviewer #2 (Public review):
  
  I would like to thank the authors for the revision and the input they invested in this study.
  
  We are grateful for your thoughtful feedback and enthusiasm, which help us improve our manuscript.
  
  With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.
  
  We understand your concern about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. A similar approach has been widely used to estimate how biogeographic barriers facilitated the divergent vertebrate communities across the world (e.g., Williams et al. 2024). We agree that such an approach must be used carefully. In the revision, we have explicitly clarified why this counterfactual comparison is useful – namely it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths (Gilbert and Lambert 2010, Sanmartín 2012, Bull et al. 2021). We acknowledge that the counterfactual results are theoretical and have explicitly emphasised the assumptions involved (i.e., species–environment relationships hold between pre- and post-uplift environments) in the main text (Lines 91-98). Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route).
  
  References:
  
  Bull, J. W., N. Strange, R. J. Smith, and A. Gordon. 2021. Reconciling multiple counterfactuals when evaluating biodiversity conservation impact in social-ecological systems. Conservation Biology 35:510-521.
  
  Gilbert, D., and D. Lambert. 2010. Counterfactual geographies: worlds that might have been. Journal of Historical Geography 36:245-252.
  
  Sanmartín, I. 2012. Historical Biogeography: Evolution in Time and Space. Evolution: Education and Outreach 5:555-568.
  
  Williams, P. J., E. F. Zipkin, and J. F. Brodie. 2024. Deep biogeographic barriers explain divergent global vertebrate communities. Nature Communications 15:2457.
  
  All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.
  
  We appreciate the reviewer’s comment here. We would like to clarify that our conclusions regarding longitudinal shifts in migratory distributions are based on distribution models derived from eBird data of 50 species, not merely on migration tracks from seven species. These species-level spatiotemporal models allow us to infer large-scale biogeographic patterns across the Qinghai-Tibet Plateau (QTP).
  
  The original seven tracking species were used specifically for analysing the relationship between migration directions (azimuths) and environmental variables, offering independent support for the patterns revealed in the eBird-based distribution models. Recognising the reviewer’s concern on sample size and coverage, we have now expanded this part by incorporating migration tracks from 12 additional species, derived through georeferenced digitisation of published migratory maps. Importantly, this expansion did not change our conclusions, i.e., the monsoons, rather than the high elevations, play a prominent role in shaping the current migration direction of birds in the QTP. While the overall conclusion remains unchanged, the expanded dataset led to slight changes in the difference between spring and autumn migration. We have updated Figure 2 and the corresponding results and conclusions throughout the manuscript. We have also clarified in the Discussion that regions of the QTP with relatively less data might lead to underestimation of some migration routes to make sure readers are aware of these data limitations (Lines 211-218).
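  For readers unfamiliar with how a migration direction (azimuth) can be derived from tracking fixes, the following is a minimal sketch using the standard great-circle initial-bearing formula. It is not taken from the manuscript and may differ from the authors' actual procedure; the function names and the example coordinates are hypothetical.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def track_azimuths(fixes):
    """Bearings between consecutive (lat, lon) fixes of one track."""
    return [
        initial_bearing_deg(a[0], a[1], b[0], b[1])
        for a, b in zip(fixes, fixes[1:])
    ]


# Hypothetical track: a northward leg followed by a leg with a strong eastward component.
example_track = [(25.0, 85.0), (30.0, 85.0), (31.0, 95.0)]
azimuths = track_azimuths(example_track)
print(azimuths)
```

  A change from a bearing near 0 degrees (northward) to one closer to 90 degrees (eastward) between consecutive legs is the kind of directional shift that could then be related to environmental variables along the route.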
  
  The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.
  
  Thanks, as suggested we have clearly stated the assumptions of niche conservatism in the Introduction (Lines 91-98).
  
  In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.
  
  We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. Our objective is not to determine one-to-one migratory links between specific populations, but to identify general broad-scale directional shifts when birds cross the QTP during their migration. We regret any confusion caused by our earlier wording. To make this clearer, we have now emphasised that our interest lies in migratory directions and their environmental correlates, rather than population assignments. We have also rephrased the relevant text to explicitly clarify that our study operates at the species level and at large spatial scales (Lines 253–257). We exemplify how the distributions of eBird observations and GPS tracking data of four species can differ from each other whilst showing similar migration patterns (Figure S10). We have also explicitly stated in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis could only suggest plausible routes and region-to-region linkages (Lines 200-202).
  
  I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.
  
  We thank the reviewer for the honest assessment and understand the concern regarding the scope of our contribution. Our intention was not to provide an exhaustive account of all aspects of the QTP as a migratory barrier, but to address a specific and underexplored question: how the uplift of the plateau and the resulting monsoon system may have influenced the orientation of avian migration routes. By integrating both satellite tracking and community-contributed data, we have explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, which yields the most comprehensive picture to date of how the QTP uplift has shaped migratory patterns of birds. We have also discussed the study’s limitations – including the small number of tracking species (Lines 205-218), the use of occurrence data as a proxy for breeding and wintering regions (Lines 200-202), the uneven sampling coverage in the QTP (Lines 202-205) and the assumptions behind the counterfactual scenario (Lines 91-98). This ensures that readers understand the context and constraints of our findings.
  
  My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.
  
  We thank the reviewer for this suggestion. We agree that radar holds promise for understanding certain aspects of bird migration, particularly for detecting flight intensity, altitudes, and timing. However, radar systems currently struggle to resolve migration at the level of species, populations, or individuals, which are central to questions of migratory connectivity and route selection. Most radar signals cannot distinguish between species in mixed flocks, nor can they link breeding and wintering sites for tracked individuals. In addition, the spatial coverage of radar installations remains limited, especially across remote and high-elevation regions like the Qinghai-Tibet Plateau, where infrastructure and continuous power supply are still logistically prohibitive.
  
  The eBird dataset used in our study is itself a form of field-based observation, contributed by tens of thousands of birdwatchers across continents, including the QTP region (Figure S11). While eBird cannot provide individual-level tracking, it captures spatiotemporal patterns of occurrence at broad scales, making it a valuable complement to satellite tracking data. We would also emphasise that our team has extensive field experience in the Qinghai-Tibet Plateau (about twenty years), including multi-year expeditions to deploy satellite tags and observe migration at stopover sites.
  
  We agree that more direct tracking (e.g. GPS tagging) would be an ideal way to validate migration pathways and population connectivity. Using the satellite-tracking data, we have shown that most tracking species shifted their migration direction when facing the QTP (Figure S6). In this revision, as stated, we managed to add 12 more species with satellite tracking routes. We have also noted that future studies should build on our findings by using dedicated tracking of more individual birds and monitoring of migration over the QTP. We have cited recent advances in these techniques and suggested that incorporating more tracking data could further test the hypotheses generated by our work (Lines 205-218).
  
  Reviewer #2 (Recommendations for the authors):
  
  L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.
  
  We used this sentence to introduce migration. We have removed “important” to reduce ambiguous phrasing.
  
  L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.
  
  We stated that this claim is invoked for long-distance migrants before this sentence and have rewritten the sentence to highlight that this interpretation is for long-distance migrants.
  
  L 158 what is a migration circle? I do not know such a term.
  
  We have amended it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.
  
  L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.
  
  We thank the reviewer for raising this important concern. We have presented this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy, where birds rely on endogenous reserved energy gained prior to reproduction, rather than an ‘income’ strategy where birds ingest nutrients mainly collected during the period of reproductive activity. This corroborates studies on breeding strategies of migratory birds in Asian flyways. However, we note that this interpretation would require further study.” By adding this caution, we made it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We have also double-checked that the rest of the discussion around this point is framed appropriately. Moreover, to help illustrate why we raised this ecological interpretation, we would also draw attention to examples of satellite tracking points from several species (e.g., Beijing Swift, Demoiselle Crane), which show obvious shifts in migratory direction near the QTP region. These turning points suggest potential behavioral responses to environmental constraints, such as climatic corridors or energy availability, which could help motivate our discussion of possible capital breeding strategies in these species.
  
  AuthorResponse

Tags

Summary

AuthorResponse

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.21.619453v3
www.biorxiv.org www.biorxiv.org

Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons

5
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important work has the potential to expand the repertoire of transgenic animals for systems neuroscience investigations across multiple fields. The generation of new reagents has the potential to open new directions in experimental design, and the Cas9-based approach for generating mice may provide additional benefits compared to existing BAC transgenic mouse lines. However, whereas some of the imaging data are compelling, quantitative analysis of transgene fidelity is incomplete, as it relies on a qualitative description of reporter XFP expression at low magnification, with some electrophysiological characterization.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  I read with much attention the manuscript titled "Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons" in which the authors reveal five transgenic lines to target diverse neuronal populations of the basal ganglia. In addition, the authors also provide some assessments of the functionality of the lines.
  
  Strengths:
  
  Knockin lines made readily available through Jackson. Lines show specific expression.
  
  Weaknesses:
  
  Although I have no doubt these knock-in lines will be broadly used by researchers in the field, I find the scientific advances of the study and the breadth of the resource provided quite limited. This is partly because 4 of these lines have been generated by other laboratories. For instance, there are already two other Dat-FlpO lines generated (JAX#: 033673 and 035436), with one of them already characterized (PMID: 33979604). Similarly, Drd1-Cre and Adora2a-Cre have been used abundantly since they were generated over a decade ago, and a novel Drd1-FlpO line has been characterized thoroughly recently (PMID: 38965445). Indeed, some of these lines were BAC transgenic, and I agree with the authors that there is a sound rationale for generating knock-in mice; however, the authors should then demonstrate if/how their new drivers are superior. Overall, the valuable resource generated by the authors would benefit from additional quantification and validation.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors report the generation and validation of new knock-in mouse lines enabling precise targeting of basal ganglia projection neurons and midbrain dopamine neurons. By inserting recombinase sequences at endogenous loci, they provide tools that improve on older BAC-based models, with the additional benefit that all lines are openly available through Jackson Laboratories. This work is timely, fills a longstanding gap for the community, and will support both basic circuit mapping and disease-related research.
  
  Strengths:
  
  The major strength of this study is the provision of new genetic resources that will be widely used by the basal ganglia and dopamine research communities. Anatomical and electrophysiological data indicate appropriate expression and preserved intrinsic properties. The Flp lines, in particular, show labeling largely confined to basal ganglia circuits, making them especially attractive for circuit-based studies. A further strength is the use of a T2A-recombinase insertion at the native gene stop codon, which preserves endogenous regulation and maintains near-physiological expression of Adora2a, Drd1a, and DAT. The availability of both Cre and Flp versions enables powerful intersectional strategies, and open distribution through Jackson Laboratories ensures broad accessibility and long-term value.
  
  Weaknesses:
  
  The major limitation is the discrepancy between Cre and Flp lines, with Cre generally driving broader expression than Flp. This raises concerns about anatomical fidelity that require validation at the cellular level. For the DAT-FlpO line, efficiency remains insufficiently quantified, and higher-resolution co-labeling with TH immunostaining is needed. Electrophysiological comparisons between Cre and Flp versions are also incomplete; current data suggest potential physiological differences, which warrant additional statistical testing and, at a minimum, explicit discussion in the manuscript.
  
  Review 2
4. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Using the latest knock-in technology, the authors generated a set of five mouse lines with expression of recombinases in striatal projection neurons and dopaminergic neurons for public use. They rigorously characterize the expression of the recombinases by intersectional crossing with reporter lines to demonstrate that these lines are faithful, and they perform electrophysiological experiments in slices to provide evidence that the respective neurons show the expected features in these assays.
  
  Strengths:
  
  The characterization of the new mouse lines is exceptional, and these will be widely used by the community. The mouse lines are openly available for the community to use.
  
  Weaknesses:
  
  No weaknesses were identified by this Reviewer.
  
  Review 3
5. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Author response:
  
  We thank all three reviewers for their thoughtful and constructive evaluations of our manuscript, “Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons.” We are encouraged that the reviewers recognize the value, specificity, and utility of these new lines for the basal ganglia and dopamine research communities. Below, we summarize our planned revisions and clarifications in response to the reviewers’ comments.
  
  (1) Novelty and comparison with existing lines
  
  We appreciate Reviewer 1’s point regarding the existence of previously generated Cre and Flp lines targeting similar neuronal populations. Our project was initiated six years ago, and during the course of generating and characterizing all five lines, we became aware that similar individual lines have since been developed by other groups. Nevertheless, our study provides a coordinated and independently validated set of lines created using a standardized knock-in (KI) strategy and distributed through Jackson Laboratories for unrestricted community use. Importantly, whereas previous BAC transgenic approaches rely on random insertion, which can lead to position effects and ectopic expression, our design places the recombinase coding sequence immediately downstream of the endogenous stop codon using a self-cleaving T2A peptide. This ensures expression under native promoter and regulatory control, preserving physiological gene regulation.
  
  To address the Reviewers’ points, we will (i) expand the Introduction and Discussion to clarify the rationale and advantages of endogenous promoter–driven recombinase expression over BAC-based systems, emphasizing that our lines provide a uniform, promoter-controlled, and publicly accessible toolkit for the community, and (ii) explore including a comparative table summarizing differences in construct design, expression fidelity, and recombination efficiency across published lines (e.g., PMID 33979604, 38965445).
  
  (2) Quantification, validation, and comparison of Cre vs FlpO
  
  We agree with Reviewers 1 and 2 that further quantification and discussion of Cre versus FlpO fidelity will strengthen the manuscript. The observed difference in expression breadth between Cre and FlpO lines likely reflects a fundamental property of the recombinases themselves rather than a discrepancy in targeting. Cre recombinase is significantly more enzymatically efficient than FlpO, meaning that even very low endogenous levels of gene expression (e.g., Drd1a or Adora2a) can drive Cre-dependent recombination, whereas FlpO requires higher expression thresholds. Consequently, reporter-based readouts will inherently appear broader for Cre lines, despite both being driven by the same endogenous promoters.
  
  To address these points, we will (i) provide quantitative co-labeling analyses for the DAT-FlpO line with TH immunostaining to assess efficiency and specificity, (ii) clarify in the Results and Discussion that differences between Cre and FlpO expression patterns largely stem from differences in recombinase kinetics and sensitivity, not mismatched promoter activity, and (iii) include representative high-resolution images and relevant statistics in the revised figures. Importantly, we would like to note that RNAscope may not be an ideal validation approach in this context, as in situ transcript detection cannot capture the enzymatic threshold differences that determine reporter recombination and thus will not help address observed differences between Cre and FlpO lines. Finally, we are actively performing electrophysiological comparisons between Cre and FlpO lines to rigorously quantify potential physiological differences between them. Updated analyses will be incorporated as available or described as ongoing future work.
  
  (3) Discussion of scope and interpretation
  
  We appreciate the reviewers’ suggestions to better contextualize the scope of this resource. We will revise the Discussion to (i) highlight that the Cre–FlpO pairings enable powerful intersectional and cross-line strategies for dissecting basal ganglia and midbrain circuitry, and (ii) clarify that our goal was to generate a rigorously validated foundational resource, with detailed functional comparisons and manipulation studies to be explored in subsequent work.
  
  In summary, we thank the reviewers for their insightful feedback. The planned revisions and clarifications will underscore the unique strengths of our knock-in design, explore potential Cre–FlpO differences, and highlight the value of this standardized and accessible toolkit for the neuroscience community.
  
  AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.06.15.659794v1
www.biorxiv.org www.biorxiv.org

Decapping activators Edc3 and Scd6 act redundantly with Dhh1 in post-transcriptional repression of starvation-induced pathways

5
1. Public_Reviews 09 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important study reports on the redundant roles of the decapping activators Edc3 and Scd6 in orchestrating post-transcriptional programs to modulate metabolic responses to nutrients in yeast. The authors employed mutagenesis studies in conjunction with a battery of transcriptome-wide analyses to provide convincing evidence supporting their conclusions. Considering the broad implications of post-transcriptional regulation of gene expression, this study will be of interest across a variety of biomedical disciplines ranging from biochemistry and molecular and cellular biology to those specializing in studying various pathologies.
 
 Summary
2. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 mRNA decapping and decay factors play critical roles in post-transcriptionally regulating gene expression. Here, Kumar and colleagues investigate how deleting two yeast decapping enhancer proteins (Edc3 and Scd6), either alone or in tandem, affects the transcriptome. Using RNA-Seq, CAGE-Seq and ribosome profiling, they conclude that these factors generally act in a redundant fashion, with a mutant lacking both proteins showing an increased abundance of select mRNAs. As these upregulated transcripts are also upregulated in mutants lacking the decapping enzyme, Dcp2, and show no increases in transcription of their cognate genes, the authors conclude that this is at the level of mRNA decapping and decay. This was further supported by CAGE-Seq analyses carried out in WT cells and the scd6∆edc3∆ double mutant. Their ribosome profiling data also lead them to conclude that Scd6 and Edc3 display functional redundancy and cooperativity with Dhh1/Pat1 in repressing the translation of specific transcripts. Finally, as their data suggest that Scd6 and Edc3 repress mRNAs coding for proteins involved in cellular respiration, as well as proteins involved in the catabolism of alternative carbon sources, they go on to show that these decapping activators play a role in repressing oxidative phosphorylation.
 
 Strengths:
 
 Overall, this manuscript is well-written and contains a large amount of compelling high-quality data and analyses. At its core, it helps to shed light on the overlapping roles Edc3 and Scd6 have in sculpting the yeast transcriptome.
 
 Weaknesses:
 
 While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments.
 
 Review 1
3. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This manuscript by Kumar and Zhang presents compelling evidence that the Edc3 and Scd6 decapping activators exhibit a high degree of redundancy that is revealed only in double mutants lacking both. In addition, the authors provide strong evidence for their role in regulating starvation-induced pathways as evidenced by measurements of mitochondrial membrane potential, metabolomics and analysis of the flux of Krebs cycle intermediates.
 
 Strengths:
 
 Kumar, Zhang et al provide multiple sources of evidence for the direct mechanism of Edc3 and Scd6, by using and comparing different approaches such as mRNA-seq, ribosome occupancies and translational efficiencies. By extensive analysis the authors show that this complex can also regulate genes outside the Environmental Stress Response (non-iESR) that are significantly up-regulated in all three mutants. Remarkably, the gene ontology analysis of these non-iESR genes identifies enrichment for mitochondrial proteins that are implicated in the Krebs cycle. Overall, this study adds novel mechanistic insight into how nutrients control gene expression by modulating decapping and translational repression.
 
 Weaknesses:
 
 The authors show very nicely that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). Future work could make use of these rescue strategies, for example as a platform to further characterise protein-protein interactions between Edc3, Scd6 and Dhh1.
 
 Review 2
4. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 In this paper, Kumar et al investigated the role of two decapping activators, Edc3 and Scd6, in regulating mRNA decay and translation in yeast. Using a variety of approaches including RNA-seq, ribosome profiling, proteomics, polysome analysis, and metabolomics, the authors demonstrate that whereas single deletions of Edc3 or Scd6 have modest effects, the double mutant leads to increased abundance of mRNAs, many of which overlap with those targeted by the decapping activators Dhh1 and Pat1. The data suggest that Edc3 and Scd6 function redundantly to recruit Dhh1 to the Dcp2 decapping complex, thereby promoting mRNA turnover and translational repression. The authors show that these factors cooperate with Dhh1/Pat1 to repress transcripts involved in respiration, mitochondrial function, and alternative carbon source utilization, linking post-transcriptional regulation to nutrient responses. The study establishes Edc3 and Scd6 as important but redundant regulators that fine-tune gene expression and metabolic adaptation in response to nutrient availability.
 
 Strengths:
 
 The paper has several strengths, including the comprehensive approach taken by the authors using multiple experimental techniques (RNA-seq, ribosome profiling, Western blotting, TMT-MS, polysome profiling, and metabolomics) to provide multiple lines of evidence to support their conclusions. The authors demonstrate clear redundancy of the factors by using single and double mutants for Edc3 and Scd6 and their global approach enables an understanding of these factors' roles across the yeast transcriptome. The work connects post-transcriptional processes to nutrient-dependent gene regulation, providing insights into how cells adapt to changes in their environment. The authors demonstrate the redundant roles of Edc3 and Scd6 in mRNA decapping and translation repression. Their RNA-seq and ribosome profiling results convincingly show that many mRNAs are derepressed only in the double mutants, confirming their hypothesis of redundancy. Furthermore, the functional cooperation between Edc3/Scd6 and Dhh1/Pat1 in regulating specific metabolic pathways, including mitochondrial function and carbon source utilization, is supported by the metabolomic data.
 
 Weaknesses:
 
 The study uses indirect evidence to support claims about the effect on mRNA stability rather than directly measuring mRNA stability. However, the combination of Pol II occupancy and RNA abundance measurements is consistent with the claims regarding mRNA stability. The addition of new experiments in the revision co-IPing Dhh1 and Dcp2 strengthens the argument that Edc3 and Scd6 recruit these factors.
 
 Review 3
5. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Strengths:
 
 Overall, this manuscript is well-written and contains a large amount of high-quality data and analyses. At its core, it helps to shed light on the overlapping roles of Edc3 and Scd6 in sculpting the yeast transcriptome.
 
 Weaknesses:
 
 (1) While the data presented makes conclusions about mRNA stability based on corresponding ChIP-Seq analyses and analyzing other mutants (e.g. Dcp2 knockout), at no point is mRNA stability actually ever directly assessed. This direct assessment, even for select transcripts, would further strengthen their conclusions.
 
 We appreciate the reviewer’s concern but wish to emphasize that we conducted ChIP-Seq analysis of RNA Polymerase II occupancies in the CDSs of all genes, known to be a reliable indicator of transcription rate, and found only small increases in Pol II occupancies that cannot account for the increased transcript levels of the cohort of mRNAs up-regulated in the scd6∆edc3∆ double mutant (Fig. 3E). This provides strong evidence that increased transcription is not the main driver of increased mRNA abundance in this mutant. Bolstering this conclusion, we showed that the Hap2/Hap3/Hap4/Hap5 complex of transcription factors responsible for induction of Ox. Phos. genes was not activated in scd6Δedc3Δ cells in glucose medium (Fig. 6F(ii)); nor was the Adr1 activator of CCR genes activated (Fig. S9C(i)), ruling out transcriptional induction of their target genes in glucose-replete scd6Δedc3Δ cells and instead favoring reduced degradation as the mechanism underlying derepression of Ox. Phos. and CCR gene transcripts in this mutant. In Fig. 3B, we further showed that the majority of mRNAs up-regulated in the scd6Δedc3Δ double mutant are also derepressed by dcp2Δ, and in Fig. 3D that the mRNAs up-regulated in scd6∆edc3∆ cells exhibit a higher than average codon protection index (CPI) indicating a heightened involvement of decapping and co-translational degradation by Xrn1 in their decay. To provide additional support for our conclusion, we have conducted new experiments to measure the abundance of capped mRNAs genome-wide by CAGE sequencing of total mRNA in both WT and scd6∆edc3∆ cells. As established previously, normalizing CAGE TPMs to total mRNA TPMs determined by RNA-Seq, dubbed the C/T ratio, provides a reliable measure of the capped proportion of each transcript. The new data presented in Fig. 3C indicate that the mRNAs up-regulated in the scd6∆edc3∆ mutant have significantly lower than average C/T ratios in WT cells, whereas the C/T ratios for the down-regulated transcripts are higher than average, and that these differences between the two groups and all expressed mRNAs are diminished in the scd6∆edc3∆ double mutant. These are the results expected if the up-regulated mRNAs are selectively targeted for decapping in WT cells dependent on Edc3/Scd6, whereas the downregulated mRNAs are targeted by Edc3/Scd6 less than the average transcript. In the original version of the paper, we came to the same conclusion by analyzing our previous CAGE data for the dhh1∆ mutant for the same transcripts dysregulated in scd6∆edc3∆ cells, now presented as supportive data in Fig. S3F. Finally, we added the fact that among all four Dhh1 target mRNAs examined in the previous study of He et al. (2022) and found here to be up-regulated selectively in the scd6∆edc3∆ double mutant (Fig. S10), two of them (SDS23 and HXT6) were shown directly to have longer half-lives in dhh1∆ vs. WT cells by He et al. (2018). Hence, the combined evidence is compelling that selective up-regulation of particular mRNAs in the scd6∆edc3∆ mutant results from diminished decapping/decay rather than enhanced transcription; and we feel that the additional supporting evidence that would be provided by measuring half-lives of a small group of up-regulated transcripts would not justify the considerable effort required to do so. Moreover, the standard approach for such experiments of impairing transcription with an inhibitor of Pol II or a Pol II Ts- mutation has been criticized because of the known buffering (suppression) of mRNA decay rates in response to impaired transcription.
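 The C/T ratio described in this response (CAGE TPM normalized to RNA-Seq TPM for the same transcript) can be illustrated with a minimal sketch. The transcript names and TPM values below are hypothetical placeholders, not data from the study, and the authors' actual pipeline may apply filtering or normalization steps that are not shown here.

```python
# Minimal sketch of a C/T ratio: capped-transcript signal (CAGE TPM)
# divided by total-transcript signal (RNA-Seq TPM).
# All names and numbers are hypothetical, for illustration only.

def ct_ratio(cage_tpm, rnaseq_tpm):
    """Return CAGE TPM / RNA-Seq TPM, or None if the transcript is not expressed."""
    if rnaseq_tpm <= 0:
        return None
    return cage_tpm / rnaseq_tpm

# (CAGE TPM, RNA-Seq TPM) per transcript; invented values
transcripts = {
    "transcript_A": (30.0, 100.0),  # low capped proportion
    "transcript_B": (90.0, 100.0),  # high capped proportion
}

ratios = {name: ct_ratio(cage, total) for name, (cage, total) in transcripts.items()}

for name, ratio in ratios.items():
    print(name, ratio)
```

 Under this reading, a transcript with a lower than average C/T ratio in WT cells would be one with a smaller capped proportion, which is the pattern the response reports for the up-regulated group.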
 
 (2) Scd6 and Edc3 show a high level of functional redundancy, as demonstrated by the double mutant. As these proteins form complexes with other decapping factors/activators, I'm curious if depleting both proteins in the double mutant destabilizes any of these other factors. Have the authors ever assessed the levels of other key decapping factors in the double mutants (i.e. Dhh1, Pat1, Dcp2...etc)? I wonder if depleting both proteins leads to a general destabilization of key complexes. It would also be interesting to see if depleting Edc3 or Scd6 leads to a concomitant increase in the other protein as a compensatory mechanism.
 
 We thank the reviewer for this insight. Examining our Ribo-Seq and TMT-MS data revealed that Dhh1 expression and steady-state abundance are increased ~2-fold in the scd6∆edc3∆ strain, indicating that the up-regulation of many of the same mRNAs by scd6∆edc3∆ and dhh1∆ does not result indirectly from reduced levels of Dhh1 in the scd6∆edc3∆ mutant. The predicted increase in Dhh1 expression might signify a compensatory response to the absence of Scd6/Edc3. We also observed an ~40% reduction in Dcp2 translation (RPFs) and mRNA abundance in the scd6∆edc3∆ strain, which might contribute to the up-regulation of mRNAs dysregulated in this mutant. However, our new immunoblot analyses revealed no significant reduction in steady-state Dcp2 levels in scd6∆edc3∆ cells (Input lanes in Figs. 3F and S4C(i)-(ii)). Moreover, our previous finding that the majority of mRNAs subject to NMD, up-regulated by both upf1∆ and dcp2∆, are not upregulated by scd6∆edc3∆ implies that Dcp2 abundance in scd6∆edc3∆ cells is adequate for normal levels of NMD and favors a direct role for Scd6/Edc3 in accelerating degradation of most transcripts up-regulated in this mutant. We have added these points to the DISCUSSION.
 
 (3) While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments.
 
 We agree with the reviewer that our scd6∆edc3∆ strain provides an opportunity to dissect the Scd6 and Edc3 proteins to determine which domains and motifs of each protein are most critically required for their functions in activating mRNA decay. However, if conducted thoroughly, this would entail an extensive analysis requiring a combination of genetics, biochemistry and genomics. Considering the large amount of data already presented in 43 and 34 panels of main and supplementary figures, respectively, we feel that these additional experiments would be conducted more appropriately as a stand-alone follow-up study.
 
 Reviewer #2 (Public review):
 
 Weaknesses:
 
 The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies). Also, these rescue assays could provide a good platform to further characterise the protein-protein interactions between Edc3, Scd6, and Dhh1.
 
 We responded to this point immediately above in responding to Rev. #1.
 
 Reviewer #3 (Public review):
 
 Weaknesses:
 
 The limitations of the study include the use of indirect evidence to support claims that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, which is inferred from correlations in mRNA abundance and ribosome profiling data rather than direct biochemical evidence.
 
 While the reviewer makes a valid point, it is important to note that the greater correlations between effects of scd6∆edc3∆ with those conferred by dhh1∆ vs. pat1∆ also extended to changes in metabolites (Fig. 7A-C). To provide more direct evidence that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, we have now conducted co-immunoprecipitation experiments (presented in new Figs. 3F and S5) demonstrating that association of Dhh1 with Dcp2 is diminished in the scd6∆edc3∆ double mutant but not in either scd6∆ or edc3∆ single mutant, thus providing biochemical support for our proposal.
 
 Also, there is limited exploration of other signals as the study is focused on glucose availability, and it is unclear whether the findings would apply broadly across different environmental stresses or metabolic pathways. Nonetheless, the study provides new insights into how mRNA decapping and degradation are tightly linked to metabolic regulation and nutrient responses in yeast. The RNA-seq and ribosome profiling datasets are valuable resources for the scientific community, providing quantitative information on the role of decapping activators in mRNA stability and translation control.
 
 While not disputing the facts of this comment, we think it is unjustified to label as a weakness that our study focused on glucose-grown cells considering the large amount of new data and insights made possible by our multi-omics approach, presented in >70 separate figure panels and nine supplementary datafiles, which the reviewer has characterized as being valuable to the scientific community. Parallel studies in non-preferred carbon or nitrogen sources are underway and represent large-scale investigations in their own right, for which the current dataset in glucose-replete cells provides the critical reference condition.
 
 Reviewer #1 (Recommendations for the authors):
 
 The authors made a note that a set of 37 mRNAs is repressed exclusively by Edc3 with little contribution by Scd6, a list that includes the RPS28B mRNA. Edc3 has been previously reported to promote the decay of this mRNA in a deadenylation-independent fashion by binding to an element in its 3'UTR (PMIDs 15225544, 24492965). Can the authors comment on whether Edc3 may be binding to similar elements in the 3'UTRs of these transcripts in their shortlist? This could be an interesting topic matter for discussion as well.
 
 While an interesting idea, this seems unlikely because the 3’UTR sequence in RPS28B mRNA was shown to bind Rps28 protein itself to confer heightened decapping and decay dependent on Edc3 in a negative autoregulatory loop that exerts tight control over Rps28 protein levels. It would be surprising if Edc3-mediated repression of the other 36 mRNAs would involve Rps28 as none of them encode cytoplasmic ribosomal proteins. Nevertheless, we searched for a conserved motif among the 3’UTRs of the 37 mRNAs using the MEME suite and found enrichment for motifs identified for RNA binding proteins Hrp1 and Nab2 and two novel motifs, but none of these motifs could be recognized within the Rps28 autoregulatory loop. We have chosen not to comment on these findings in the revised manuscript to avoid lengthening it unnecessarily with inconclusive observations.
 
 Reviewer #2 (Recommendations for the authors):
 
 The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by the transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies); or from expressing truncated mutants of EDC3 (pLfz614-7) or SCD6 (pLfz615-5), to show that they can act as dominant negative competitors for binding to either Dhh1 or Dcp2.
 
 We addressed this comment above in our response to this Reviewer.
 
 Reviewer #3 (Recommendations for the authors):
 
 (1) Labels such as "mRNA_up_s6,e3" are not defined in figures or the text. I suggest clearer sample labeling throughout.
 
 The labels had been defined at first mention in the RESULTS but are now indicated there more explicitly, as well as in the legend to Fig. 1.
 
 (2) In Figure 1D it is surprising that the mRNA profile has a peak in the 5' UTR. I would expect to see such a peak in ribosome footprinting data. Is it possible these are incorrectly labeled?
 
 The figure is correctly labeled. Generally, one does not expect to see RPFs in the 5’UTR region unless there is an efficiently translated uORF, which appears not to be the case for MDH2.
 
 In general, the information in this panel and C is inadequate. None of the numbers are clearly explained in the figure legend or in the figure.
 
 We had cited the legend to Fig. S3C for details of all such gene browser images but have now inserted this information into the Fig. 1D legend, at the first occurrence of such data in the regular figures.
 
 (3) Figures 1C and 1D are in the wrong order.
 
 Corrected.
 
 (4) Figure 2D is a very complicated Venn Diagram. I suggest using UpSet plots as an alternative to Venn diagrams to more clearly convey overlaps between sets.
 
 We provided additional explanatory text in the Fig. 2D legend to facilitate understanding.
 
 (5) The use of the same color scheme to represent different sets in panels of the same figure is a source of confusion. E.g. the cyan in Figures 2A, 2D, and 2E indicates unrelated categories, but one would think they are related.
 
 The use of the same cyan color in these three figure panels actually does designate results for the same set of 591 mRNAs up-regulated in the three mutants. The application of the color schemes is now mentioned explicitly in Figs. 1, 2, and S3.
 
 (6) Reporting of p-values = 0 in figures is not useful.
 
 Corrected.
 
 (7) The whole manuscript is extremely long which reduces the overall impact. For example, the introduction is six pages long. I suggest reducing redundant text and being more concise to enhance readability.
 
 We tried to streamline the text wherever possible, in particular shortening the Introduction by two pages.
 
 (8) Many abbreviations are used throughout the text that are not introduced the first time they are used.
 
 Corrected throughout.
 
 (9) The ERCC normalization is unclear. Were the spike-ins added before cell lysis to allow estimation of per-cell RNA counts or to the extracted RNA? If added to extracted RNA rather than cells it is not clear to me how the claim can be made regarding increased mRNA abundance in the mutants.
 
 We thank the reviewer for this comment. As we explained in the Methods, 2.4 µl of 1:100 diluted ERCC RNA Spike-In Control Mix 1 was added to 1.2 µg of each total RNA sample prior to cDNA library preparation. Because the majority of total RNA is comprised of rRNA, this normalization yields the abundance of each mRNA relative to rRNA. Owing to repression of rESR mRNAs encoding ribosomal proteins and biogenesis factors in the scd6∆edc3∆ strain (Fig. S3D), the ribosome content per cell is expected to be reduced in this mutant vs. WT. We showed previously that the isogenic dcp2∆ mutant, which elicits an ESR response of similar magnitude, showed a 30% reduction in bulk ribosomal subunits per cell compared to the same WT strain examined here (Vijjamarri et al., 2023). Assuming a similar reduction in ribosome abundance in the scd6∆edc3∆ mutant, the changes in mRNA per cell conferred by the scd6∆edc3∆ mutation are expected to be 0.7-fold of the ERCC-normalized values given in Fig. 3E, yielding fold-changes of 2.00 and 0.62 for the mRNA_up and mRNA_dn groups, respectively, which still differ substantially from the corresponding changes in normalized Rpb1 occupancies of 1.2 and 0.93, respectively. We have added this new analysis to the text of RESULTS.
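
 The rescaling described in this response is a single multiplication, and a minimal sketch may make the assumption explicit. The function name and the input values below are illustrative only: the inputs are back-calculated from the fold-changes of 2.00 and 0.62 stated above, not read from Fig. 3E, and the 0.7 factor is the ribosome reduction the authors assume by analogy with the dcp2 deletion mutant.

```python
def per_cell_fold_change(ercc_normalized_fc, ribosome_ratio):
    """Convert an rRNA-relative (ERCC-normalized) fold change to a per-cell one.

    ribosome_ratio is mutant/WT ribosome content per cell (an assumption here).
    """
    return ercc_normalized_fc * ribosome_ratio


RIBOSOME_RATIO = 0.7  # assumed 30% reduction in ribosomes per cell

# Hypothetical ERCC-normalized inputs, back-calculated for illustration only.
ercc_up = 2.86
ercc_dn = 0.886

up_per_cell = per_cell_fold_change(ercc_up, RIBOSOME_RATIO)
dn_per_cell = per_cell_fold_change(ercc_dn, RIBOSOME_RATIO)
print(f"mRNA_up per cell: {up_per_cell:.2f}")
print(f"mRNA_dn per cell: {dn_per_cell:.2f}")
```

 The correction only shifts both groups by the same factor, so the gap between the mRNA changes and the Rpb1 occupancy changes depends on how well the assumed ribosome ratio holds for this mutant.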
 
 (10) The use of the terms "up-regulated" and "derepressed" throughout is confusing. Both refer to observed increased abundance of mRNAs, but they imply different causes which are never clearly defined.
 
 We changed all occurrences of “derepressed” to “up-regulated”.
 
 AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.28.610059v2
www.biorxiv.org www.biorxiv.org

Conduction pathway for potassium through the E. coli pump KdpFABC

4
1. Public_Reviews 09 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This manuscript revisits the well-studied KdpFABC potassium transport system from bacteria with a convincing set of new higher resolution structures, a protein expression strategy that permits purification of the active wildtype protein, and solid insight obtained from mutagenesis and activity assays. The thorough and thoughtful mechanistic analyses make this a valuable contribution to the membrane transport field.
 
 Summary
2. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.
 
 The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology.
 
 Review 1
3. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wildtype protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.
 
 Strengths:
 
 The high resolution (2.1 Å) of the current structure is impressive, and allows many new densities in the potassium transport pathway to be resolved. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. The SSME experiments are generally rigorous.
 
 Weaknesses:
 
 The present SSME experiments do not support quantitative comparisons of different mutants, as in Figures 4D and 5E. Only qualitative inferences can be drawn among different mutant constructs.
 
 Review 2
4. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This study on potassium ion transport by the protein complex KdpFABC from E. coli reveals a 2.1 Å cryo-EM structure of the nanodisc-embedded transporter under turnover conditions. The results confirm that K+ ions pass through a previously identified tunnel that connects the channel-like subunit with the P-type ATPase-type subunit.
 
 Strengths:
 
 The excellent resolution of the structure and the thorough analysis of mutants using ATPase and ion transport measurements help to strengthen new and previous interpretations. The evidence supporting the conclusions is solid, including biochemical assays and analysis of mutants. The work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis.
 
 Weaknesses:
 
 There is insufficient credit and citation of previous work.
 
 The manuscript has been thoroughly revised with special attention to acknowledging all past work relevant to the study.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.
 
 Strengths:
 
 The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology.
 
 Weaknesses:
 
 While the results are overall compelling, several aspects of the work raised questions. First, the authors determined the structure of the pump in nanodiscs under turnover conditions and observed several structural classes, including E1-P, which is detailed in the paper. Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes?
 
 As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story that is focused on the conduction pathway for K+ ions. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We cannot therefore derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state (7BGY and 7BH2, which were stabilized by MgF4 and BeFx, respectively) show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.
 
 The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable?
 
 Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess.
 
 To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in an expanded "Activity Assays" section of Methods.
 
 In addition, we have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH4 in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.
 
 The authors propose that the tunnel connecting the subunits is filled with water and lacks potassium ions. This is an important mechanistic point that has been debated in the field. It would be interesting to calculate the volume of the tunnel and estimate the number of ions that might be expected in it, given their concentration in bulk. It may also be helpful to provide additional discussion on whether some of the observed densities correspond to bound ions with low occupancy.
 
 As suggested, we calculated the internal volume of the tunnel within KdpA (from the S4 K+ site to the KdpA/KdpB subunit interface) based on the profile derived from Caver. Based on this volume (4.9 × 10⁻²⁵ L), a single K+ ion within this cavity would correspond to 3.4 M, which is near saturation for a solution of KCl. We added this information together with an acknowledgment of low-occupancy K+ to the fourth paragraph of the Discussion:
 
 " Fourth, based on the volume of the cavity in KdpA, a single K+ ion would correspond to a concentration of 3.4 M, suggesting that multiple ions would exceed the solubility limit especially in the absence of counterions. Finally, map densities within the tunnel were either of comparable strength or weaker than surrounding side chain atoms, unlike at S3 and canonical binding sites. Although it is possible that weaker density could represent low occupancy K+ ions, we favor a mechanism whereby individual K+ ions occupy the tunnel transiently as they transit between the selectivity filter and the canonical binding site."
 
 To make this analysis, we developed a Python script to calculate the volume of the tunnel as defined by the Caver software (this software is available via github.com/dls4n/tunnel). In turn, this enabled us to distinguish water molecules that were actually in the tunnel rather than bound more deeply within the structure of KdpA. As a result, we updated the water distribution plot in Fig. 4b. Notably, the 17 water molecules within this cavity would correspond to 57.8 M, which is reasonably near the expected 55 M for an aqueous solution.
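
 The conversion from a particle count in the cavity to an effective molar concentration can be checked with a few lines. This is a minimal sketch of that arithmetic only, not the authors' script from the repository cited above; the function and constant names are illustrative, and the cavity volume is the 4.9 × 10⁻²⁵ L figure quoted in the response.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def molar_concentration(n_particles, volume_litres):
    """Effective molarity of n_particles confined to volume_litres."""
    return n_particles / (AVOGADRO * volume_litres)


TUNNEL_VOLUME_L = 4.9e-25  # cavity volume quoted in the response, in litres

k_conc = molar_concentration(1, TUNNEL_VOLUME_L)       # one K+ ion
water_conc = molar_concentration(17, TUNNEL_VOLUME_L)  # 17 modeled waters

print(f"single K+ ion: {k_conc:.1f} M")
print(f"17 waters:     {water_conc:.1f} M")
```

 Carried through without intermediate rounding, the water figure comes out near 57.6 M; the 57.8 M quoted above corresponds to multiplying the rounded 3.4 M by 17. Either value is close to the roughly 55 M expected for bulk water.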
 
 Reviewer #3 (Public review):
 
 Summary:
 
 By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wild-type protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.
 
 Strengths:
 
 Although this structure is not so different from previous structures, its high resolution (2.1 Å) is impressive and allows the resolution of many new densities in the potassium transport pathway. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal.
 
 Weaknesses:
 
 The structures are supported by solid membrane electrophysiology. These data exhibit some weaknesses, including a lack of information to assess the rigor and reproducibility (i.e., the number of replicates, the number of sensors used, controls to assess proteoliposome reconstitution efficiency, and the stability of proteoliposome absorption to the sensor).
 
 To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in the "Activity Assays" section of Methods.
 
 Reviewing Editor Comments
 
 After discussing the evaluations, the Reviewers and Reviewing Editor have identified the following essential revisions that would need to be addressed to improve the eLife assessment:
 
 (1) Work from others in the field should be adequately described and acknowledged:
 
 (a) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex."
 
 The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially.
 
 Explicit reference to this work has been added as follows:
 
 “A series of X-ray and cryo-EM structures of KdpFABC from E. coli (Huang et al., 2017; Silberberg et al., 2022, 2021; Stock et al., 2018; Sweet et al., 2021) indicate a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex. As first proposed by Stock et al. (Stock et al., 2018), there is now a consensus that K+ enters the complex from the extracellular side of the membrane through the selectivity filter of KdpA, but is blocked from crossing the membrane.”
 
 (b) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work."
 
 This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. In Silberberg et al 2022 the mutant was used, but this is not mentioned.
 
 The Q116R mutant was used by Huang et al., but indeed not used for the Stock et al paper. We have replaced the sentence in the manuscript with the following:
 
 “Use of the KdpD knockout strain allowed us to produce WT and mutant protein free from Ser162 phosphorylation.”
 
 (c) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned.
 
 Reference to the Silberberg 2022 paper, where E1~P was the most highly populated state, has been added. The percentage of particles was removed as we are still processing data from the other states, which we hope will be described in a future manuscript.
 
 (d) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling." The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning.
 
 To correct this oversight, we made the following changes to the text:
 
 On pg. 7 in the Results section, we refer to the 2005 paper from Bramkamp & Altendorf:
 
 “Consistent with earlier work on this mutant (Bramkamp and Altendorf, 2005), the D583A mutant displayed substantial ATPase activity (30% of WT) but no transport, indicating that this particular mutation interfered with energy coupling.”
 
 At the end of pg. 10 in the Discussion, we revised the paragraph discussing D583 and Lys586 to explicitly refer to the mechanism of transport described in the 2021 paper from Silberberg et al, including proximal and distal binding sites as well as uncoupling due to the D583A mutation.
 
 “Similar to the Glu370/Arg493 charge pair in KdpA, Asp583 and Lys586 are the only charged residues in the membrane core of KdpB. Although they are not seen to interact directly in our structure, they coordinate accessory waters associated with the canonical binding site. Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K+ ions. Based on these simulations, these authors proposed a mechanism whereby neutralization of this site either by ion binding or by D583A substitution served to stimulate ATPase activity. Indeed, earlier work on D583A (Bramkamp and Altendorf, 2005) as well as current data demonstrate uncoupling, in which K+-independent ATPase activity was observed even though transport was abolished. A plausible explanation for this stimulation is seen in the behavior of Lys586 in previous structures of the E2·Pi state (7BGY and 7BH2) (Sweet et al., 2021). In these structures, M5 undergoes a conformational change that pushes the side chain of Lys586 into the CBS. As a consequence of the D583A mutation, this Lys could be freed to act as a built-in counter ion as in related P-type ATPases ZntA (Wang et al., 2014) and AHA2 (Pedersen et al., 2007). In regard to the proximal binding site and the partnering “distal binding site” on the KdpA-side of the subunit interface, our structure does not show densities at either site and thus does not provide any support for the related mechanism. In any case, in the WT complex it seems likely that Asp583 exerts allosteric control over Lys586 and ensures that its movement into the binding site is coordinated with the transition from E1~P to E2·Pi, thus leading to displacement of K+ from the CBS and release to the cytoplasm.”
 
 (e) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. "
 
 Incorrect, see Silberberg 2021.
 
 On this point, we beg to differ. Although this 2021 paper shows densities in experimental cryo-EM maps and effects of mutations to residues at the KdpA and KdpB interface, the intra-tunnel transport mechanism is based on computational analysis (MD simulations) and not experimental evidence. We softened the statement to read as follows:
 
 “Although it has been postulated to conduct K+, direct experimental evidence has been hard to come by.”
 
 (f) In this context, F232 is also not mentioned anywhere in the text, although it is depicted in almost all figures.
 
 Phe232 is shown as a point of reference for the KdpA/KdpB subunit interface. We added a reference to Phe232 in the Results section labeled “Intersubunit tunnel” as well as the paragraph in the Discussion addressed in point d) above.
 
 " These densities, which we have modeled as water, are most prevalent near the vestibule, which is the wider part of the tunnel, but then disappear completely at the subunit interface near Phe232, which is the narrowest part of the tunnel and also distinctly hydrophobic (Fig. 4)."
 
 " Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K+ ions."
 
 (g) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). "
 
 KcsA is not a member of the SKT superfamily.
 
 Thanks. This is correct, although the SKT superfamily is believed to have evolved from KcsA. KcsA has been removed from the sentence and a reference added to a review of the SKT superfamily:
 
 “which also includes bona fide K+ channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”
 
 (2) Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes?
 
 As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We cannot therefore derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state (7BGY and 7BH2, which were stabilized by MgF4 and BeFx, respectively) show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.
 
 (3) The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results.
 
 To address this concern, we have generated supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also added a detailed description of replicates, sensor stability and the experimental protocols in the "Activity Assays" section of Methods. In addition, we have highlighted observations of pre-steady state binding currents that were seen for some mutants (e.g., Q116R assayed with Rb+, NH4+ and Na+), in which an initial, transient current response was observed without an ensuing transport current. The depiction of these raw data has allowed us to explain our use of the current response at 1.25 s, after decay of this binding current, as a measure of transport rate. This approach is consistent with recommendations by the manufacturer, as documented in their 2023 publication (Bazzone et al. https://doi.org/10.3389/fphys.2023.1058583).
 
 (4) Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess.
 
 We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH4 in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.
 
 (5) What are the different lines in Figure 1 - Supplement 1, panel G?
 
 This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.
 
 (6) How was the 44% population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)?
 
 The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate-limiting step of the Post-Albers cycle. Reference is made to the Silberberg 2022 paper, which made a similar observation in a detergent-solubilized sample.
 
 (7) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e.
 
 This was a typographical error. The text now states that Km for Q116E is <100 µM.
 
 (8) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed.
 
 The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fits generated respectable confidence limits and the values are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
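
 For readers who want to see how Km and Vmax fall out of a titration, the sketch below uses a Hanes-Woolf linearization (s/v = s/Vmax + Km/Vmax) with ordinary least squares. This is a swapped-in, simplified technique rather than the direct fit of the Michaelis-Menten equation described in the Methods, and it yields no confidence limits. The concentrations and kinetic parameters are synthetic, noise-free values chosen for illustration, not data from Figure 2.

```python
def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten equation at concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, Km) by regressing s/v on s.

    Slope = 1/Vmax and intercept = Km/Vmax under Michaelis-Menten kinetics.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free titration (hypothetical units: mM and normalized rate).
conc = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
true_vmax, true_km = 1.0, 0.09
rates = [michaelis_menten(s, true_vmax, true_km) for s in conc]

vmax_fit, km_fit = hanes_woolf_fit(conc, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f}")
```

 With real, noisy data a flat curve leaves Km poorly constrained by any method, which is the substance of the reviewers' concern; a nonlinear fit that reports confidence intervals makes that uncertainty visible in a way this linearization does not.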
 
 (9) Figure 3 would benefit from a slice through the protein to orient the viewer.
 
 Thanks for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.
 
 (10) The differences between R493E, Q, and M do not appear to be significant.
 
 The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated based on one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q and between R493Q and R493M are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:
 
 “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”
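
 The one-way ANOVA mentioned in this response compares the variance between groups with the variance within groups. The sketch below computes only the F statistic and its degrees of freedom in plain Python; it is an illustration of the test, not the authors' analysis, and the three groups are hypothetical replicate values rather than measurements from Fig. 3. Converting F to a P value requires the F distribution, which a statistics library would normally supply, and pairwise comparisons between mutants need a post hoc test on top of this.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical replicate values for three constructs (illustration only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

 Because the plotted axis is logarithmic, whether the test is run on raw or log-transformed Km values can change the outcome, so it helps to state which was used.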
 
 (11) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?)
 
 These mutations have historical significance, having been generated by random mutagenesis during early characterization of the Kdp system by Epstein and colleagues. A sentence containing relevant references has been added to this paragraph to provide this context:
 
 “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”
 
 Below are the recommendations from each of the reviewers, some of which were not included as essential revisions, but that can also be helpful to further strengthen the manuscript.
 
 Reviewer #1 (Recommendations for the authors):
 
 It is essential that the authors correct their selective, incomplete, and in places inappropriate references to work from others in the field.
 
 Specific points:
 
 (1) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex."
 
 The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially.
 
 (2) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work."
 
 This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. In Silberberg et al 2022, however, the mutant was used, but this is not mentioned.
 
 (3) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned.
 
 (4) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling." The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning.
 
 (5) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. "
 
 Incorrect, see Silberberg 2021.
 
 (6) In this context, F232 is also not mentioned anywhere in the text, although it is depicted in almost all figures.
 
 References have been added to address all of these points. See item 1) under Reviewing Editor’s Comments above.
 
 Other points:
 
 (7) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). "
 
 KcsA is not a member of the SKT superfamily.
 
 KcsA has been removed from the sentence and a reference added to a review of the SKT family:
 
 “which also includes bona fide K+ channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”
 
 (8) Page 9 " Our demonstration of coupled transport of NH4+ and Rb+ G232D not only confirms that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous." Check grammar.
 
 This sentence has been updated as follows:
 
 “Our observation that G232D is capable of coupled transport for NH4+ confirms not only that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous.”
 
 Reviewer #2 (Recommendations for the authors):
 
 (1) From an editorial point of view, I suggest a few changes to enhance readability and clarity for non-specialists. A description of the overall transport cycle at the start of the paper (perhaps as a supplementary figure) could help put the work into perspective for general readers who may not be familiar with P-type ATPase mechanisms. It is unclear what "single" and "double" occupancy refer to in the structural classes description. Why is only one structural class described in detail? I would suggest moving the discussion of what is going on with the N-terminus of KdpB to the Results section, where it is described, and shortening the corresponding paragraph in the Discussion. I would furthermore suggest adding a figure that illustrates the proposed regulatory role of the terminus and how phosphorylation might affect it. Otherwise, this section of the results reads very hollow.
 
 A diagram showing the Post-Albers cycle is shown as part of Fig. 1 and is described at the end of the second paragraph. This sentence only mentioned KdpB, which may have caused confusion. We therefore changed the sentence to read as follows:
 
 “Like other P-type ATPases, KdpFABC employs the Post-Albers reaction cycle (Fig. 1) involving two main conformations (E1 and E2) and their phosphorylated states (E1~P and E2-P) to drive transport (Albers, 1967; Post et al., 1969).”
 
 Single and double occupancy were meant to refer to the number of KdpFABC complexes residing in a nanodisc. This can be seen in the class averages in Fig. 1 - figure supplement 2. The legends to Fig. 1 - figure supplements 1 and 2 have been revised to explain this observation more explicitly:
 
 "Slight asymmetry of the main peak is consistent with a subpopulation of nanodiscs containing two KdpFABC complexes (Fig. 1 - figure supplement 2)."
 
 and
 
 "A subset of these particles were further classified to generate four main classes representing nanodiscs with a single copy of KdpFABC in either E1 or E2 conformations, nanodiscs with two copies of KdpFABC which were mainly E1 conformation, and junk."
 
 As stated above, the class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We continue to analyze the cryo-EM data and aim to produce a second manuscript that will include descriptions of other conformations together with the additional biophysical analysis related to their function.
 
 With regard to the N-terminus, we have gone on to generate a truncation of residues 2-9 in KdpB. After expression and purification, this construct remained coupled with ATPase and transport activities similar to WT, which makes proposals of a regulatory effect less compelling. Because of the novelty of observing the N-terminus and the possibility that it plays a subtle role in the kinetics of the cycle not revealed under the current assay conditions, we have retained a brief discussion of this structural observation, but moved it into the Results section as suggested.
 
 "Given the regulatory roles played by N- and C-termini of a variety of other P-type ATPases (Bitter et al., 2022; Cali et al., 2017; Lev et al., 2023; Timcenko et al., 2019; Zhao et al., 2021), we generated a construct in which residues 2-9 of the N-terminus of KdpB were truncated. However, ATPase and transport activities remained coupled at levels similar to WT, indicating that any functional role of the N-terminus is relatively subtle and not manifested under current assay conditions."
 
 (2) The wording "exceedingly strong densities" seems ambiguous.
 
 We have changed this to “strong” in the Abstract and "exceptionally strong" in the Discussion. The precise values for these densities are shown in density histograms in Fig. 2 – figure supplement 1 and Fig. 5 – figure supplement 2. In the text, the densities are described as follows:
 
 Results sections describing the selectivity filter:
 
 "In fact, this S3 site contains the strongest densities in the entire map, measuring 7.9x higher than the threshold used for Fig. 2a (Fig. 2 – figure suppl. 1a)."
 
 Results section describing the CBS:
 
 "Given that this is the strongest density in KdpB, measuring 5.6x higher than the map densities shown in Fig. 5 (Fig. 5 – figure suppl 2b), we have modeled it as K+."
 
 (3) What are the different lines in Figure 1 - Supplement 1, panel G?
 
 This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.
 
 (4) How was the 44% population of the single-occupancy E1 state estimated? It does not correspond to the number of particles in Figure 1 - Supplement 2.
 
 The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. We will consider citing an updated value in a future publication once this analysis is complete. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate limiting step of the Post-Albers cycle. Reference was made to the Silberberg 2022 paper, where a similar observation was made.
 
 (5) Panel 1d is called out of order after panel 1e. Please label Ser 162 in the panel.
 
 The order of these panels has been switched and Ser162 has been labelled as suggested.
 
 (6) Several panels in Figure 1- Supplement 1 are neither referenced nor described.
 
 This figure supplement is referred to multiple times in the Results and the Methods sections of the text as well as in the figure legends. Although each panel is not individually referenced, all of this information is relevant at different points in the manuscript and is explained in the legend.
 
 (7) Is the coordinating geometry for the S3 site consistent with what was previously observed for KcsA and relatives?
 
 The general arrangement of carbonyl atoms in the S3 site is the same in KcsA and KdpA, described by the MacKinnon group as a square antiprism. However, KcsA has strict four-fold symmetry and KdpA does not. As a result, there are small discrepancies between the coordinating geometries in the two structures. This point was made graphically in our original report on the X-ray structure of KdpFABC (Huang et al. 2017, Extended Data Fig. 3), though the positions of the carbonyls are more accurately determined in the current structure due to increased resolution. We added a sentence to the Selectivity Filter section of the Results stating the following:
 
 "This coordination geometry is also consistent with that seen in the K+ channel KcsA, though the strict four-fold symmetry of that homo-tetramer produces a more regular structure, as indicated by the smaller variance in liganding distance (2.77 Å with s.d. 0.075 Å in 1K4C) and as depicted by Huang et al. in Extended Data Fig. 3 (Huang et al., 2017)."
 
 (8) Label G232D in Figure 2a.
 
 G232 is out of the plane shown in Fig. 2a. However, we have added a label for Cys344 to help identify the selectivity filter strands that are shown. Note, however, that G232 is visible and labeled in Fig. 2 - figure suppl. 1. This has now been noted in the legend for Fig. 2.
 
 (9) The text states that Km for Q116E is "<10 µM". However, the fitted value is 90 µM in Figure 2e.
 
 This was a typographical error. The text now states that Km for Q116E is <100 µM.
 
 (10) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed.
 
 The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fits generated respectable confidence limits and are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
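 For readers unfamiliar with this kind of fit, the following is a minimal sketch of estimating Km and Vmax from concentration-response data. It is not the procedure described in the Methods: the function names, the grid search over Km, and the noiseless example data are all illustrative assumptions. For each candidate Km, the best Vmax follows in closed form from linear least squares, so only Km needs to be scanned.

```python
def michaelis_menten(s, vmax, km):
    """Michaelis-Menten response at substrate concentration s."""
    return vmax * s / (km + s)


def fit_michaelis_menten(conc, rate, km_grid):
    """Least-squares fit by scanning Km (hypothetical helper, not the authors' code).

    For a fixed Km the model is linear in Vmax, so the optimal Vmax is
    sum(y * f) / sum(f * f) with f = s / (Km + s).
    Returns (vmax, km, sum_of_squared_errors) for the best grid point.
    """
    best = None
    for km in km_grid:
        f = [s / (km + s) for s in conc]
        denom = sum(x * x for x in f)
        if denom == 0:
            continue
        vmax = sum(y * x for y, x in zip(rate, f)) / denom
        sse = sum((y - vmax * x) ** 2 for y, x in zip(rate, f))
        if best is None or sse < best[2]:
            best = (vmax, km, sse)
    return best


# Illustrative, noiseless data generated with Vmax = 2.0 and Km = 0.5 (arbitrary units).
conc = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
rate = [michaelis_menten(s, 2.0, 0.5) for s in conc]
km_grid = [0.01 * i for i in range(1, 501)]
vmax_hat, km_hat, sse = fit_michaelis_menten(conc, rate, km_grid)
```

 Inspecting how the sum of squared errors varies across the Km grid gives a rough sense of how tightly the data constrain Km, which is the question raised about the flatter curves.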
 
 (11) Figure 3 would benefit from a slice through the protein to orient the viewer.
 
 Thank you for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.
 
 (12) The differences between R493E, Q, and M do not appear to be significant.
 
 The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated using one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q and between R493Q and R493M are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:
 
 “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”
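 As an illustration of the one-way ANOVA mentioned in this response, the sketch below computes the F statistic from groups of replicate measurements. The group values are hypothetical placeholders, not data from the manuscript, and the function name is an assumption. Converting F to a P value and making the pairwise comparisons quoted above would additionally need the F distribution and a post hoc test, which are not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical replicate values for three constructs (arbitrary units).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
```

 Because the plotted y-axis is logarithmic, one might choose to run such a test on log-transformed Km values; whether that was done here is not stated in the response.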
 
 Reviewer #3 (Recommendations for the authors):
 
 Overall, the text was very clear, experiments were rationalized well, and conclusions were justified. A few small comments:
 
 (1) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?)
 
 These mutations are of historical importance, having been generated by random mutagenesis during early characterization of the Kdp system. A sentence containing relevant references has been added to this paragraph to provide this information as context:
 
 “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”
 
 (2) Typo: page 14, "diluted"
 
 This typo has been corrected.
 
 (3) The Methods section for SSM electrophysiology could use some additional description of how the data/statistics were collected. How many replicates? Were all replicates from a single sensor/ were multiple sensors examined? Were controls done to test whether the same number of liposomes remain absorbed by the sensor over the length of the experiment?
 
 We have extended our description of experimental protocols in the "Activity Assays" section of Methods. This includes the number and type of replicates as well as a discussion of binding currents that were seen for some mutants. Furthermore, a new series of supplementary figures for Figs. 2, 4, 5, and 6 show all of the raw traces for the SSME measurements (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2).
 
 We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH4 in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.
 
URL

biorxiv.org/content/10.1101/2025.05.05.652293v2
www.biorxiv.org www.biorxiv.org

Center-surround inhibition by expectation: a neuro-computational account

4
1. Public_Reviews 09 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This is a methodologically rich manuscript that is important for revealing the center-surround inhibition profile of expectation in orientation space. The analyses are compelling in validating the critical role of predictive coding feedback. The findings provide novel insights into how expectation optimizes perception via enhancement and suppression.
 
2. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The authors tested two competing mechanisms of expectation: (1) a sharpening model that suppresses unexpected information via center-surround inhibition; (2) a cancellation model that predicts a monotonic gradient response profile. Using two psychophysical experiments manipulating feature space distance between expected and unexpected stimuli, the results consistently supported the sharpening model. Computational modeling further showed that expectation effects were explained by either sharpened tuning curves or tuning shifts. Finally, convolutional neural network simulations revealed that feedback connections critically mediate the observed center-surround inhibition.
 
 Strengths:
 
 The manuscript provides compelling and convergent evidence from both psychophysical experiments and computational modeling to robustly support the sharpening model of expectation, demonstrating clear center-surround inhibition of unexpected information.
 
 Comments on revisions:
 
 I appreciate the authors' thoughtful revisions. I have no further comments.
 
3. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 This is a compelling and methodologically rich manuscript. The authors used a variety of methods, including psychophysics, computational modeling, and artificial neural networks, to reveal a non-monotonic, center-surround "Mexican-hat" profile of expectation in orientation space. Their data convincingly extend analogous findings in attention and working memory, and the modeling nicely teases apart sharpening vs. shift mechanisms.
 
 Strengths:
 
 The findings are novel and important in elucidating the potential neural mechanisms by which expectation shapes perception. The authors conducted a series of well-designed psychophysical experiments to carefully examine the profile of expectation's modulation. Computational modeling also provides further insights, linking the neural mechanisms of expectation to behavioral results.
 
 Comments on revisions:
 
 I think the authors did a great job in addressing my previous comments. I have no further comments.
 
4. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the original reviews.
 
 Reviewer #2 (Public review):
 
 (1) The sharpening model of expectation can predict surround suppression. The authors could further clarify how the cancellation model predicts a monotonic profile of expectation (Figure 1C) with the highest response at the expected orientation, while the cancellation model suggests a suppression of neurons tuned toward the expected stimulus.
 
 We thank the reviewer for the comment. We would like to emphasize that as the expected signal is suppressed, the relative weight or salience of unexpected inputs increases. We have clarified this interpretation in the manuscript as follows:
 
 “Here, given these two mechanisms making opposite predictions about how expectation changes the neural responses of unexpected stimuli, thereby displaying different profiles of expectation, we speculated that if expectation operates by the sharpening model with suppressing unexpected information, we should observe an inhibitory zone surrounding the focus of expectation, and its profile then should display as a center-surround inhibition (Fig. 1c, left). If, however, expectation operates as suggested by the cancelation model with highlighting unexpected information, the inhibitory zone surrounding the focus of expectation should be eliminated, and the profile should instead display a monotonic gradient (Fig. 1c, right).”
 
 (2) I'm a bit concerned about whether the profile solely arises from modulation of expectation. The two auditory cues are each associated with a fixed orientation, which may be confounded by other cognitive processes like visual working memory or attention (which I think the authors also discussed). Although the authors tried to use the SFD task to render orientation task-irrelevant, luminance edges (i.e., orientation) and spatial frequency in gratings are highly intertwined, and the orientation of the gratings may help recall the first grating's SF (fixed at 0.9 c/°), especially given that the first and second gratings' orientations are not very different (4.8°).
 
 We agree that dissociating expectation from attention and other top-down processes remains a key challenge in visual expectation research (see Summerfield & Egner, 2009; Summerfield & de Lange, 2014; de Lange et al., 2018). As is generally acknowledged, expectation reflects the probability of a sensory event, while selective attention relates to its behavioral relevance. To minimize attentional influences, our task design ensured that grating orientation was not task-relevant: on each trial, participants discriminated either orientation or spatial frequency difference, such that orientation itself did not require attentional allocation, a point already discussed in the manuscript.
 
 Regarding visual working memory, we argue that even if participants recalled the first grating’s spatial frequency in the SFD task, they were not required to retain its precise spatial frequency (or orientation), as their task was simply to judge whether the second grating appeared denser or sparser. In other words, orientation (or spatial frequency) itself was not task-relevant. Moreover, although not included in the manuscript, we conducted a post-experiment debriefing in which participants were asked whether they noticed any association between the auditory tone and the grating orientation. None of the participants reported this relationship correctly, suggesting that the tone-orientation mapping remained implicit and was unlikely to be driven by strategic attention or memory.
 
 However, we acknowledge that certain confounding processes such as statistical learning or implicit mapping acquisition cannot be fully ruled out given the current paradigm. Future studies using methods with higher temporal resolution (e.g., EEG/MEG) may help to dissociate these mechanisms more precisely.
 
 (3) For each of the expected orientations (20° or 70°), the unexpected ones are linearly separable (i.e., all unexpected ones lie on one side of the expected angle). This might further encourage people to shift their attended or expected orientation, according to the optimal tuning hypothesis. Would this provide an alternative explanation to the tuning shift that the authors found?
 
 We thank the reviewer for pointing out the relevance of the optimal tuning hypothesis. We acknowledge that the optimal tuning theory (Navalpakkam & Itti, 2007) is an important framework, particularly in visual search paradigms, where attentional templates may shift away from non-target features to enhance discriminability.
 
 In our task, this hypothesis would predict a shift of expectation toward <20° in E20° trials and >70° in E70° trials, given that all unexpected orientations lie on one side of the expected angle. Importantly, the optimal tuning hypothesis predicts such shifts not only in Δ20°, Δ25°, and Δ30° trials but also in the Δ0° trials. In this regard, the observed shift in Δ20° and Δ30° (Experiment 2) and Δ25° (Experiment 3) trials is broadly consistent with the predictions of the optimal tuning account. However, we did not observe a corresponding shift away from nontarget features in the Δ0° condition, suggesting limited behavioral evidence for optimal tuning effects under our current task settings.
 
 It is important to note that most previous studies supporting optimal tuning (e.g., Navalpakkam & Itti, 2007; Scolari & Serences, 2009; Geng, DiQuattro, & Helm, 2017; Yu & Geng, 2019) have used visual search paradigms that differ from our design in several critical ways, including the number of stimuli presented, their spatial arrangement (eccentricity), task demands, and so on. Therefore, it is difficult to determine whether the optimal tuning hypothesis could serve as an alternative explanation within the context of our current study. We agree that future studies could further examine how such task parameters influence the presence or absence of optimal tuning.
 
 (4) It is great that the authors conducted computational modeling to elucidate the potential neuronal mechanisms of expectation. But I think the sharpening hypothesis (e.g., reviewed in de Lange, Heilbron & Kok, 2018) focuses on the neural population level, i.e., narrowing of population tuning profile, while the authors conducted the sharpening at the neuronal tuning level. However, the sharpening of population does not necessarily rely on the sharpening of individual neuronal tuning. For example, neuronal gain modulation can also account for such population sharpening. I think similar logic applies to the orientation adjustment experiment. The behavioral level shift does not necessarily suggest a similar shift at the neuronal level. I would recommend that the authors comment on this.
 
 We thank the reviewer for this to-the-point comment. As de Lange et al. (2018) noted, “there is not always a direct correspondence between neural-level and voxel-level selectivity patterns.” That is, neuronal tuning, population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying mechanisms and do not necessarily align in a one-to-one fashion. We fully acknowledge that population-level tuning effects may also result from various neuronal mechanisms such as gain modulation (for review, see Salinas & Thier, 2000), shifts in preferred orientation (Ringach et al., 1997; Jeyabalaratnam et al., 2013), asymmetric broadening of tuning curves (Schumacher et al., 2022), or tuning curve sharpening (Ringach et al., 1997; Schoups et al., 2001).
 
 In our modeling, we implemented sharpening and shifts of neuronal tuning curves as a conceptual model simplification, intended to explore potential mechanisms underlying expectation-related center-surround suppression effects. While sharpening-based accounts (e.g., Kok et al. 2012) have often been emphasized, we stress that other mechanisms, such as gain modulation or tuning shifts, may also contribute. Our goal is not to provide a definitive account, but to highlight such plausible mechanisms and encourage future investigation. We have revised the Discussion to emphasize that multiple mechanisms may underlie the observed effects.
 
 “We note that our implementation of sharpening and shifts at the neuronal level serves as a conceptual model simplification, as population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying neuronal mechanisms and do not necessarily align in a one-to-one fashion. Here, we stress that other potential mechanisms beyond sharpening, such as tuning shifts, may also contribute to visual expectation.”
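 To make the distinction between these mechanisms concrete, here is a minimal sketch of a single orientation-tuned channel with amplitude, preferred orientation, and width parameters. The Gaussian form, the parameter values, and the function names are illustrative assumptions and are not taken from the manuscript's model. Sharpening is represented as a narrower (and here slightly taller) curve, and a tuning shift as a change in the preferred orientation.

```python
import math


def orientation_diff(a, b):
    """Signed difference a - b between two orientations in degrees (period 180)."""
    return (a - b + 90.0) % 180.0 - 90.0


def tuning(theta, amplitude, center, width):
    """Gaussian tuning curve; width is the standard deviation in degrees."""
    d = orientation_diff(theta, center)
    return amplitude * math.exp(-0.5 * (d / width) ** 2)


# Hypothetical parameter sets for a channel preferring 20 degrees.
baseline = dict(amplitude=1.0, center=20.0, width=15.0)
sharpened = dict(amplitude=1.2, center=20.0, width=10.0)  # narrower, higher peak
shifted = dict(amplitude=1.0, center=15.0, width=15.0)    # preferred orientation moved

thetas = [float(t) for t in range(0, 180)]
peak_baseline = max(thetas, key=lambda t: tuning(t, **baseline))
peak_shifted = max(thetas, key=lambda t: tuning(t, **shifted))

# Response change at the preferred orientation and 20 degrees away from it.
gain_at_center = tuning(20.0, **sharpened) - tuning(20.0, **baseline)
loss_in_surround = tuning(40.0, **sharpened) - tuning(40.0, **baseline)
```

 In this toy example, sharpening raises the response at the preferred orientation while lowering it 20° away, whereas the shift moves the peak without changing its height. Gain modulation alone would correspond to changing only the amplitude.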
 
 (5) If the orientation adjustment experiment suggests that both sharpening and shifting are present at the same time, have the authors tried combining both in their computational model?
 
 We agree with the reviewer that it is necessary to consider the combined model. Accordingly, we implemented a computational model incorporating sharpening of the expected orientation channel together with shifting of the unexpected orientation channels. This model successfully captured the sharpening of the expected-orientation channel and the shift of the unexpected-orientation channels (Supplementary Fig. 3). For the expected orientation (Δ0°), results showed that the amplitude change was significantly higher than zero on both OD (t(23) = 2.582, p = 0.017, Cohen’s d = 0.527) and SFD (t(23) = 2.078, p = 0.049, Cohen’s d = 0.424) tasks (Supplementary Fig. 3e, vertical stripes); the width change was significantly lower than zero on both OD (t(23) = -2.438, p = 0.023, Cohen’s d = 0.498) and SFD (t(23) = -2.578, p = 0.017, Cohen’s d = 0.526) tasks (Supplementary Fig. 3e, diagonal stripes). For unexpected orientations (Δ10°-Δ40°), however, the amplitude and width changes were not significantly different from zero on either OD (amplitude change: t(23) = 0.443, p = 0.662, Cohen’s d = 0.091; width change: t(23) = -1.819, p = 0.082, Cohen’s d = 0.371) or SFD (amplitude change: t(23) = 1.130, p = 0.270, Cohen’s d = 0.231; width change: t(23) = -1.710, p = 0.101, Cohen’s d = 0.349) tasks (Supplementary Fig. 3f). In the meantime, the location shift was significantly different from zero for unexpected orientations (Δ10°-Δ40°, OD task: t(23) = 3.611, p = 0.001, Cohen’s d = 0.737; SFD task: t(23) = 2.418, p = 0.024, Cohen’s d = 0.493; Supplementary Fig. 3g). These results provided further evidence that tuning sharpening and tuning shift jointly contribute to center-surround inhibition in expectation.
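 The statistics reported above can be sanity-checked with a short sketch of a one-sample t-test against zero. It assumes that Cohen's d was computed as the sample mean divided by the sample standard deviation, in which case d = t / sqrt(n); the response does not state the formula, but the reported values are consistent with this relation for n = 24. The function names and the small example sample are hypothetical.

```python
import math
import statistics


def one_sample_t(values, mu=0.0):
    """Return (t, degrees_of_freedom, cohens_d) for a one-sample test against mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    t_stat = (mean - mu) / (sd / math.sqrt(n))
    cohens_d = (mean - mu) / sd
    return t_stat, n - 1, cohens_d


def cohens_d_from_t(t_stat, n):
    """Recover one-sample Cohen's d from a t statistic, assuming d = mean / sd."""
    return t_stat / math.sqrt(n)


# Hypothetical sample, only to exercise the function.
t_stat, df, d = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])

# Consistency check against two values quoted above (t(23), so n = 24).
d_amplitude_od = cohens_d_from_t(2.582, 24)
d_shift_od = cohens_d_from_t(3.611, 24)
```

 Under this assumption, the quoted t(23) = 2.582 corresponds to d of about 0.527 and t(23) = 3.611 to about 0.737, matching the reported effect sizes.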
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) A direct comparison between tasks (baseline vs. expectation conditions) would have strengthened the findings. Specifically, contrasting performance in the orientation discrimination task with the spatial frequency discrimination task could have provided clearer evidence that participants actually used the auditory cues to attend to the expected orientation. This comparison would be particularly important for validating cue manipulation in the orientation discrimination task.
 
 We agree that a direct comparison between the orientation discrimination (OD) and spatial frequency discrimination (SFD) tasks could further clarify how expectation (auditory cues) differentially modulates orientation relevance. However, the primary goal of the current study was to examine expectation effects within each task separately and to demonstrate that such effects are independent of attentional modulation driven by the task-relevance of orientation.
 
 In addition, the OD and SFD tasks differ not only in the relevant task features (orientation vs. spatial frequency discrimination), but also in stimulus properties and difficulty (for example, the arbitrary use of 20–70° as the orientation range and ~0.9 cycles/° as the spatial frequency setting). A direct comparison could therefore introduce confounding factors unrelated to expectation.
 
 Importantly, previous studies (e.g., Kok et al., 2012, 2017; Aitken et al., 2020) and our current results show that participants performed significantly better when the auditory cue matched the expected orientation, supporting the validity of our expectation manipulation.
 
 (2) An interesting consideration is why the center-surround inhibition profile of expectation was independent of the task-relevance of orientation. Previous studies (e.g., Kok et al., 2012) have found that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant. It could be useful to discuss the possible discrepancies.
 
 We thank the reviewer for this inspiring comment. Kok et al. (2012) showed that both orientation and contrast tasks elicited similar fMRI decoding results, regardless of task relevance, suggesting that neural mechanisms of expectation operate independently of whether orientation is task-relevant. Behaviorally, they reported better performance for expected versus unexpected trials in the orientation task (3.4° vs. 3.8°, t(17) = 2.8, p = 0.013), and a marginal trend (although not significant) in the contrast task (4.3% vs. 5.0%, t(17) = 1.9, p = 0.075). If any differences between the two tasks exist, they may lie in the correlation between behavioral and fMRI effects, a question that goes beyond the scope of the current study. It is therefore hard to conclude from their paper that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant.
 
 Our study differs from theirs in at least two important ways, which may account for the clearer expectation facilitatory effect we observed in the expectation (Δ0°) condition. First, in our study, the orientation-irrelevant task involved spatial frequency discrimination (SFD) rather than contrast discrimination. Compared to contrast, spatial frequency has been shown to exhibit a clear cueing effect, as reported in Fang & Liu (2019). Second, our design included a baseline condition, which was absent in their study. We computed discrimination sensitivity (DS) to quantify how much the discrimination threshold (DT) changed relative to baseline. By using this baseline-referenced approach, we observed a significant facilitatory expectation effect in the Δ0° condition, an effect that shifted from marginal significance in their orientation-irrelevant task to clear significance in our study.
 
 (3) The authors might consider briefly explaining how the orientation adjustment paradigm used in this study is particularly effective for examining the potential co-existence of tuning sharpening and tuning shift computations, and how this approach complements traditional orientation discrimination tasks in characterizing expectation-related mechanisms.
 
 We thank the reviewer for this valuable suggestion. We agree that further clarification is needed to better connect the two experiments. To explain this, we have elaborated further in the manuscript.
 
 “To further explore the co-existence of both Tuning sharpening and Tuning shift computations in center-surround inhibition profile of expectation, participants were asked to perform a classic orientation adjustment experiment. Unlike profile experiment (discrimination tasks), the adjustment experiment provides a direct, trial-by-trial measure of participants’ perceived orientation, capturing the full distribution of responses. This enables the construction of orientation-specific tuning curves, allowing us to detect both tuning sharpening and tuning shifts, thereby offering a more nuanced understanding of the computational mechanisms underlying expectation.”
 
 (4) These interesting findings raise important questions about their relationship to existing hybrid models of attentional modulation. Could the authors discuss how their results might align with or extend previous work demonstrating combined feature-similarity gain and surround suppression effects for orientation (e.g., Fang & Liu, 2019)? Could a hybrid model potentially provide a better account of these data than the pure surround suppression model?
 
 We thank the reviewer for this valuable comment. We agree that the hybrid model should be mentioned in the manuscript, and we have elaborated further in the Discussion.
 
 “For example, within the orientation space, the inhibitory zone was about 20°, 45°, and 54° for expectation evident here, feature-based attention[21], and visual perceptual learning[35], respectively; within the feature-based attention, it was about 30° and 45° in color [77] and motion direction [53] spaces, respectively. These variations hint at the exciting possibility that the width of the inhibitory surround may flexibly adapt to stimulus context and task demands, ultimately facilitating our perception and behavior in a changing environment. This principle is consistent with the hybrid model of feature-based attention [53,54,75], where attention is deployed adaptively to prioritize task-relevant information through feature-similarity gain which filters out the most distinctive distractors, and surround suppression which inhibits similar and confusable ones, thereby jointly shaping the attentional tuning profile.”
 
 (5) On page 19, there appears to be a missing symbol in the description of the Tuning Sharpening model. The text states: 'the tuning width of each channel's tuning function is parameterized by ??', where the question marks seem to indicate a missing parameter symbol.
 
 We appreciate the reviewer’s careful attention. Yes, the "σ" is missing, which was likely caused by a formatting issue. We have corrected it.
 
 AuthorResponse

Tags

Summary

AuthorResponse

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.26.609781v2
rori.figshare.com rori.figshare.com

The Matthew effect and early-career setbacks in research funding - a replication study (RoRI Working Paper No. 16)

4
1. Public_Reviews 09 Oct 2025
  
  in eLife (unscoped)
  
  eLife Assessment
  
  This important study reports the results of efforts to replicate two phenomena of significant interest to early-career scientists and scientific policymakers: the Matthew effect and the early-career setback effect. Several previous studies of these effects have focused on early-career researchers with grant proposals that fell just below or just above a funding threshold. Those just above the threshold were more likely to be successful when they applied for funding later in their careers (an example of the well-known Matthew effect), while those just below were more likely to go on to have stronger publication records (the early-career setback effect). In this study, the Matthew effect was found to be robust across funders, and to generalize from those close to the funding threshold to the whole population. The early-career setback effect was not robust across funders and did not generalize to the whole population. The evidence reported is convincing.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife (unscoped)
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors performed a multi-funder study to determine if the Matthew effect and early-career setback effect were reproducible across funding programs and processes. The authors extended the analysis of these effects to all applicants and compared the results to the prior studies that only looked at near-hit/near-miss applicants to determine if the effects were generalizable to the whole applicant pool. Further, the authors included new models that also account for researcher behavior and their overall likelihood to reapply for later funding and how this behavior may resolve what appears to be a paradox between the Matthew effect and the early-career setback effect.
  
  Strengths:
  
  Figure 4 shows that the "Post (late) MFCR" is the same for the funded and unfunded groups, indicating that the impact of early-career funding (at least in terms of citation metrics) is transient over researchers' overall careers. This finding should encourage researchers to persevere when needed, and it suggests that long-term success is attainable.
  
  The inclusion of the collider bias in the models to account for researcher behavioral responses is a key strength of the paper and enhances the analysis and the nuanced discussion of the results.
  
  Weaknesses:
  
  The discussion of limitations is thorough and points to the need for additional studies. One limitation that is acknowledged is that the authors only looked at applicants who reapplied for funding at the same funder. Given that the authors had the names and affiliations of the applicants from all of the funders, it would be helpful to understand why they were not able to look at applicants across their full data set. Was the limitation technical or a result of the study design? What would have to change to enable this broader analysis?
  
  In Section 4.1, the authors state that the "between MFCR" difference was seen at 5 years, but not at 10 years, and so they chose to use the 5-year period for the presentation of their results. It would be helpful to also see the 10-year analysis and to have further justification from the authors on why they chose the 5-year period and how their conclusions might or might not change if they considered the longer time period.
  
  The discussion could also note that many funders require novel research directions as a condition of receiving an early-career award. Those who receive these awards must establish the new research program and begin publishing, and they may initially see a lower citation rate until the impact of the research is more broadly recognized. Are there ways to explore how these time lags affect the "Between MFCR" for those who were funded more than for those who were not funded?
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife (unscoped)
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript evaluates the generalizability of two phenomena of great interest to early-career scientists and scientific policymakers. These phenomena describe how early funding success can promote future funding success (the Matthew Effect) and how initially unsuccessful applicants may later succeed (the early-career setback effect). Given the often-normative aspirations of science-of-science studies, the manuscript represents a much-needed and highly significant effort, as it allows a broader audience to assess whether they should reconsider their behavior or policies.
  
  Strengths:
  
  The evidence provided by the authors for the generalizability of the Matthew Effect is very strong and convincing. The manuscript addresses an important topic of practical concern to early-career scientists and scientific policymakers.
  
  Weaknesses:
  
  If I am correctly interpreting S11 and S12, the statements on the early-career setback effect could be stronger and more direct. The argument in the main text relies on assumptions and simulations to suggest that observations of the early-career setback effect may depend on reapplications. In contrast, S11 and S12 appear to provide more direct evidence against its generalizability, showing that the effect seems to exist in, and be driven by, only one of the six funding agencies considered (FWF). This narrow replication may not be obvious to readers ("the early-career setback effect also replicates, but is not robust across funders").
  
  I would also suggest that the authors provide a more nuanced discussion of the limitations of their Bayesian model. While the model seems appropriate for accounting for major factors, it appears to exclude others, such as the emergence of new scientific fields or the strategic reorientation of funders toward such fields.
  
  Review 2
4. Public_Reviews 09 Oct 2025
  
  in eLife (unscoped)
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper investigates the Matthew effect, where early success in funding peer review can translate into potentially unwarranted later success. It also investigates the previously found "setback" effect for those who narrowly miss out on funding.
  
  Strengths:
  
  The study used data from six funding agencies, which increases the generalisability, and was able to link bibliographic data for around 95% of applicants. The authors nicely illustrate how the previously found "setback" effect for near-miss applicants could be a collider bias due to those who chose to apply sometime later. This is a good explanation for the counter-intuitive effect and is nicely shown in Figure 5.
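  
  The collider-bias explanation mentioned above can be illustrated with a toy simulation. This is a minimal sketch, not the authors' model: the latent `quality` variable, the coin-flip funding decision, and the reapplication probabilities are all hypothetical values chosen only for illustration. Funding is given no causal effect on later output, yet restricting the comparison to reapplicants makes the unfunded group look stronger.

```python
import random
import statistics

def simulate_reapplication_selection(n=20000, seed=0):
    """Toy data in which funding has NO causal effect on later output."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)      # hypothetical latent quality
        funded = rng.random() < 0.5        # near-threshold decision treated as a coin flip
        later_output = quality + rng.gauss(0.0, 1.0)  # depends on quality only
        # Assumed behaviour (illustrative): funded applicants usually reapply,
        # unfunded applicants reapply mainly when their quality is high.
        if funded:
            reapplies = rng.random() < 0.8
        else:
            reapplies = quality + rng.gauss(0.0, 0.5) > 0.5
        rows.append((funded, reapplies, later_output))
    return rows

def mean_gap(rows, reapplicants_only):
    """Mean later output of unfunded applicants minus funded applicants."""
    keep = [r for r in rows if r[1]] if reapplicants_only else rows
    unfunded = [r[2] for r in keep if not r[0]]
    funded = [r[2] for r in keep if r[0]]
    return statistics.mean(unfunded) - statistics.mean(funded)

rows = simulate_reapplication_selection()
gap_all = mean_gap(rows, reapplicants_only=False)
gap_reapplicants = mean_gap(rows, reapplicants_only=True)
print(f"unfunded minus funded, all applicants:    {gap_all:+.3f}")
print(f"unfunded minus funded, reapplicants only: {gap_reapplicants:+.3f}")
```

  In this sketch, reapplication is a common effect of funding status and quality, so conditioning on it induces an association between the two even though none exists in the full applicant pool.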
  
  Weaknesses:
  
  Most of the methods were clearly presented, but I have a few questions and comments, as outlined below.
  
  In Figure 4(a) why are the "post" means much lower than the "pre"? This contradicts the expected research trajectory of researchers. Or is this simply due to less follow-up time? But doesn't the field citation ratio control for follow-up time?
  
  The choice of the log-normal distribution for latent quality was not entirely clear to me. This would create some skew, rather than a symmetric distribution, which may be reasonable but log-normal distributions can have a very long tail which might not mimic reality, as I would not expect a small number of researchers to be extremely above the crowd. However, then the skew was potentially dampened by using percentile scores. Some further reasoning and plots of the priors would help.
  
  Can the authors confirm the results of Figure S9 which show no visible effect of altering the standard deviation for the review parameter or the mean citations? Is this just because the prior for quality is dominated by the data? Could it be that the width of the distribution for quality does not matter, as it's the relative difference/ranking that counts? So the beta in equation 6 changes to adjust to the different quality scale?
  
  The contrary result for the FWF is not explained (Table S3). Does this funder have different rules around re-applicants or many other competing funders?
  
  The outlined qualitative research sounds worthwhile. Another potential mechanism (based on anecdote) is that some researchers react irrationally to rejection or acceptance, tending to think that the whole agency likes or hates their work based on one experience. Many researchers do not appreciate that it was a somewhat random selection of reviewers who viewed their work, and it will unlikely be the same reviewers next time.
  
  "A key implication is the importance of encouraging promising, but initially unsuccessful applicants to reapply." Yes. A policy implication is to give people multiple chances to be lucky, perhaps by giving fewer grants to more people, which could be achieved by shortening the funding period (e.g., 4-year fellowships instead of 5 years), although this will have some costs, as applicants would need to spend more time on applications and would suffer the increased stress of shorter-term contracts. Bridge grants are potentially an ideal half-way house between many short-term and few long-term awards. Giving more grants to fewer people is supported by this analysis showing diminishing returns in research outputs with more funding, DOI: 10.1371/journal.pone.0065263.
  
  Making more room for re-applicants also made me wonder if there should be an upper cap on funding, potentially for people who have been incredibly successful. Of course, funders generally want to award successful researchers, but people who've won over some limit, for example $50 million, could likely be expected to win funding from other sources such as philanthropy and business. Graded caps could occur by career stage.
  
  Review 3

Tags

Review 1

Review 3

Review 2

Summary

Annotators

Public_Reviews

URL

rori.figshare.com/articles/preprint/The_i_Matthew_i_effect_and_early-career_setbacks_in_research_funding_-_a_replication_study_RoRI_Working_Paper_No_16_/29302004
www.biorxiv.org www.biorxiv.org

Sense of control buffers against stress

5
1. Public_Reviews 09 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This important research addresses the effects of subjective control and task difficulty on experienced stress using a novel behavioral task administered on the same day in two large online samples. Convincing evidence is provided, establishing the internal and external validity of the task, as well as a relationship between the sense of control and task difficulty, with individual differences in relevant mental health constructs. Evidence for the specificity of the link between control and stress would be more substantial if the design had not conflated control and reward rate. This work will be of interest to psychologists and clinicians studying the concepts of controllability, stress, and psychopathology.
 
 Summary
2. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first demonstrate that their behavioral task exhibits good internal consistency and external validity, indicating that perceived control during the task is linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task resulted in reduced subjective stress, demonstrating a direct impact of control on stress perception. However, this work has some minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies. Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and has particular relevance for mental health interventions.
 
 Strengths:
 
 The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher task difficulty led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the direct influence of these factors on incidental stress, making this work both novel and important for the field.
 
 Weaknesses:
 
 One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases in perceived stress between relevant time points; however, future work could improve upon this design by including more comprehensive stress manipulations and by measuring implicit physiological signs of stress.
 
 Comments on revisions:
 
 I appreciate the authors' responses to my comments and concerns. I have decided not to make changes to my public review, as I believe it remains relevant and fair after revisions.
 
 Review 1
3. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by participants by varying the level of difficulty in a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online, while simultaneously recording subjective ratings of perceived control, anxiety, and stress. In a second study, the authors employed the wheel stopping task to induce a high sense of controllability and investigate whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, such as watching pleasant videos.
 
 Strengths:
 
 (1) The authors validate a method to manipulate stress.
 
 (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.
 
 (3) The studies involved big sample sizes.
 
 Weaknesses:
 
 (1) The study was not preregistered.
 
 (2) The control manipulation is conflated with task difficulty and, therefore, the reward rate. In the revised version of the manuscript, the authors perform statistical analyses to demonstrate that the relationship between perceived level of control and subjective stress remains robust after the inclusion of win rate in the model. This analysis strengthens the authors' claims, but the evidence would be more substantial if the design did not conflate reward rate and control. The authors properly discuss this issue in the revised manuscript.
 
 This study will be of interest to psychologists and cognitive scientists who are interested in understanding how controllability and its subjective perception influence how people respond to stress exposure. The demonstration that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only examined the effect of stress induction following performance of the WS task on the same day, so its generalizability is not established.
 
 Review 2
4. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess the subjective sense of control, the authors introduce a novel wheel stopping (WS) task where control is manipulated via size and speed to induce conditions of low and high control. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further demonstrate that an increased sense of control buffers subjective stress induced by a Trier Social Stress manipulation, more so than a typical stress-buffering mechanism of watching neutral/calming videos.
 
 Strengths:
 
 Several strengths of the manuscript can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test a significant and important question. Additionally, it is a well-powered investigation that allows for confidence in replicability and demonstrates both high internal consistency and high external validity, along with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature, while also making a significant contribution to the field in understanding the benefits of perceiving control.
 
 Weaknesses:
 
 The authors have addressed all my queries, and I believe the revised paper has been improved and will make an important contribution to the literature.
 
 Review 3
5. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Author response:
 
 The following is the authors’ response to the previous reviews.
 
 Reviewer #1 (Public review):
 
 Summary:
 
 This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first show that their behavioral task has good internal consistency and external validity, showing that perceived control during the task was linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task led to reduced subjective stress, showing a direct impact of control on stress perception. However, this work has minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies.
 
 Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and is particularly relevant to mental health interventions.
 
 We thank the reviewer for their clear and accurate summary of the findings.
 
 Strengths:
 
 The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher difficulty in the task led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the direct influence of these factors on incidental stress, making this work both novel and important for the field.
 
 We thank the reviewer for their positive comments.
 
 Weaknesses:
 
 One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and also after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases of perceived stress between relevant time points, but future work could improve upon this design by including more complete stress manipulations and measuring implicit physiological signs of stress.
 
 We thank the reviewer for urging us to expand on this point. The reviewer is right that stress was merely anticipatory and is in that sense different to the canonical TSST. However, there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). Importantly, and as the reviewer notes in relation to our study specifically, the anticipatory TSST had a significant impact on subjective stress in the expected direction demonstrating that it was effective at eliciting subjective stress. We have elaborated further on this in our manuscript (pages 8 and 9) as follows:
 
 “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”
 
 Reviewer #2 (Public review):
 
 Summary:
 
 The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by the participants by changing the level of difficulty of a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online while simultaneously recording subjective ratings on perceived control, anxiety, and stress. In the second study, the authors used the wheel-stopping task to induce a high sense of controllability and test whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, like looking at pleasant videos.
 
 We thank the reviewer for their accurate summary.
 
 Strengths:
 
 (1) The authors validate a method to manipulate stress.
 
 (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.
 
 (3) The studies involved big sample sizes.
 
 We thank the reviewer for noting these positive aspects of our study.
 
 Weaknesses:
 
 (1) The study was not preregistered.
 
 This is correct.
 
 (2) The control manipulation is conflated with task difficulty, and, therefore the reward rate. Although the authors acknowledge this limitation at the end of the discussion, it is a very important limitation, and its implications are not properly discussed. The discussion states that this is a common limitation with previous studies of control but omits that many studies have controlled for it using yoking.
 
 We agree that these are very important issues to consider in the interpretation of our findings. It is important to note that, while our task design does not separate these constructs, we are able to do so in our statistical analyses. For example, our measure of perceived difficulty was included in analyses assessing the fluctuations in stress and control, in which subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. Similarly, we have also included additional analyses in which we include the win rate (i.e. percentage of trials won) as a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress, in which subjective control and perceived difficulty still uniquely predict subjective stress when controlling for win rate. This suggests that there is unique variance in subjective control, separate from perceived task difficulty and win rate, that is relevant to stress. We have included these analyses (page 16 of manuscript) as follows:
 
 “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”
 
 As well as expanding on this in the Discussion (pages 27 and 28) as follows:
 
 “While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses. Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”
 
 Further, in terms of the wider literature on these issues, we have added more to this point in our discussion, especially in relation to previous literature that also varies control by reward rate (e.g. Dorfman & Gershman, 2019, who use a reward rate of 80% in high control conditions and 50% in low control conditions). This can be found in the manuscript on page 27 as follows:
 
 “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3] though other work has manipulated the controllability of rewards by changing the reward rate [for example 30] where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition).”
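 
 The covariate logic in the response to point (2) can be sketched on synthetic data. This is an illustrative assumption-laden example, not the authors' analysis: the variable names, effect sizes, and sample size below are hypothetical, and the fit is a plain ordinary least squares regression written with the standard library. Win rate drives both perceived control and perceived difficulty, stress depends only on control and difficulty, and the regression recovers a unique control effect with win rate in the model.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(predictors, outcome):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data-generating process (all coefficients are made up).
rng = random.Random(1)
n = 5000
win_rate = [rng.gauss(0, 1) for _ in range(n)]
control = [0.6 * w + rng.gauss(0, 0.8) for w in win_rate]
difficulty = [-0.6 * w + rng.gauss(0, 0.8) for w in win_rate]
stress = [-0.3 * c + 0.4 * d + rng.gauss(0, 1) for c, d in zip(control, difficulty)]

intercept, b_control, b_difficulty, b_win = ols([control, difficulty, win_rate], stress)
print(f"control: {b_control:+.2f}, difficulty: {b_difficulty:+.2f}, win rate: {b_win:+.2f}")
```

 A statistical adjustment of this kind separates the predictors only to the extent that they are not perfectly collinear, which is why the reviewer notes that a design separating control from reward rate would give stronger evidence.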
 
 (3) The methods are not always clear enough, and it is difficult to know whether all the manipulations are done within-subjects or some key manipulations are done between subjects.
 
 We have added more information in the methods section (page 8) clarifying within-subject manipulations (WS task parameters) and between-subject manipulations (stressor intensity task, WS task version in Study 1, and WS task/video task in Study 2). Additionally, as recommended by Reviewer 1, we have provided more information in the methods section and Table S3 regarding the details of on-screen written feedback provided to participants after each trial of the WS Task.
 
 (4) The analysis of internal consistency is based on splitting the data into odd/even sliders. This choice of data parcellation may cause missed drifts in task performance due to learning, practice effects, or tiredness, thus potentially inflating internal consistency.
 
 We agree that this can indeed be an issue, though drift is likely to be present in any task, even in mood during resting state (Jangraw et al., 2023). To respond to this specific point, we parcellated the timepoints into a 1st/2nd half split and report the ICC in the supplementary information. While values are lower, indeed likely due to systematic drifts in task performance as participants learn to perform the task (especially for Study 2, since the order of parameters was designed to get easier throughout the experiment), the ICC values are still high. Control sliders: Study 1 = 0.82, Study 2 = 0.68; Difficulty sliders: Study 1 = 0.84, Study 2 = 0.57; Stress sliders: Study 1 = 0.45, Study 2 = 0.71. As seen, the lowest ICC is for stress sliders in Study 1. This may be because the first 3 sliders (included in the 1st half split) were all related to the stress task (initial, post-stress, task, post-debrief) and the final 4 sliders (in the 2nd half split) were the three sliders during the WS task and shortly afterwards.
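 The difference between the two splitting schemes can be illustrated with a small sketch. It is a toy under stated assumptions: the ratings are invented, `pearson` and `split_half` are hypothetical helper names, and a Pearson correlation between half-means stands in for the ICC reported above, whose exact form is not given here.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half(ratings, scheme):
    """Correlate per-participant half-means across participants.

    ratings: one list of slider ratings per participant, in presentation order.
    scheme:  "odd_even" interleaves timepoints; "first_second" splits by time.
    """
    half_a, half_b = [], []
    for r in ratings:
        if scheme == "odd_even":
            a, b = r[0::2], r[1::2]
        elif scheme == "first_second":
            mid = len(r) // 2
            a, b = r[:mid], r[mid:]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        half_a.append(mean(a))
        half_b.append(mean(b))
    return pearson(half_a, half_b)

# Invented ratings (0-100 sliders): two participants drift in opposite
# directions, one is flat, one alternates without drifting.
toy_ratings = [
    [50, 52, 54, 56, 58, 60],
    [60, 58, 56, 54, 52, 50],
    [40, 40, 40, 40, 40, 40],
    [70, 72, 70, 72, 70, 72],
]

odd_even = split_half(toy_ratings, "odd_even")
first_second = split_half(toy_ratings, "first_second")
print(f"odd/even: {odd_even:.3f}  first/second: {first_second:.3f}")
```

 In this toy data the odd/even split spreads each participant's drift evenly over both halves, so the halves agree more closely than under the first/second split, which is the pattern the reviewer describes.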
 
 (5) Study 2 manipulates the effect of domain (win versus loss WS task), but the interaction of this factor with stressor intensity is not included in the analysis.
 
 We agree that this would be a valuable analysis to include. We have run additional analyses (section Sensitivity and Exploratory Analyses, pages 24 and 25), testing the interaction of Domain (win or loss) with stressor intensity (and time) when predicting the stress buffering and stress relief effects. This revealed no significant main effects of domain or interactions including domain, suggesting that domain did not impact the stress induction or relief differently depending on whether it was followed by the high or low stressor intensity condition. While the control by time interaction (our main effect of interest) still held for stress induction in this more complex model, the control by time interaction did not hold for the stress relief. However, this more complex model did not provide a better fit for the data, motivating us to continue to draw conclusions from the original model specification with domain as a covariate (rather than an interaction).
 
 We outline these analyses on page 24 of the manuscript, as follows:
 
 “Third, we included the interaction of domain with stressor intensity and with time, to test whether the win or loss domain in the WS task significantly impacted stress induction or stress relief differently depending on stressor intensity. There were no significant effects or interactions of domain (Table S14) for stress induction or stress relief, and the main effect of interest (the interaction between time and control) still held for the stress induction (β= 10.20, SE=4.99 p=.041, Table S14), though was no longer significant for the stress relief (β= 6.72, SE=4.28, p=.117, Table S14). This more complex model did not significantly improve model fit (χ²(3)= 1.46, p=.691) compared to our original specification (with domain as a covariate rather than an interaction) and had slightly worse fit (higher AIC and BIC) than the original model (AIC = 5477.2 versus 5472.7, BIC = 5538.5 versus 5520.8).”
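 As a rough consistency check on the model comparison quoted above, the reported figures can be related to one another. This is a sketch, assuming both models were fitted by maximum likelihood to the same data so that the χ² statistic equals the drop in deviance; `chi2_sf_df3` is a hypothetical helper name, and the closed form it uses holds only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Values taken from the quoted passage.
lrt_stat = 1.46          # chi-square statistic for the added interaction terms
extra_params = 3         # degrees of freedom of the test
aic_complex, aic_original = 5477.2, 5472.7

p_value = chi2_sf_df3(lrt_stat)

# AIC = deviance + 2k, so (complex minus original) = 2 * extra_params - lrt_stat.
expected_delta_aic = 2 * extra_params - lrt_stat
reported_delta_aic = aic_complex - aic_original

print(f"p = {p_value:.3f}")
print(f"expected delta AIC = {expected_delta_aic:.2f}, reported = {reported_delta_aic:.1f}")
```

 Under these assumptions, the small χ² gain does not offset the penalty for three extra parameters, which matches the authors' decision to retain the original specification.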
 
 This study will be of interest to psychologists and cognitive scientists interested in understanding how controllability and its subjective perception impact how people respond to stress exposure. Demonstrating that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consecutive to the performance of the WS task on the same day and its generalizability is not warranted.
 
 We thank the reviewer for this assessment and agree that we cannot assume these findings would generalise to more prolonged effects on stress responses.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess a subjective sense of control, the authors introduce a novel wheel-stopping (WS) task where control is manipulated via size and speed to induce low and high control conditions. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further show that an increased sense of control buffers subjective stress induced by a trier social stress manipulation, more so than a more typical stress buffering mechanism of watching neutral/calming videos.
 
 We agree with this accurate summary of our study.
 
 Strengths:
 
 There are several strengths to the manuscript that can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test an important and significant question. Additionally, it is a well-powered investigation that allows for confidence in replicability and the ability to show both high internal consistency and high external validity with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature while also providing a significant contribution to the field with respect to understanding the benefits of perceiving control.
 
 We thank the reviewer for this positive assessment.
 
 Weaknesses:
 
 There are also some questions that, if addressed, could help our readership.
 
 (1) A key manipulation was the high-intensity stressor (Anticipatory TSST signal), which was measured via subjective ratings recorded on a sliding scale at different intervals during testing. Typically, the TSST conducted in the lab is associated with increases in cortisol assessments and physiological responses (e.g., skin conductance and heart rate). The current study is limited to subjective measures of stress, given the online nature of the study. Since TSST online may also yield psychologically different results than in the lab (i.e., presumably in a comfortable environment, not facing a panel of judges), it would be helpful for the authors to briefly discuss how the subjective results compare with other examples from the literature (either online or in the lab). The question is whether the experienced stress was sufficiently stressful given that it was online and measured via subjective reports. The control condition (low intensity via reading recipes) is helpful, but the low-intensity stress does not seem to differ from baseline readings at the beginning of the experiment.
 
 We agree that it would be helpful to expand on this further. Similar to the comment made by Reviewer 1, we wish to point out that there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). We have elaborated further on this in our manuscript on pages 8 and 9 as follows:
 
 “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”
 
 (2) The neutral videos represent an important condition to contrast with WS, but it raises two questions. First, the conditions are quite different in terms of experience, and it is interesting to consider what another more active (but not controlled per se) condition would be in comparison to the WS performance. That is, there is no instrumental action during the neutral video viewing (even passive ratings about the video), and the active demands could be an important component of the ability to mitigate stress. Second, the subjective ratings of the stress of the neutral video appear equivalent to the win condition. Would it have been useful to have a high arousal video (akin to the loss condition) to test the idea that experience of control will buffer against stress? That way, the subjective stress experience of stress would start at equivalent points after WS3.
 
 We agree with the reviewer that this is an important issue to clarify. In our deliberations when designing this study, we considered that any task with action-outcome contingencies would have a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself. Importantly, however, there was an active demand and concentration was still required in order to perform the attention checks regarding the content of the videos and ratings of the videos.
 
 Thank you for the suggestion of having a high arousal video condition. This would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief, and we have included this suggestion for future research in the discussion section (page 28) as below:
 
 “Another avenue for future research would be to test how control buffers against stress when compared to a neutral control scenario of higher stress levels, akin to the loss domain in the WS Task, given that participants found the video condition generally relaxing. However, given that we found no differences dependent on domain for the stress induction in the WS Task conditions, it is possible that different versions of a neutral control condition would not impact the stress induction.”
 
 (3) For the stress relief analysis, the authors included time points 2 and 3 (after the stressor and debrief) but not a baseline reading before stress. Given the potential baseline differences across conditions, can this decision be justified in the manuscript?
 
 We thank the reviewer for raising this. Regarding the stress relief analyses (timepoints 2 and 3) and not including timepoint 1 (after the WS/video task) stress in the model, we have added to the manuscript that there was no significant difference in stress ratings between the high control and neutral control (collapsed across stress and domain) at timepoint 1 (hence why we do not think it’s necessary to include in the stress relief model). Nevertheless, we have now included a sensitivity analysis to test the Timepoint*Control interaction of stress relief when including timepoint 1 stress as a covariate. The timepoint by control interaction still holds, suggesting that the initial stress level prior to the stress induction does not impact our results of interest. The details of this analysis are included in the Sensitivity and Exploratory Analyses section on page 24:
 
 “Although there were no significant differences between control groups in subjective stress immediately after the WS/video task (t(175.6)=1.17, p=.244), we included participants’ stress level after the WS/video task as a covariate in the stress relief analyses (Table S12). The results revealed a main effect of initial stress (β= 0.643, SE=0.040, p<.001, Table S12) on the stress relief after the stressor debrief. Compared to excluding initial stress as in the original analyses (Table 4), there was now no longer a main effect of domain (β= 0.236, SE=2.60, p=.093, Table S12), but the inference of all other effects remained the same. Importantly, there was still a significant time by control interaction (β= 9.65, SE=3.74, p=.010, Table S12) showing that the decrease in stress after the debrief was greater in the highly controllable WS condition than the neutral control video condition, even when accounting for the initial stress level.”
 
 (4) Is the increased control experience during the losses condition more valuable in mitigating experienced stress than the win condition?
 
 We agree that this would be helpful to clarify. To test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) to test for a Domain*Time interaction. This revealed no significant Domain*Time interaction, suggesting that the stress buffering or stress relief effect was not dependent on domain in the high control conditions. These analyses are outlined in the Sensitivity and Exploratory Analyses section on page 25:
 
 “Finally, to test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) for the stress induction and stress relief to test for an interaction of domain and time. For the stress induction, there was no significant two-way interaction of domain and time (β= -1.45, SE=4.80, p=.763), nor a significant three-way interaction of domain by time by stressor intensity (β= -3.96, SE=6.74, p=.557, Table S15), suggesting that there were no differences in the stress induction dependent on domain. Similarly for the stress relief, there was no significant two-way interaction of domain and time (β= -5.92, SE=4.42, p=.182), nor a significant three-way interaction of domain by time by stressor intensity (β= 8.86, SE=6.21, p=.154, Table S15), suggesting that there were no differences in the stress relief dependent on the WS Task domain.”
 
 (5) The subjective measure of control ("how in control do you feel right now") tends to follow a successful or failed attempt at the WS task. How much is the experience of control mediated by the degree of experienced success/schedule of reinforcement? Is it an assessment of control or, an evaluation of how well they are doing and/or resolution of uncertainty? An interesting paper by Cockburn et al. 2014 highlights the potential for positive prediction errors to enhance the desire for control.
 
 We thank the reviewer for this comment. Similar to comments regarding reward rate, our task does not allow us to fully separate control from success/reinforcement because of the manipulation of difficulty. However, we did undertake sensitivity analyses and the inclusion of overall win rate accounted for limited variance when predicting stress over and above subjective control and difficulty (page 16).
 
 “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”
 
 (6) While the authors do a very good job in their inclusion and synthesis of the relevant literature, they could also amplify some discussion in specific areas. For example, operationalizing task controllability via task difficulty is an interesting approach. It would be useful to discuss their approach (along with any others in the literature that have used it) and compare it to other typically used paradigms measuring control via presence or absence of choice, as mentioned by the authors briefly in the introduction.
 
 We are delighted to expand on this particular point and have done so in the Discussion on page 27:
 
 “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition). While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses.”
 
 (7) The paper is well-written. However, it would be useful to expand on Figure 1 to include a) separate figures for study 1 (currently not included) and 2, and b) a timeline that includes the measurements of subjective stress (incorporated in Figure 1). It would also be helpful to include Figure S4 in the manuscript.
 
 We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (now top panel within Figure 4).
 
 Reviewer #1 (Recommendations for the authors):
 
 (1) Study 2 shows a greater decrease in subjective stress after the high-control task manipulation than after the pleasant video. One possible confound is whether the amount of time to complete the WS task and the video differ. It could be helpful to look at the average completion time for the WS task and compare that to the length of the videos. Alternatively, in future studies, control for this by dynamically adjusting the video play length to each participant based on how long they took to complete the WS task.
 
 This is an interesting suggestion. As a result, we have included the time taken as a covariate in the stress induction and stress relief analyses to ensure that any differences in time between the WS task and video task were not accounting for any of the stress induction or relief analyses. Controlling for the total time taken did not impact the stress induction or relief results. This is included in the Sensitivity and Exploratory Analyses section on page 24:
 
 “Our second sensitivity analysis was conducted because the experiment took longer to complete for the video condition (mean = 54.3 minutes, SD = 12.4 minutes) than the WS task condition (mean = 39.7 minutes, SD = 12.8 minutes, t(186.19)=-9.32, p<.001). We therefore included the total time (in ms) as a covariate in the stress induction and stress relief analyses for Study 2. This showed that accounting for total time did not change the results of interest (Table S13), further highlighting that the time by control interactions were robust.”
 
 (2) Because participants received feedback about their success/failure in the WS task, a confounding factor could be that they received positive feedback on highly controllable trials and negative feedback on low control trials (and/or highly difficult trials). This would suggest that it is not controllability per se that contributes to stress perception but rather feedback valence. The authors show that this is a likely factor in their results in Study 2, which shows significant effects of the loss domain on perceived control and stress. Was a similar analysis done in Study 1? Do participants receive feedback in Study 1? It would be helpful to include this information somewhere in the manuscript. I would be curious to know whether *any* feedback at all influences controllability/stress perceptions.
 
 We thank the reviewer for this interesting suggestion. It is an interesting question as to whether feedback valence is related to stress in Study 1, and we have added this point to the Discussion on pages 27 and 28. To speak to this point, when we include the overall win rate (which captures the subsequent feedback received) when predicting subjective stress, win rate is not a significant predictor of stress over and above perceived difficulty and subjective control, suggesting that overall feedback valence may not be related to stress in Study 1. We take this as evidence that feedback may not be as important in terms of accounting for the relationship between stress and control. However, we unfortunately do not have any data in which there was no feedback provided to speak to this conclusively. This would be an interesting future study. The excerpt below is added to pages 27 and 28 of the discussion section:
 
 “Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”
 
 To respond specifically to the reviewer’s question about the feedback given to participants, written feedback was provided on screen to participants on a trial-by-trial basis in Study 1 as well (i.e. for both studies), and we have provided more clarity about this in the manuscript on page 8 as well as providing additional details in Table S3:
 
 “After each trial, participants were shown written feedback on screen as to whether the segment had successfully stopped on the red zone (or not), and the associated reward (or lack of). See Table S3 for details.”
 
 (3) I'm not sure how to interpret the fact that in Figure S1, the BICs are all essentially the same. Does this mean that you don't really need all of these varying aspects of the task to achieve the same effects? Could the task be made simpler?
 
 The similarity of BIC values suggests that a simpler WS task would have produced a worse account of the data, approximately in keeping with the extent to which it is a simpler model. Here, the BIC scores for the models are similar, suggesting that adding these parameters adds explanatory power in keeping with what would have been expected from adding a parameter, but not more. We do note that the BIC is a relatively strict and conservative comparison. The fact that the most complex model overall narrowly improves parsimony, combined with the interpretable parameter values and the prior expectations given the task setup, led us to focus on this most complex model.
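 The sense in which a parameter can add explanatory power "in keeping with what would have been expected" but not more can be sketched with the standard BIC definition, k·ln(n) − 2·ln(L), where lower is better. The log-likelihoods, parameter counts, and sample size below are hypothetical and are not taken from Figure S1.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

n_obs = 200                      # hypothetical number of observations
base_log_lik, base_params = -500.0, 4

# One extra parameter costs ln(n) BIC points, so it breaks even when it
# raises the log-likelihood by ln(n) / 2.
break_even_gain = math.log(n_obs) / 2.0

base = bic(base_log_lik, base_params, n_obs)
richer_same = bic(base_log_lik + break_even_gain, base_params + 1, n_obs)
richer_better = bic(base_log_lik + 2 * break_even_gain, base_params + 1, n_obs)

print(f"base: {base:.2f}  break-even: {richer_same:.2f}  larger gain: {richer_better:.2f}")
```

 Near-identical BIC values across nested models therefore indicate that each added parameter improves fit by roughly its penalty, which is consistent with the authors' reading.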
 
 (4) A minor point, but the authors refer to their sample as "neurotypical." Were they assessed for prior/current psychopathology/medications? If not, I might use a different term here (perhaps "non-clinical sample"), since some prior work has shown that online samples actually have higher instances of psychopathology compared to community samples.
 
 We have changed the phrasing of ‘neurotypical’ to a ‘non-clinical sample’ as recommended.
 
 Reviewer #2 (Recommendations for the authors):
 
 Figure S4 is very informative and could be presented in the main text.
 
 We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (top panel of Figure 4).
 
 References:
 
 Dorfman, H. M., & Gershman, S. J. (2019). Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications, 10(1), 5826. https://doi.org/10.1038/s41467-019-13737-7
 
 DuPont, C. M., Pressman, S. D., Reed, R. G., Manuck, S. B., Marsland, A. L., & Gianaros, P. J. (2022). An online Trier social stress paradigm to evoke affective and cardiovascular responses. Psychophysiology, 59(10), e14067. https://doi.org/10.1111/psyp.14067
 
 Jangraw, D. C., Keren, H., Sun, H., Bedder, R. L., Rutledge, R. B., Pereira, F., Thomas, A. G., Pine, D. S., Zheng, C., Nielson, D. M., & Stringaris, A. (2023). A highly replicable decline in mood during rest and simple tasks. Nature Human Behaviour, 7(4), 596–610. https://doi.org/10.1038/s41562-023-01519-7
 
 Meier, M., Haub, K., Schramm, M.-L., Hamma, M., Bentele, U. U., Dimitroff, S. J., Gärtner, R., Denk, B. F., Benz, A. B. E., Unternaehrer, E., & Pruessner, J. C. (2022). Validation of an online version of the trier social stress test in adult men and women. Psychoneuroendocrinology, 142, 105818. https://doi.org/10.1016/j.psyneuen.2022.105818
 
 Nasso, S., Vanderhasselt, M.-A., Demeyer, I., & De Raedt, R. (2019). Autonomic regulation in response to stress: The influence of anticipatory emotion regulation strategies and trait rumination. Emotion, 19(3), 443–454. https://doi.org/10.1037/emo0000448
 
 Schlatter, S., Schmidt, L., Lilot, M., Guillot, A., & Debarnot, U. (2021). Implementing biofeedback as a proactive coping strategy: Psychological and physiological effects on anticipatory stress. Behaviour Research and Therapy, 140, 103834. https://doi.org/10.1016/j.brat.2021.103834
 
 Steinbeis, N., Engert, V., Linz, R., & Singer, T. (2015). The effects of stress and affiliation on social decision-making: Investigating the tend-and-befriend pattern. Psychoneuroendocrinology, 62, 138–148. https://doi.org/10.1016/j.psyneuen.2015.08.003
 
 AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.05.626945v2
www.biorxiv.org

Evolutionary Adaptations of IRG1 Refines Itaconate Synthesis and Mitigates Innate Immunometabolism Trade-offs

5
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important study addresses the timely and interesting question of how itaconate generation emerged in evolution, using taxonomic analysis of the gene and enzyme cis-aconitate decarboxylase (CAD). The authors provide solid evidence identifying three CAD branches in metazoans and showing that the early metazoan paleo-form indeed generates aconitate and is already linked to innate immunity. They further provide limited evidence suggesting that taxonomic differences in subcellular localisation of this enzyme may allow for innate immune signalling without compromising cellular energetics. The implications of the study will be of high interest to the field of innate host defence and immunometabolism.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation.
  
  Strengths:
  
  The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology.
  
  Weaknesses:
  
  The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.
  
  Strengths:
  
  The evolutionary aspect is novel; this is the first time, to my knowledge, that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.
  
  Weaknesses:
  
  The work concerning sub-mitochondrial localisation is confusing and needs better analysis.
  
  Review 2
4. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts in an immunoregulatory manner by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes.
  
  The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1.
  
  Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. A limited-proteolysis assay of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with the mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.
  
  Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus. They found high mRNA levels in oyster hemocytes, which were further increased by poly(I:C); expression was likewise increased in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity.
  
  Strengths
  
  (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field.
  
  (2) Manuscript clearly written and understandable across disciplines.
  
  (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function.
  
  (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.
  
  Weaknesses:
  
  (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.
  
  (2) The submitochondrial localization assay lacks a native human IRG1 control.
  
  (3) CAD activity shown for IRG1L but not paleo-IRG1.
  
  (4) Itaconate production by early metazoans after PAMP stimulation?
  
  (5) No measurement of energy metabolism (trade-offs?).
  
  I acknowledge that some of these limitations are inevitable because the range of detailed experimental analysis is necessarily limited. However, some of these data would be important to support central claims of the manuscript (further discussed below).
  
  Review 3
5. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation.
  
  Strengths:
  
  The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology.
  
  We thank the reviewer for appreciating the novelty of our study in exploring IRG1 evolution.
  
  Weaknesses:
  
  The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed.
  
  Regarding submitochondrial localization of IRG1, we want to draw attention to the published data that a protease protection assay for wild-type mammalian IRG1 has been performed by Lian et al. 2023 (Extended Data Fig. 4), which convincingly demonstrated an outer-mitochondrial membrane localization of endogenous mouse IRG1 in mouse DC2.4 cells upon LPS stimulation that induces IRG1 expression.
  
  Regarding complementary microscopy evidence, the same paper performed two-color, DNA-paint super-resolution imaging to demonstrate an enrichment of IRG1 to mitochondria with a lack of co-localization of the inner membrane/matrix marker Cox IV.
  
  Given the direct visualization of sub-mitochondrial localization, we consider applying super-resolution microscopy to revisit the sub-mitochondrial localization of different IRG1 constructs in the study.
  
  Reference:
  
  Lian H, Park D, Chen M, Schueder F, Lara-Tejero M, Liu J, Galán JE. Parkinson's disease kinase LRRK2 coordinates a cell-intrinsic itaconate-dependent defence pathway against intracellular Salmonella. Nat Microbiol. 2023 Oct;8(10):1880-1895. doi: 10.1038/s41564-023-01459-y. Epub 2023 Aug 28. PMID: 37640963; PMCID: PMC10962312.
  
  Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.
  
  We thank the reviewer for raising this excellent point. Indeed, itaconate has been proposed to inhibit matrix SDH, thereby exerting an anti-inflammatory function (Lampropoulou, Cell Metab 2016). While the mitochondrial transport of itaconate has not been fully characterized in vivo or in cells, a specific itaconate transport activity has been shown for the mitochondrial 2-oxoglutarate transporter OGC using an in vitro proteoliposome system (Mills et al. Nature 2018).
  
  We plan to discuss this important point on mitochondrial itaconate transport in the revision.
  
  Reference:
  
  Lampropoulou V, Sergushichev A, Bambouskova M, Nair S, Vincent EE, Loginicheva E, Cervantes-Barragan L, Ma X, Huang SC, Griss T, Weinheimer CJ, Khader S, Randolph GJ, Pearce EJ, Jones RG, Diwan A, Diamond MS, Artyomov MN. Itaconate Links Inhibition of Succinate Dehydrogenase with Macrophage Metabolic Remodeling and Regulation of Inflammation. Cell Metab. 2016 Jul 12;24(1):158-66. doi: 10.1016/j.cmet.2016.06.004. Epub 2016 Jun 30. PMID: 27374498; PMCID: PMC5108454.
  
  Mills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa ASH, Higgins M, Hams E, Szpyt J, Runtsch MC, King MS, McGouran JF, Fischer R, Kessler BM, McGettrick AF, Hughes MM, Carroll RG, Booty LM, Knatko EV, Meakin PJ, Ashford MLJ, Modis LK, Brunori G, Sévin DC, Fallon PG, Caldwell ST, Kunji ERS, Chouchani ET, Frezza C, Dinkova-Kostova AT, Hartley RC, Murphy MP, O'Neill LA. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature. 2018 Apr 5;556(7699):113-117. doi: 10.1038/nature25986. Epub 2018 Mar 28. PMID: 29590092; PMCID: PMC6047741.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.
  
  Strengths:
  
  The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.
  
  Weaknesses:
  
  The work concerning sub-mitochondrial localisation is confusing and needs better analysis.
  
  We thank the reviewer for the constructive feedback. As in our response to reviewer 1, we want to draw attention to the published data in which the outer mitochondrial membrane localization of IRG1 has been demonstrated by protease protection assay and explored using super-resolution imaging by Lian et al. 2023 (Extended Data Fig. 4). Given the direct visualization of sub-mitochondrial localization by super-resolution imaging, we plan to revisit and to apply the method to different IRG1 constructs used in the paper.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts immunoregulatory by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes.
  
  The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1.
  
  Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. Limited proteolysis of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with the mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.
  
  Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus: mRNA levels were high in oyster hemocytes and were further increased by poly(I:C), as was also the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity.
  
  Strengths
  
  (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field.
  
  (2) Manuscript clearly written and understandable across disciplines.
  
  (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function.
  
  (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.
  
  We thank the reviewer for the positive comments on the strengths.
  
  Weaknesses:
  
  (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.
  
  We plan to explore two types of experiments:
  
  First, we plan to purify different CAD recombinant proteins; and if successful, we will test their in vitro enzymatic activity in synthesizing itaconate. Positive data will also answer question (3) below.
  
  Second, we plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1. The data will also address question (4) below.
  
  (2) The submitochondrial localization assay lacks a native human IRG1 control.
  
  As in our response to reviewer 1, we believe Lian et al. 2023 provided strong evidence supporting an outer mitochondrial membrane localization of wild-type, endogenous mouse IRG1. Given the direct visualization using super-resolution imaging, we plan to revisit submitochondrial localization of different IRG1 constructs using super-resolution imaging.
  
  (3) CAD activity shown for IRG1L but not paleo-IRG1.
  
  We plan to purify different CAD recombinant proteins; and if successful, we will test their in vitro enzymatic activity in producing itaconate.
  
  (4) Itaconate production by early metazoans after PAMP stimulation?
  
  We plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1.
  
  (5) No measurement of energy metabolism (trade-offs?).
  
  Because PAMP signaling might trigger other downstream effects that also impair mitochondrial function, for instance nitric oxide that inhibits complex IV, we plan to avoid PAMP conditions and directly test the effect of itaconate production. We plan to compare the impact on mitochondrial bioenergetics, if the same CAD enzymes (thus with the same activity) can be expressed at the same level intra-mitochondrially and extramitochondrially, for instance in the case of MTS-hACOD1 and hACOD1.
  
  AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.06.17.496652v3
www.biorxiv.org www.biorxiv.org

When word order matters: human brains represent sentence meaning differently from large language models

5
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This work provides a valuable comparison of sentence structure representations in the human brain and state-of-the-art Large Language Models (LLMs). Based on solid analysis of 7T fMRI data, it systematically identifies sentences in which LLMs underperform relative to models that explicitly code for syntactic structure. The study will be of significant interest to both cognitive neuroscientists and artificial intelligence researchers.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper investigates whether transformer-based models can represent sentence-level semantics in a human-like way. The authors designed a set of 108 sentences specifically to dissociate lexical semantics from sentence-level information and collected 7T fMRI data from 30 participants reading these sentences. They conducted representational similarity analysis (RSA) comparing brain data and model representations, as well as the human behavioral ratings. It is found that transformer-based models match brain representation better than a static word embedding baseline, which ignores word order, but fall short of models that encode the structural relations between words. The main contributions of this paper are:
  
  (1) The construction of a sentence set that disentangles sentence structure from word meaning.
  
  (2) A comprehensive comparison of neural sentence representations (via fMRI), human behavior, and multiple computational models at the sentence level.
  
  Strengths:
  
  (1) The paper evaluates a wide variety of models, including layer-wise analysis for transformers and region-wise analysis in the human brain.
  
  (2) The stimulus design allows precise dissociation between lexical and sentence-level semantics. The RSA-based approach is empirically sound and intuitive.
  
  (3) The constructed sentences, along with the fMRI and behavioral data, represent a valuable resource for studying sentence representation.
  
  Weaknesses:
  
  (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.
  
  (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.
  
  (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.
  
  (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.
  
  (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.
  
  (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.
  
  (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The paper used fMRI data collected while participants read a set of sentences. The sentences are designed to disentangle syntax from meaning. RSA was performed using voxel activations and a variety of language models. The results show that transformers are inferior to models with explicit syntactic representation in terms of matching brain representations.
  
  Strengths:
  
  (1) The study controls for some variables that allow for an investigation of sentence structure in the brain. This controlled setting has an advantage over naturalistic stimuli in targeting more specific linguistic phenomena.
  
  (2) The study combines fMRI data with behavioral similarity ratings and a variety of language models (static, transformers, graph-based models).
  
  Weaknesses:
  
  (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.
  
  (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.
  
  (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).
  
  Review 2
4. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Large Language Models have revolutionized Artificial Intelligence and can now match or surpass human language abilities on many tasks. This has fueled interest in cognitive neuroscience in exposing representational similarities between Language Models and brain recordings of language comprehension. The current study breaks from this mold by: (1) Systematically identifying sentence structures for which brain and Large Language Model representations diverge. (2) Demonstrating that brain representations for these sentences can be better accounted for by a model structured by the semantic roles of words in the sentence. As such, the study may now fuel interest in characterizing how Large Language Models and brain representations differ, which may prompt new, more brain-like language models.
  
  Strengths:
  
  (1) This study presents a bold and solid challenge to a literature trend that has touted similarities between Transformer models and human cognition based on representational correlations with brain activity. This challenge is substantiated by identifying sentences for which brain and model representations of sentences diverge and explaining those divergences using models structured by semantic roles/syntax.
  
  (2) This study conducts a rigorous pre-registered analysis of a comprehensive selection of the state-of-the-art Large Language Models, on a controlled sentence comprehension fMRI dataset. The analysis is conducted within a Representation Similarity framework to support similarity comparisons between graph structures and brain activity without needing to vectorize graphs. Transformer models are predicted and shown to diverge from brain representations on subsets of sentences with similar word-level content but different sentence structures.
  
  (3) The study introduces a 7T fMRI sentence comprehension dataset and accompanying human sentence similarity ratings, which may be a fruitful resource for developing more human-like language models. Unlike other model-based sentence datasets, the relation between grammatical structure and word-level content is controlled, and subsets of sentences for which models and brains diverge are identified.
  
  Weaknesses:
  
  (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.
  
  (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.
  
  (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.
  
  Review 3
5. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Author response:
  
  We thank the reviewers for their insightful comments on our manuscript. Here we briefly highlight our responses to several issues raised by reviewers, and also provide a summary of planned changes to be made with the next draft.
  
  Reviewer 1:
  
  (1) The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We also report model correlations for each model separately in Fig S2. We will clarify this in our revised manuscript.
  
  (2) We agree with the reviewer that including a context-free grammar model as a comparison would be informative. We will incorporate this in the revised manuscript.
  
  (3) The reviewer raises questions about the low correlation between behavioural and brain similarities. While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We will provide additional discussion of the relationship between behavioural judgements and brain data in the revised manuscript.
  
  (4) The reviewer suggests contrasting our models with a ‘semantic ground truth’, as in our design matrix shown in Fig 1. While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. In particular, sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.
  
  (5) In the revised draft we will modify the location of Fig. 5 so that it flows better with the text.
  
  (6) We agree that the discussion of the differences between brain regions could be expanded. We will include this in the revised version of our manuscript. The reviewer questions our inclusion of the simple-average and group-average RSA analysis as they show similar results. We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.
  
  (7) We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.
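
  The distinction drawn in response (1), correlating each model with the brain separately and only then averaging the correlations, can be illustrated with a minimal sketch. It is not the authors' pipeline: cosine similarity, Pearson correlation, the function names, and all toy values are assumptions made for illustration. Note that the two hypothetical models have different embedding dimensions, so averaging their embeddings would not be well defined, whereas averaging their correlations is.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pairwise_similarities(embeddings):
    """Upper-triangle sentence-pair similarities for one model."""
    pairs = combinations(range(len(embeddings)), 2)
    return [cosine(embeddings[i], embeddings[j]) for i, j in pairs]


def mean_model_brain_correlation(model_embeddings, brain_similarities):
    """Correlate each model with the brain separately, then average."""
    per_model = {
        name: pearson(pairwise_similarities(emb), brain_similarities)
        for name, emb in model_embeddings.items()
    }
    return per_model, mean(per_model.values())


# Toy data: 4 sentences, two hypothetical models of different dimensionality.
toy_models = {
    "model_a": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "model_b": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
# Hypothetical brain similarities for the 6 sentence pairs, in the same order.
toy_brain = [0.9, 0.1, 0.2, 0.15, 0.25, 0.85]

per_model, average = mean_model_brain_correlation(toy_models, toy_brain)
```

  Here `per_model` holds one correlation per model (the quantity the response says is reported separately), and `average` is their mean.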
  
  Reviewer 2:
  
  (1) The reviewer argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs, however this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest. Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. Most of these comparisons will not be between minimal pairs, but we selected sentences so as to provide a range of semantic similarities (low to high), while also providing for semantic contrasts of theoretical interest (such as the ‘swapped’ and ‘substituted’ sentence pairs). We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies. We will add additional detail in the revised manuscript providing additional explanation for how stimuli were chosen, and contrasting this with minimal pair approaches.
  
  (2) The reviewer notes that low RSA correlations do not imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning. The reviewer also notes that transformer embeddings are highly anisotropic, however we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli.
  
  (3) The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.
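
  Responses (2) and (3) mention two adjustments: normalising each embedding feature and controlling for sentence length when computing correlations. The sketch below shows one common way to do each, per-feature z-scoring and a first-order partial correlation. These specific techniques, the function names, and the toy numbers are assumptions; the response does not state which procedures the manuscript uses.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_features(embeddings):
    """Standardise each feature (column) across sentences."""
    columns = list(zip(*embeddings))
    mus = [mean(c) for c in columns]
    # A constant feature has zero spread; divide by 1.0 to leave it at 0.
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in embeddings]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy sentence-pair values (hypothetical): model and brain similarities that
# are related only through a shared dependence on the length difference.
length_diff = [1, 1, 1, 1, 3, 3, 3, 3]
model_sim = [0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.3, 0.3]
brain_sim = [0.7, 0.5, 0.7, 0.5, 0.5, 0.3, 0.5, 0.3]

raw_r = pearson(model_sim, brain_sim)
controlled_r = partial_correlation(model_sim, brain_sim, length_diff)
```

  In this constructed example the raw correlation is driven entirely by the length covariate, so the partial correlation drops to about zero. Real data would fall somewhere between these extremes.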
  
  Reviewer 3:
  
  (1) The reviewer emphasises the need for nuance in our conclusions, given that some of the transformers achieve higher correlations when assessed over the full set of sentences. We agree with this comment, and will modify the discussion section in the revised manuscript to address this point. Having said that, we would like to note one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We will highlight this issue more clearly in our revised manuscript.
  
  (2) The reviewer notes that despite our existing controls, residual confounds of sentence length may remain. We agree that this is a potential issue, and will add discussion to the revised manuscript. We also will present further supplementary analyses which we believe indicate that sentence length effects do not drive our main results. At the same time, we believe the fact that our results are robust to simultaneously controlling for sentence length and the ‘minimum length effect’ (Fig. S5) indicates they are not primarily driven by sentence length effects.
  
  (3) The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. An alternative approach of first embedding a graph into a vector and then training an encoding model on the graph embeddings has a similar limitation of being dependent not just on the graph representation, but also on the way it was embedded into a vector and the way the encoding model was trained. Arguably this process is more opaque than similarity methods, since it is unclear to what extent the graph embeddings preserve the logic and properties of a graph-based representation. Further, it is not clear whether there is any single method which can overcome the difficulty of comparing distinct formalisms for representing semantics. The reviewer also highlights how the correlations measured for the syntax model differ greatly depending on whether the Smatch or WWLK similarity metrics are used. We believe this highlights the need for careful examination of commonly used graph similarity metrics, as has been noted in previous research. We will include additional discussion of this issue in our revised manuscript.
  
  AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.19.665701v1
www.biorxiv.org

TrueProbes: Quantitative Single-Molecule RNA-FISH Probe Design Improves RNA Detection

5
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This useful study introduces a computational pipeline for designing RNA in situ fluorescence hybridization probes that could improve the sensitivity and specificity of RNA detection in cells. While the approach is novel and the preliminary data suggestive, the evidence supporting a clear advantage over existing probe design strategies is incomplete. The work will be of interest to researchers developing or using molecular tools for imaging RNA in cells.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.
  
  Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.
  
  Major Comments:
  
  (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.
  
  (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.
  
  (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.
  
  (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.
  
  (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Hughes et al present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.
  
  Strengths:
  
  (1) The problem is well-articulated in the abstract and the introduction.
  
  (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.
  
  (3) The authors compared multiple probe design software packages both computationally and experimentally.
  
  (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (PaintSHOP and MERFISH panel designer).
  
  (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.
  
  Weaknesses:
  
  (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.
  
  (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.
  
  (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.
  
  (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.
  
  (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.
  
  (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.
  
  (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?
  
  Review 2
4. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.
  
  Strengths:
  
  A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.
  
  Weaknesses:
  
  However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.
  
  Review 3
5. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public Review):
  
  The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.
  
  Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.
  
  Major Comments:
  
  (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.
  
  This is an important observation. We have already addressed this issue in Figures 3E-G and Supplementary Figure 4E-G, where we plotted the number of OFF-targets for each ON-target probe. If we select longer genes to ensure an equal number of designed probes with strong signals, we will still end up with the same number of ON-target probes. Consequently, Figures 3B-D and 3E-G would show similar trends, albeit with different values on the y-axis. Additionally, we will conduct an analysis using Stellaris at its highest probe design stringency setting to compare the software under its strictest design conditions. Additional experiments are outside the scope of the current manuscript.
  
  (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.
  
  Three biological replicates were utilized for the ARF4 experiments. As stated in the original submission, the average data from all three replicates is presented in Figure 4, while the data for each individual replicate can be found in Figure S5. Statistical analyses were conducted for both the pooled data in Figure 4 and the individual data in Figure S5. The results of all statistical calculations are detailed in Supplemental Table 1. We will update the text to clearly indicate the number of biological replicates and the outcomes of the statistical analysis.
  
  (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.
  
  Thank you for your suggestion. Currently, we lack the expertise in our lab to conduct such experiments, so they are beyond the scope of this manuscript. However, we will create additional supplementary figures to demonstrate that the likelihood of false positives is low, based on the assumption that current publicly available BLAST algorithms, genome annotations, and reference transcription expression data are accurate.
  
  We will include a comparison in our supplementary materials showing the off-target RNA that can bind the highest number of probes simultaneously for each software. Additionally, we will perform a correlation analysis to illustrate the relationship between spot intensity for different software and the number of probes they design. This will help us estimate how the number of probes bound to RNA correlates with expected spot intensity ranges.
  
  Using this information, along with autofluorescence background intensity measurements from no-probe controls, we will estimate the minimum number of probes that need to bind to targets to be detected as single spots. If this minimum is higher than the maximum number of simultaneous off-target probe bindings, we anticipate that the detected spot signal will primarily reflect ARF4 rather than other transcripts.
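
  The detection argument above can be sketched numerically. This is a minimal illustration, assuming spot intensity grows linearly with the number of bound probes and that a spot counts as detected once it exceeds the background by a fixed number of standard deviations; neither assumption comes from the manuscript, and all names and values below are hypothetical.

```python
import math

def min_probes_for_detection(per_probe_intensity, background_sd, k_sd=3.0):
    """Smallest probe count whose summed signal exceeds k_sd background SDs.

    Assumes (hypothetically) that each bound probe adds the same intensity.
    """
    if per_probe_intensity <= 0:
        raise ValueError("per_probe_intensity must be positive")
    return math.ceil(k_sd * background_sd / per_probe_intensity)

def spots_dominated_by_on_target(min_probes, max_off_target_probes):
    """True if no off-target RNA can bind enough probes to form a spot."""
    return min_probes > max_off_target_probes

# Hypothetical numbers for illustration only (arbitrary intensity units).
per_probe_intensity = 50.0     # e.g. slope of intensity vs. probe count
background_sd = 100.0          # SD of autofluorescence in no-probe controls
max_off_target_probes = 4      # most probes bound at once by any off-target

needed = min_probes_for_detection(per_probe_intensity, background_sd)
on_target_only = spots_dominated_by_on_target(needed, max_off_target_probes)
```

  With these made-up inputs, six probes are needed for a detectable spot while the worst off-target binds four, so detected spots would be expected to reflect the intended transcript.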
  
  (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.
  
  Thank you for pointing this out. We will revise the manuscript to clarify: "We did not include RNA secondary and tertiary structures in the model because the use of formamide in RNA-FISH experiments denatures these structures, allowing oligonucleotides to access the RNA."
  
  (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.
  
  Thank you for highlighting the literature that supports this limitation. We will include Buxbaum et al. (2014, Science) and additional studies that discuss how RNA-protein interactions can affect RNA-FISH experiments.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Hughes et al present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.
  
  We appreciate the reviewer’s thoughtful feedback. We will revise the text to ensure that all claims are backed by computational or experimental evidence. For claims that do not have supporting results, we will relocate them to the discussion section as potential future extensions. Since our probe design is open access, both we and the community can further develop our codes as needed.
  
  Strengths:
  
  (1) The problem is well-articulated in the abstract and the introduction.
  
  (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.
  
  (3) The authors compared multiple probe design software packages both computationally and experimentally.
  
  (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (PaintSHOP and MERFISH panel designer).
  
  (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.
  
  We would like to thank the reviewer for pointing out the strengths of the manuscript.
  
  Weaknesses:
  
  (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.
  
  Thank you for acknowledging the clarity of the abstract and introduction. We will revise the abstract to provide more specific details on how TrueProbes outperforms other software. Additionally, we will include specific computational and experimental metrics that demonstrate TrueProbes' improved performance compared to other software.
  
  (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.
  
  In Figure 3, we aim to convey two main points.
  
  The first point is to compare the number of ON-target probes designed by each software using their most stringent design criteria (Figure 3A). Currently, we are using a medium strict design criterion for Stellaris (level 3). As shown in the new supplementary figure XX, when we apply the most stringent design criteria for Stellaris (level 5), the number of ON-target probes decreases to XX probes. This clearly indicates that, based on theoretical calculations, TrueProbes can design more probes than any of its competitors.
  
  The second point is to compare the number of OFF-targets produced by each probe design. To illustrate this, we used two different metrics. In Figures 3B-D, we compare the total number of probes bound to OFF-target RNA. However, since each software generates a different number of ON-target probes, the number of OFF-targets may vary simply due to the differences in ON-target probe counts. Therefore, we introduced a second metric to compare OFF-targets. In Figures 3E-G, we present the number of OFF-targets normalized by the number of ON-targets. Using this metric, TrueProbes shows the lowest number of OFF-targets. We will update the manuscript to clarify this point.
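
  The difference between the two metrics can be made concrete with a small sketch. The design names and counts below are hypothetical and are not values from Figure 3; the function simply divides OFF-target bindings by the number of ON-target probes.

```python
def off_targets_per_on_target(on_target_probes, off_target_bindings):
    """Return OFF-target bindings per ON-target probe."""
    if on_target_probes <= 0:
        raise ValueError("need at least one ON-target probe")
    return off_target_bindings / on_target_probes

# Hypothetical counts for illustration only; not values from the manuscript.
designs = {
    "design_A": {"on": 40, "off": 120},
    "design_B": {"on": 20, "off": 80},
}

normalized = {
    name: off_targets_per_on_target(counts["on"], counts["off"])
    for name, counts in designs.items()
}
```

  Here design_B has fewer OFF-target bindings in total (80 versus 120) but more per ON-target probe (4.0 versus 3.0), so the ranking of the two designs flips depending on which metric is used.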
  
  Regarding the experiments and their comparison to theoretical calculations: The theoretical calculations consider only the reference DNA and RNA genomes along with the oligonucleotide sequences for the probes. We then use a thermodynamic model to identify ON- and OFF-targets. Thus, these theoretical calculations represent an upper bound on the maximum possible number of ON-targets and the minimum number of OFF-targets. All other design software evaluated in this manuscript relies on the same or less reference data and makes certain assumptions. None of these methods quantitatively compare their computational designs with experimental results; they simply design probes based on unverified assumptions, conduct experiments, and present spot data to conclude that their probe designs are effective.
  
  We will update the manuscript to clarify the goals of the theoretical model and its relationship to the experiments. Future work will be necessary to enhance our theoretical model to fully account for additional aspects of RNA-FISH experiments (e.g., formaldehyde crosslinking, hybridization conditions, washing steps) to better predict the experimental data shown in Figure 4. We will also adjust our claims to accurately reflect the current capabilities of our theoretical framework and its relation to experimental outcomes.
  
  (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.
  
  The predictions in Figure 3 regarding the number of probe off-target binding events, based on reference gene expression data, do not necessarily encompass all the information required to predict RNA-FISH signal intensity. Therefore, these predictions should not be expected to translate directly into the experimental results shown in Figure 4, particularly concerning the background signal.
  
  While our software aims to minimize off-target probe binding, this does not automatically lead to a reduction in off-target background signal. Numerous other factors influence the spot background and overall signal-to-noise ratio (SNR) performance, beyond just probe-target binding interactions. Although we strive to minimize off-target background through probe binding, this approach is not designed to directly predict the SNR. Extending the computational analysis of probe binding dynamics to RNA-FISH signal intensity dynamics is beyond the scope of this study.
  
  We have revised our text to clearly separate computational results from experimental results into two distinct sections. We will use different terminology to describe the outcomes of computational performance versus experimental performance, reducing potential confusion between these two aspects. Additionally, we will clarify our conceptual overview in Figure 1 regarding traditional probe design limitations related to sensitivity and specificity. We will specify how the signal from the number of probes bound to ON-target RNA, relative to those bound to OFF-targets and cellular autofluorescence, translates—either linearly or non-linearly—into the signal-to-noise ratio.
  
  (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.
  
  Thank you for highlighting these valuable experiments. Currently, our lab lacks the expertise to generate tissue samples beyond culturing cells. Additionally, implementing a two-color probe design in tissues containing different cell types with unknown expression levels presents further challenges. Due to these limitations, designing and conducting two-color experiments in tissue samples is beyond the scope of the current manuscript, but we plan to pursue this in the future.
  
  (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.
  
  Thank you for bringing this to our attention. TrueProbes is currently designed and tested specifically for primary smRNA-FISH probes. Our focus is on demonstrating a new approach to designing these probes without the added complexities of secondary probes and multiplexing. Future work will expand on this foundation to incorporate secondary probe detection and transcript multiplexing.
  
  (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.
  
  The supplemental information of the article will be updated to include figures that illustrate predictions for each capability currently offered by TrueProbes, along with the scripts used to generate these predictions. Any capabilities that do not have corresponding scripts will be removed from this section and instead referred to as potential improvements or future additions to the TrueProbes framework in the discussion section.
  
  (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?
  
  Thermodynamic equations are calculated at steady state because RNA-FISH hybridization reactions typically last from eight to twenty hours. This duration allows probes adequate time to localize to their targets and reach binding equilibrium, based on current estimates of DNA oligonucleotide association and dissociation rate constants. We will address the potential violation of the well-mixed assumption in the assumptions and limitations section, specifically discussing how RNA localization can affect the spatial distribution of both on-target and off-target probes within cells, which may disrupt the well-mixed condition.
  
  Low molecule numbers are not a significant concern, as probe DNA oligonucleotide concentrations in RNA-FISH protocols are much higher than the number of transcripts present in cells, by several orders of magnitude.
  
  The assumptions and limitations section will be revised to clearly state: “Probe hybridization reactions were computed at steady state because most RNA-FISH protocols utilize probe hybridization incubation steps lasting over eight hours, which should provide sufficient time to reach equilibrium based on current estimates of forward and reverse reaction rate constants. Predictions from the equilibrium model may be less accurate for RNA-FISH experiments with shorter hybridization times, where non-steady state dynamics can result in different transient outcomes depending on the duration of hybridization.”
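  
  As a rough illustration of the equilibrium argument above, the sketch below estimates how long a probe-target duplex takes to approach its equilibrium level. It assumes a well-mixed reaction with probe in large excess over target, so that binding relaxes at the pseudo-first-order rate k_obs = k_on[P] + k_off. The function name and all numerical values are hypothetical placeholders chosen only to exercise the formula; they are not taken from the manuscript or from measured rate constants, and this is not the TrueProbes model.

```python
import math


def time_to_fraction_of_equilibrium(k_on, k_off, probe_conc, fraction=0.95):
    """Return the time (s) for duplex formation to reach `fraction` of equilibrium.

    Assumes probe in large excess over target (pseudo-first-order kinetics)
    and a well-mixed reaction, so C(t) = C_eq * (1 - exp(-k_obs * t)).
    k_on is in M^-1 s^-1, k_off in s^-1, probe_conc in M.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    k_obs = k_on * probe_conc + k_off  # observed relaxation rate (s^-1)
    return -math.log(1.0 - fraction) / k_obs


# Hypothetical placeholder values, for illustration only.
K_ON = 1e5      # association rate constant, M^-1 s^-1 (assumed)
K_OFF = 1e-5    # dissociation rate constant, s^-1 (assumed)
PROBE = 1e-9    # probe concentration, M (assumed)

t95_hours = time_to_fraction_of_equilibrium(K_ON, K_OFF, PROBE) / 3600.0
print(f"time to 95% of equilibrium: {t95_hours:.1f} h")
```

  Comparing the returned time with the planned hybridization duration gives a quick check on whether the steady-state approximation is plausible for a given protocol; shorter incubations or lower probe concentrations lengthen the approach to equilibrium under these assumptions.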
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.
  
  Strengths:
  
  A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.
  
  Weaknesses:
  
  However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.
  
  We will revise our text to make our claims more specific and clearer, avoiding overgeneralizations and ensuring that all claims are adequately supported by the data we present.
  
  AuthorResponse

URL

biorxiv.org/content/10.1101/2025.08.14.670355v1
www.biorxiv.org www.biorxiv.org

Backward Conditioning Reveals Flexibility in Infralimbic Cortex Inhibitory Memories

5
1. Public_Reviews 09 Oct 2025
 
 in eLife
 
 eLife Assessment
 
 This set of experiments provides a valuable finding regarding the need for prior inhibitory training to recruit the infralimbic cortex in extinction learning. The multiple clever behavioral designs supply converging lines of evidence in a compelling manner, but several issues, such as the group sizes and appropriate analysis of data, render the overall strength of support incomplete. With these issues resolved, this manuscript will be of interest to behavioral neuroscientists, especially those interested in learning & memory and/or cortical function.
 
 Summary
2. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.
 
 Strengths:
 
 (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.
 
 (2) The experimental design, which used different inhibitory learning approaches to probe how IL activation facilitates extinction learning, was creative and innovative.
 
 Weaknesses:
 
 (1) Non-specific manipulation.
 
 ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.
 
 (2) Extinction retrieval test conflates processes
 
 The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.
 
 (3) Under-sampling and poor group matching.
 
 Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.
 
 (4) Incomplete presentation of conditioning data.
 
 Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.
 
 (5) Interpretation stronger than evidence.
 
 The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.
 
 Impact:
 
 The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.
 
 Review 1
3. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.
 
 Strengths:
 
 (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.
 
 (2) Very clear representation of groups and experimental design for each figure.
 
 (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.
 
 (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.
 
 Weaknesses:
 
 (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?
 
 (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?
 
 (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.
 
 (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.
 
 (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.
 
 Review 2
4. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.
 
 Strengths:
 
 The experimental designs are very rigorous with an unusual level of behavioral sophistication.
 
 Weaknesses:
 
 (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.
 
 (2) The current discussion could be condensed and could focus on broader implications for the literature.
 
 Review 3
5. Public_Reviews 09 Oct 2025
 
 in eLife
 
 Author response:
 
 Reviewer #1 (Public review):
 
 Summary:
 
 The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.
 
 Strengths:
 
 (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.
 
 We thank the Reviewer for their positive assessment.
 
 (2) The experimental design, which used different inhibitory learning approaches to probe how IL activation facilitates extinction learning, was creative and innovative.
 
 We thank the Reviewer for their positive assessment.
 
 Weaknesses:
 
 (1) Non-specific manipulation.
 
 ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.
 
 ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, this manuscript aimed to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABAA receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. This rationale will be incorporated into the revised manuscript, which will also make reference to the need to identify the relative contributions of the various neuronal populations within the IL.
 
 (2) Extinction retrieval test conflates processes
 
 The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.
 
 It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript will provide the trial-by-trial performance during the post-extinction retrieval tests and discuss this issue.
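 
 The distinction discussed above between the first presentation, an early block of trials, and the whole-session average can be made concrete with a small sketch. This is not the authors' analysis: the function name `block_means`, the `early_n` parameter, and the freezing values are hypothetical and serve only to show how the summaries differ when freezing declines across a test session.

```python
from statistics import mean


def block_means(freezing_by_trial, early_n=2):
    """Summarize per-trial freezing (%) for one subject or group.

    Returns the first-trial score, the mean of the first `early_n` trials,
    the mean of the remaining trials, and the whole-session mean.
    """
    n_trials = len(freezing_by_trial)
    if not 0 < early_n < n_trials:
        raise ValueError("early_n must be between 1 and n_trials - 1")
    return {
        "first_trial": freezing_by_trial[0],
        "early_mean": mean(freezing_by_trial[:early_n]),
        "late_mean": mean(freezing_by_trial[early_n:]),
        "session_mean": mean(freezing_by_trial),
    }


# Made-up freezing scores for an 8-tone test, for illustration only.
example_scores = [80, 70, 60, 50, 40, 30, 20, 10]
summary = block_means(example_scores, early_n=2)
print(summary)
```

 Reporting the early-block and whole-session summaries side by side, or the full trial-by-trial series, lets a reader judge whether a group difference is already present at the start of the test or emerges only across later trials.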
 
 (3) Under-sampling and poor group matching.
 
 Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.
 
 Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. However, we acknowledge that the unexpected interactions deserve further discussion, and this will be incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript.
 
 (4) Incomplete presentation of conditioning data.
 
 Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.
 
 We apologize, as we incorrectly labeled the X axis for the backward conditioning data set in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error will be corrected in the revised manuscript.
 
 (5) Interpretation stronger than evidence.
 
 The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.
 
 As noted above, the revised manuscript will show that the interpretations of the main findings stand whether or not the test data confound retrieval with additional extinction learning. The revised manuscript will also clarify the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this will also be incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that, as we will acknowledge, is likely to engage several neuronal populations within the IL. Adequate statements on these matters will be included in the revised manuscript.
 
 Impact:
 
 The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.
 
 As noted in our responses, the interpretations of the data presented remain identical whether the test data conflate extinction retrieval with additional extinction learning or not. Although we agree that it is important to establish the role of specific IL neuronal populations in extinction learning, this was beyond the scope of the manuscript and the findings reported remain valuable to our understanding of inhibitory encoding within the IL.
 
 Reviewer #2 (Public review):
 
 Summary:
 
 In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.
 
 Strengths:
 
 (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.
 
 We thank the Reviewer for their positive assessment.
 
 (2) Very clear representation of groups and experimental design for each figure.
 
 We thank the Reviewer for their positive assessment.
 
 (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.
 
 We thank the Reviewer for their positive assessment.
 
 (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.
 
 We thank the Reviewer for their positive assessment.
 
 Weaknesses:
 
 (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?
 
 As noted (see response to Reviewer 1), efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following post-mortem analyses. This will be made explicit in the revised manuscript. Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations will be addressed in the revised manuscript.
 
 (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?
 
 We apologize; as noted above, we incorrectly labeled the X axis for the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This will be clarified in the methods section of the revised manuscript. The labeling errors on the Figures will be corrected.
 
 (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.
 
 We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This will be provided in the revised manuscript. Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.
 
 (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.
 
 We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone-alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This issue will be clarified and explained in the revised version of the manuscript.
 
 (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.
 
 In line with the Reviewer’s suggestion (see also Reviewer 3), the revised manuscript will include a discussion of the broader implications of the findings regarding inhibitory brain circuitry and will acknowledge the need to further explore sex differences and IL functions.
 
 Reviewer #3 (Public review):
 
 Summary:
 
 This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.
 
 Strengths:
 
 The experimental designs are very rigorous with an unusual level of behavioral sophistication.
 
 We thank the Reviewer for their positive assessment.
 
 Weaknesses:
 
 (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.
 
 All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation is required to justify the need for multiple days of backward training. This will be provided in the revised manuscript.
 
 (2) The current discussion could be condensed and could focus on broader implications for the literature.
 
 The revised manuscript will make an effort to condense the discussion and focus on broader implications for the literature.
 
 References
 
 Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692
 
 Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015
 
 Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509
 
 Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276
 
 Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001
 
 Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322
 
 AuthorResponse

Tags

Review 2

AuthorResponse

Summary

Review 1

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.02.668258v1
www.biorxiv.org

Enhancer-AAVs allow genetic access to oligodendrocytes and diverse populations of astrocytes across species

3
1. Public_Reviews 09 Oct 2025
  
  in eLife
  
  eLife Assessment
  
  This important study presents convincing findings on creating an exhaustive library of new enhancer-AAVs targeting astrocytes and oligodendrocytes with high potential for both basic and translational work, which will be of value to a large and growing community. However, the outdated description of glial biology in the Introduction, the overstated claims of utility in the Conclusion, and the loose stringency in the criteria used to assemble the library diminish the strengths of the claims. The work will be of interest to neuroscientists working on glial cell biology.
  
  Summary
2. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #1 (Public review):
  
  The goal of this study was to generate a library of new enhancer-driven AAVs in order to selectively and efficiently target astrocytes and oligodendrocytes in rodents. The implied criteria are that such viral vectors should have high specificity for the intended cell type and effectively express in all astrocytes/oligos in the brain or, alternatively, be specific for defined brain regions, layers, or subtypes of astrocytes/oligos. In addition, they should be compatible with intravenous retro-orbital delivery to facilitate experimentation and brain-wide targeting (i.e., show organ specificity and high efficiency in the brain). Ideally, these new AAVs would also maintain their characteristics across disease contexts and show applicability in non-human primates. Tools with such characteristics are generally lacking in studying glial cells and would be extremely useful to scale up and accelerate glial research, allowing targeting of astrocytes/oligos with distinct molecular identity and intersectional strategies.
  
  At present, however, none of the enhancer-AAVs presented in the study seems to meet this combination of criteria, at least not with the level of stringency typically expected in the field. The main reason is that, in its current form, the study does not present one candidate AAV iteratively improved to meet all these criteria; instead, it presents a catalogue of new AAVs with various degrees of specificity, completeness, and mixed characteristics. Therefore, their utility should be interpreted cautiously. Moreover, the way specificity and completeness are intermixed in the analysis makes it difficult to evaluate the actual utility of any given AAV. The study might have been strengthened by focusing on a small set of the most promising candidates (e.g., AiE0890m_3x2C) and validating them thoroughly for expression specificity, completeness, effective cargo expression, ability to allow specific pan-astrocyte or astrocyte-subtype targeting in vivo, and preserved properties in NHPs and in disease, as this would encourage their adoption by the community. Currently, too many AAVs are assessed inconsistently against the desired criteria, with none being evaluated through and through.
  
  The impact of the catalogue is also greatly diminished by the fact that a suite of AAVs with outstanding specificity and efficiency is already available for the study of astrocytes (e.g., 4x6T AAVs) and was not utilized as a standard to benchmark the new library, making it difficult to appreciate the relative benefits of the new AAVs. The inclusion of expression data in NHPs is very significant, but benchmarking against established AAVs would also be needed to fully appreciate their value.
  
  Importantly, readers should also be aware that the study seems noticeably limited in its literacy with glial biology. The introduction and discussion frame the field in a way that seems outdated, creating the impression that the diverse roles of glia in health and disease have not yet been studied, which may inadvertently be perceived as dismissive and stigmatizing.
  
  In summary, the paper introduces potentially useful viral tools and lays the foundations for future multiplexed targeting of distinct glial cell subpopulations in rodents and in non-human primates, which are extremely important directions. Some of the regionally restricted or even sparsely expressed AAVs may prove valuable in enabling subpopulation-specific targeting or molecular profiling strategies, but currently lack full benchmarking. At present, the claims about the utility of the new tools seem overstated, and the library may not yet represent an actionable resource for targeting astrocytes and oligodendrocytes.
  
  Review 1
3. Public_Reviews 09 Oct 2025
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Enhancer elements are regulatory DNA sequences that are capable of driving specific expression patterns. As these elements are generally short and context-independent, enhancers can be used in expression vectors (e.g., packaged in an adeno-associated virus, AAV) to limit expression to target cell populations. This approach was identified as a major strategy for cell-type-specific manipulation in the brain and has been pursued both by standard research studies and by large-scale efforts led by the BRAIN Initiative. This manuscript describes a major effort to generate enhancer-AAVs targeting astrocytes and oligodendrocytes orchestrated by a large research team led by the Allen Institute for Brain Science. This manuscript parallels other recent publications describing sets of enhancer-AAVs, following rigorous, similar methods with relatively broad testing and application.
  
  To identify and screen candidate enhancers, the scientists prioritized candidates via analysis of single-nucleus accessibility and methylation datasets (e.g., snATAC-seq) and tested them in mice. The scientists prioritized candidate enhancers that exhibited specificity of accessibility in the target cell type. Following selection, the scientists cloned the candidate sequences into AAV vectors with a minimal promoter and reporter gene, packaged the virus, delivered it to the mouse brain, and screened for activity based on reporter expression. Candidates that passed initial screening were further characterized via imaging and sorting, followed by single-cell RNA-seq. This process had around a 50% success rate and yielded 25 astrocyte and 21 oligodendrocyte enhancer-AAVs with the targeted cell-type-specific expression patterns.
  
  The scientists went on to test for subtype-specific activity patterns, finding wide diversity in astrocyte activities across sub-populations and, conversely, homogeneous oligodendrocyte activation. They optimized a few of these by concatenating the enhancer core sequence to increase expression levels of the reporter gene and showed strong specificity and completeness of cell targeting for a set of these enhancer-AAVs. Following characterization and validation, they then deployed these enhancer-AAVs in a number of demonstration applications to show the utility for basic and translational science. All the constructs developed here are available for public use via Addgene, ensuring that these new tools can be used by other researchers.
  
  There really are no obvious weaknesses in the work presented here, from the generation of the enhancer-AAVs to use in sophisticated validation studies. The enhancer-AAV testing is rigorous and provides critical information necessary for other scientists to select and use these constructs. The applications demonstrate the power of enhancer-AAV approaches. The toolbox presented here may not enable specific targeting of all relevant cellular subtypes or activity states for astrocytes and oligodendrocytes, and future work will be needed to fully understand the activity of the enhancers, identity of the target cell types, and context-dependent utility of these constructs. However, the set of enhancer-AAVs developed here should be transformative for researchers working on accessing and manipulating these cell types and have a major impact on the field.
  
  Review 2

Tags

Summary

Review 2

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.20.558718v2

Public_Reviews

Annotations: 10,000

Joined: March 17, 2021

Tags

Annotators

URL
