  1. Jun 2025
    1. Author response:

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of comparable experimental manipulations in the mammalian brain, to address this general question (the functional significance of brain laterality), is also expected.

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an LR bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc. 

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA) in place of 'Mental Number Line'. We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster and observed that those patients tend to drift less between temporally clustered items, but they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as the spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show that a boundary item representation is most similar to the prior list's first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.
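
      As an illustration of the drift-only prediction discussed in this response, the following is a minimal sketch and not the analysis used in the study. It assumes a simplified TCM-style update, c_i = rho * c_{i-1} + beta * f_i, with mutually orthogonal item inputs, no reset between lists, and no distractor or recall periods. The drift rate `BETA` and the helper names are hypothetical; only the list length of 12 comes from the task description.

```python
import math

LIST_LENGTH = 12  # list length taken from the task description (SP1..SP12)
BETA = 0.4        # hypothetical drift rate, chosen only for illustration


def drift_contexts(n_items, beta):
    """Return the context vector after each item under pure drift.

    Update rule (assumed): c_i = rho * c_{i-1} + beta * f_i, where each item
    input f_i is a unit vector orthogonal to everything seen before.
    """
    rho = math.sqrt(1.0 - beta ** 2)  # keeps the context at unit length
    context = [0.0] * (n_items + 1)
    context[0] = 1.0  # arbitrary pre-task state
    states = []
    for i in range(n_items):
        item_input = [0.0] * (n_items + 1)
        item_input[i + 1] = 1.0
        context = [rho * c + beta * f for c, f in zip(context, item_input)]
        states.append(context)
    return states


def similarity(a, b):
    """Dot product; equals cosine similarity because contexts are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Two consecutive lists with no reset at the list boundary.
states = drift_contexts(2 * LIST_LENGTH, BETA)
first_of_list2 = states[LIST_LENGTH]
sim_to_prev_last = similarity(first_of_list2, states[LIST_LENGTH - 1])
sim_to_prev_first = similarity(first_of_list2, states[0])
print(sim_to_prev_last, sim_to_prev_first)
```

      Under these assumptions the first item of a list is closer to the last item of the prior list (similarity rho) than to its first item (similarity rho to the power 12). A signal in which first items resemble each other across lists therefore departs from this drift-only pattern.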

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work; this makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context; however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances.

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      The serial position effects reported by Serruya and colleagues consist of decreasing HFA with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
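
      To show how an unpaired comparison with two sample sizes yields a single degrees-of-freedom value, here is a minimal sketch. It assumes a pooled-variance (Student's) t-test, which the response does not specify; a Welch test would give a different, generally non-integer value. The function name and the per-patient estimates are hypothetical placeholders, not data from the study.

```python
import math
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Pooled-variance unpaired t-test; returns (t statistic, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2  # one value, derived from both sample sizes
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, dof


# Hypothetical per-patient parameter estimates for two regions.
region_a_estimates = [1.0, 2.0, 3.0]
region_b_estimates = [2.0, 3.0, 4.0, 5.0]
t_stat, dof = unpaired_t(region_a_estimates, region_b_estimates)
print(t_stat, dof)
```

      In this form the test would be reported with its single degrees-of-freedom value, and the two sample sizes would be given separately as group sizes.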

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.

      For Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.
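
      The distinction drawn in this response between a constant reference point (such as SP1) and a relative +1 lag measure can be sketched as follows. This is an illustration only, assuming cosine similarity between per-item feature vectors; the function names and the toy two-dimensional vectors are hypothetical and do not represent the study's spectral features or pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_to_reference(features, ref_index=0):
    """Similarity of every item to one fixed item, keyed by inter-item distance."""
    ref = features[ref_index]
    return {
        i - ref_index: cosine_similarity(ref, vec)
        for i, vec in enumerate(features)
        if i != ref_index
    }


def adjacent_similarity(features):
    """Mean similarity over all +1 lag pairs, pooled regardless of serial position."""
    pairs = [cosine_similarity(a, b) for a, b in zip(features, features[1:])]
    return sum(pairs) / len(pairs)


# Toy features: each successive item rotates by a fixed angle (hypothetical).
STEP = 0.2
features = [[math.cos(i * STEP), math.sin(i * STEP)] for i in range(4)]
by_distance = similarity_to_reference(features)  # fixed reference: first item
pooled_lag1 = adjacent_similarity(features)      # relative reference: +1 lag
print(by_distance, pooled_lag1)
```

      The fixed-reference measure yields one value per distance from the reference item, so a trend across serial positions stays visible. The pooled +1 lag measure collapses pairs such as SP1 to SP2 and SP4 to SP5 into a single number.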

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all references to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Because the statement concerns items in subsequent lists, we performed an across-list analysis rather than a within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added a figure showing 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
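
      The lag-CRP calculation over the full range of transitions can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `lag_crp`, the input layout (one sequence of recalled serial positions per list), and the handling of repeats and intrusions (skipped, with the transition chain reset) are all assumptions made for the example.

```python
from collections import defaultdict


def lag_crp(recall_lists, list_length=12):
    """Lag-conditional response probability for every lag in -(L-1) .. (L-1).

    recall_lists: one sequence per studied list, holding the 1-based serial
    positions of recalled items in output order (hypothetical input format).
    Repeats and out-of-range values are skipped and reset the transition
    chain (an assumption; conventions differ between studies).
    """
    actual = defaultdict(int)    # transitions actually made, per lag
    possible = defaultdict(int)  # transitions that were available, per lag
    for recalls in recall_lists:
        seen = set()
        prev = None
        for pos in recalls:
            if not (1 <= pos <= list_length) or pos in seen:
                prev = None
                continue
            if prev is not None:
                actual[pos - prev] += 1
                # Every not-yet-recalled item was an available target.
                for cand in range(1, list_length + 1):
                    if cand not in seen:
                        possible[cand - prev] += 1
            seen.add(pos)
            prev = pos
    lags = [lag for lag in range(-(list_length - 1), list_length) if lag != 0]
    return {
        lag: (actual[lag] / possible[lag]) if possible[lag] else float("nan")
        for lag in lags
    }
```

      With a list length of 12 the returned dictionary has one entry per lag from -11 to 11 (excluding 0); lags that were never available are reported as NaN rather than 0.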

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree with the comment that the presented data do not directly support the claim that drift in the lateral temporal cortex is independent of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw a connection between the neural response after boundaries and the neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory-relevant phenomenon. We use very similar methods to perform our analysis.

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz), from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.
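
      For readers unfamiliar with the similarity measure under discussion, cosine similarity between two flattened time-frequency matrices can be sketched as below. The helper names `flatten` and `cosine_similarity` and the toy input layout are illustrative assumptions, and the sketch omits the PCA step described in the response. It relies only on the standard definition of cosine similarity, which is unchanged when an entire vector is multiplied by a positive constant but does change when only part of the vector (for example, the early portion of the time window) is boosted.

```python
import math


def flatten(matrix):
    """Flatten a (frequency x time) matrix, given as a list of rows, into one vector."""
    return [value for row in matrix for value in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy example (made-up values): a flat response versus one boosted only early on.
flat_response = [1.0, 1.0, 1.0, 1.0]
early_boost = [2.0, 2.0, 1.0, 1.0]
similarity = cosine_similarity(flat_response, early_boost)
```

      In this toy case the similarity falls below 1 because the boost changes the shape of the vector, not just its overall magnitude.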

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
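
      The two boundary proximity forms named in the reviewer's comment, e^(1-d) for initial items and d/#items for final items, can be written out as a short sketch. The function names and the `scale` parameter are hypothetical; `scale` is included only to show how the reviewer's alternative e^((1-d)/2) would be expressed, and the default list length of 12 is taken from the task description.

```python
import math


def start_boundary_proximity(serial_position, scale=1.0):
    """exp((1 - d) / scale): 1 at the first item, decaying with serial position d (1-based).

    scale=1.0 gives e^(1-d); scale=2.0 gives the alternative e^((1-d)/2).
    The scale parameter is an illustrative assumption, not part of the reported model.
    """
    return math.exp((1 - serial_position) / scale)


def end_boundary_proximity(serial_position, n_items=12):
    """d / n_items: rises linearly and reaches 1 at the final item."""
    return serial_position / n_items


# Regressor values across one 12-item list.
start_values = [start_boundary_proximity(d) for d in range(1, 13)]
end_values = [end_boundary_proximity(d) for d in range(1, 13)]
```

      Written this way, the asymmetry the reviewer points to is explicit: the start-of-list term is concentrated on the first few items, whereas the end-of-list term changes by the same amount at every serial position.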

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2- Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.  

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative and statistically significant. This suggests that successfully encoded item pairs presented far apart have more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portions of Figures 3 and 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected posthoc.

      We agree with the reviewer’s suggestion that the ideal way to fit a model to the trend would be through a model-fitting process. However, because of limited computational resources, we were not able to perform it given the size of our dataset.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In the Discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portions of Figures 3 and 5. The difference between regions is non-significant. Therefore, our results show a very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 – supplementary figure 1B).

      We could not compare these parameters to a separate model which includes a smoothly drifting ‘temporal-context’ regressor due to the regressor’s collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the discussion, we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these corrections has been made in the text.

    1. eLife Assessment

      This manuscript develops a theoretical model of osmotic pressure adaptation in microbes by osmolyte production and wall synthesis. The prediction of a rapid increase in growth rate on osmotic shock is experimentally validated using fission yeast. By using phenomenological rules rather than detailed molecular mechanisms, the model can potentially apply to a wide range of microbes, providing important insights that would be of interest to the wider community studying the regulation of cell size and mechanics. The level of coarse-graining and the assumptions and limitations of the model have been well described, providing a convincing foundation for making predictions. However, further experimental work on the validity of the core assumptions across a range of microbial organisms is needed to assess the universality of the model.

    2. Reviewer #1 (Public review):

      Summary:

      A theoretical model for microbial osmoresponse was proposed. The model assumes simple phenomenological rules: (i) the change of free water volume in the cell due to osmotic imbalance, based on pressure balance; (ii) osmoregulation, which assumes that proteome partitioning changes with the osmotic pressure and thereby affects production of the osmolyte-producing protein; (iii) cell-wall synthesis regulation, in which a change in turgor pressure alters the cell-wall synthesis efficiency so that the turgor pressure returns to its target value; (iv) the effect of intracellular crowding, assuming that biochemical reactions slow down with increased crowding and stop when the protein density (protein mass divided by free water volume) reaches a critical value. The parameter values were found in the literature or obtained by fitting to the experimental data. The authors compare the model behavior with various microorganisms (E. coli, B. subtilis, S. cerevisiae, S. pombe), and successfully reproduce the overall trend (steady state behavior for many of them, dynamics for S. pombe). In addition, the model predicts non-trivial behavior such as the fast cell growth just after the hypoosmotic shock, which is consistent with experimental observation. The authors further make experimentally testable predictions regarding mutant behavior and transient dynamics.

      The theory assumes simple mechanistic dependence between core variables without going into specific molecular mechanisms of regulations. The simplicity allows the theory to apply to different organisms by adjusting the time scales with parameters, and the model successfully explains broad classes of observed behaviours. Mathematically, the model provides analytical expressions of the parameter dependencies and an understanding of the dynamics through the phase space without being buried in the detail. This theory can serve as a base to discuss the universality and diversity of microbial osmoresponse.

      The coarse-grained nature of the model is the strength of the model in terms of its generality. However, it does not consider various regulations at the molecular level. Hence, certain adaptation features are not considered in the current version of the model. The updated manuscript discusses the pros and cons of the current approach.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Ye et al. have developed a theoretical model of osmotic pressure adaptation by osmolyte production and wall synthesis.

      Strengths:

      They validate their model predictions of a rapid increase in growth rate on osmotic shock experimentally using fission yeast. The study has several interesting insights which are of interest to the wider community of cell size and mechanics.

      Comments on revisions:

      In the revised manuscript, the authors have addressed the aspects of the writing that were unclear, which were listed previously as major and minor comments. We believe the issues raised by this reviewer have been adequately addressed in the manuscript.

    1. eLife Assessment

      This paper presents important information about potential Homo naledi-associated markings discovered on the walls of the Hill Antechamber of the Rising Star Cave system, South Africa. If confirmed, the antiquity, intentionality, and authorship of the reported markings will have profound archaeological implications, as such behaviors are otherwise widely considered to be unique to our species, Homo sapiens. This report concerns preliminary findings and as it stands the study is incomplete, with further work needed in the future to support the claims about the anthropogenic nature, age, and author of the engravings.

    2. Reviewer #3 (Public review):

      In a characteristically bold fashion, Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite an avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc) that they do not think so either.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves, and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment? But the latter is a painted fragment not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and more pertinently, whether the hominins did the markings. Despite this honest admission, they are prepared to hypothesise that the hominin marked, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion not an observation and the relationship between hominins and designs no less so. In fact the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the, admittedly very difficult, perhaps impossible, task of geochronological assessment, has been made.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27: 3 745-770.

      Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.

      Comments on latest version:

      The authors have not modified their stance or the authority of their arguments since the original paper.

    3. Reviewer #4 (Public review):

      Thank you for the opportunity to provide a peer-review of this manuscript, which I first reviewed in 2023 under the title of '241,000 to 335,000 Years Old Rock Engravings Made by Homo naledi in the Rising Star Cave system, South Africa'. My review is brief as the authors state they have made "relatively minimal changes", so most of the comments I made in 2023 still stand. Some of the language is a little more temperate but the main issues of this potentially landmark study remain and undermine scientific acceptance of the findings claim. The fact that this is an initial report does not excuse it from the normal conventions of building arguments supported by empirical data. Again, the absence of a rock art expert on the authorial team causes recurring weaknesses still to be evident (would one ask a rock art expert to analyse a new fossil hominin skull for example?). Specifically, there are two major issues that need to be resolved before there is necessary and sufficient cause to assign the term 'rock engravings' to the marks in the Dinaledi chamber. These are authorship and dating.

      Authorship: The assertion that the 'rock engravings' are anthropogenic remains unsupported by empirical evidence, with a number of possible natural factors that could just as likely have caused the marks. Not to use image enhancements - which is standard in most rock art research and has been for some time - is a critical omission. The concerns stated about AI and data standards are not developed and the authors are directed to the literature in this field, for example this 2025 overview - https://www.sciencedirect.com/science/article/pii/S1296207424002516. Again, having a rock art expert would show the AI concern to be valid but easily addressed using Data Standards. In the almost 2 years since the first pre-print was released, there has been ample time for high resolution photographs and scans of the purported 'rock engravings'; analysis of which by relevant experts could properly physically characterise the marks and thus establish more or less likely agents for their production. European-based researchers in particular have utilised this approach on material such as the Blombos ochre and marked bone from Europe and Africa. None of these methods is invasive or destructive.

      To then go on and link Homo naledi to these markings is premature, especially when this landscape has been home to multiple hominins. Most rock art sites do not contain the physical bodily remains of their makers so we assign authorship based on dating (such as for Neanderthal era art in Europe for example); the second critical issue in this report:

      Dating: There is no direct or closely associated chronometric dating of the 'rock engravings' or their immediate context, so the age range claimed is unsupported. Rock art dating is notoriously difficult - and why researchers closely scrutinise dates produced. In this case, however, the chronological context is physically so far removed from these rock markings as to be misleading at best, and needs to be discounted until a proper programme of dating has commenced. The sources cited for rock art dating tend to be out of date and it would be standard practice to have a geochronologist assess the rock-marked areas and then establish dating protocols.

      Authorship and dating are cornerstones of archaeological/paleoanthropological work and need to be established in the first instance. Until that has been done commensurate with current standards in global rock art research this potentially landmark finding cannot be taken as probable, only as possible. This is a pity as the last decade or so has revolutionised our understanding of the socially complex world multiple hominin species lived in, and marked in utilitarian and symbolic ways. The conditions for acceptance of ancient rock art have thus never been better, but the Dinaledi example needs to revisit research first principles around authorship and dating to be included as a credible part of this larger context. It would have been good to see a commitment to a coherent research programme to this end for this case study.

      I hope these observations are useful. As above I keep them short as there has been minimal change to the 2023 ms, and my detailed comments on that remain with the first version of the work.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As has often been the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) focused several comments upon this manuscript and raised some points that were not considered by the reviewers, and so we discuss those points here in addition to the reviewer comments.

      Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.

      We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.

      Because of this, the revised manuscript contains relatively minimal changes, all of them made on the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.

      Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.

      Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches. 

      Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.

      Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However the key aspect of the Malmani dolomite caves is that the hardness of dolomitic limestone rock is much greater than many of the limestone caves in other regions such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks. 

      Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.

      Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work. 

      Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.

      Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.

      We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.

      Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.

      Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depend on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered by not just us, but the entire academy.

      Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.

    1. eLife Assessment

      This important study uses Mendelian Randomization to provide evidence that early-life reproductive phenotypes (i.e., age at onset of menarche and age at first birth) have a significant impact on numerous health outcomes later in life. The empirical evidence provided by the authors supporting the antagonistic pleiotropy theory is solid. Theories of aging should be empirically tested and this study provides a good first step in that direction.

    2. Reviewer #1 (Public review):

      Summary:

      The present study aims to determine possible associations between reproduction and the prevalence of age-related diseases, based on the antagonistic pleiotropy hypothesis of ageing and predominantly using Mendelian Randomization. The authors provide evidence that menarche before age 11 and childbirth before age 21 increase the risk of several diseases, almost doubling the risk of diabetes and heart failure and quadrupling the risk of obesity.

      Strengths:

      Large sample size. Many analyses.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.
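
      The Mendelian randomization approach summarized above can be illustrated with a minimal sketch of a fixed-effect inverse-variance-weighted (IVW) estimator, one common way to combine per-SNP Wald ratios. This is not the authors' pipeline: the function name and the three SNP effect sizes are hypothetical values chosen only to show the arithmetic.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate.

    Each SNP j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (causal_estimate, standard_error).
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics (NOT taken from the study):
# SNP effects on the exposure (e.g. age at menarche, in years) and on the
# outcome (log-odds of a disease), with outcome standard errors.
snp_exposure = [0.10, 0.20, 0.05]
snp_outcome = [-0.02, -0.04, -0.01]
snp_outcome_se = [0.01, 0.01, 0.01]

estimate, std_error = ivw_estimate(snp_exposure, snp_outcome, snp_outcome_se)
```

      With these made-up inputs every Wald ratio is the same, so the pooled estimate equals that common ratio; real data would show heterogeneity between SNPs, which is why sensitivity analyses are usually reported alongside IVW.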

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Weaknesses:

      The authors report evidence in support of the antagonistic pleiotropy theory in aging and discuss the disposable soma theory. Although both theories describe distinct mechanisms, separating them in empirical research is complicated and needs further studies in future research.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      Still a number of doubts with regard to some of the results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      Thank you for the opportunity to review a revised version.

      I still have serious doubts with regard to a number of datasets presented. For example, the results on essential hypertension and cervical cancer show very small effect sizes, but according to the authors still reach the level of statistical significance. This is unlikely to be accurate. For MR analyses, this is nearly impossible. The analyses of these data and the statistical analysis need to be checked for errors and repeated. While BOLT-LMM might not be relevant here, there might be other things happening here. The authors should therefore always interpret the results also with regard to the observed effect sizes instead of only looking at the p-values (an odds ratio of 0.999 means that there is a 0.1% lower risk).

      Thank you for your suggestions. We have updated the results for essential hypertension, GAD, and cervical cancer in results, figures, and supplemental tables (lines 65-89, Figure 1, Tables S3-S4).
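
      The reviewer's point about reading effect sizes alongside p-values can be sketched as follows. The snippet converts an odds ratio into a percent change in odds and recovers an approximate z statistic and two-sided p-value from a 95% confidence interval, assuming a normal approximation on the log scale. The confidence limits used in the example are hypothetical and are not taken from the manuscript.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (0.999 -> about -0.1)."""
    return (odds_ratio - 1.0) * 100.0


def z_and_p_from_ci(odds_ratio, ci_low, ci_high):
    """Approximate z statistic and two-sided p-value from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    std_error = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = log_or / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical example: a tiny effect with a very narrow interval.
change = percent_change_in_odds(0.999)
z_stat, p_val = z_and_p_from_ci(0.999, 0.9985, 0.9995)
```

      A very narrow interval around an odds ratio close to 1 can yield a small p-value even though the effect is negligible in practical terms, which is the pattern the reviewer asks the authors to re-check.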

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth may have a positive effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      The authors addressed the remarks on the previous version very well. Addressing the two points below would further increase the quality of the manuscript.

      (1) In the previous version the authors mentioned that their results are also consistent with the disposable soma theory: "These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism's investment in reproduction and somatic maintenance and repair."

      Although the antagonistic pleiotropy and disposable soma theories describe different mechanisms, both provide frameworks for understanding how genes linked to fertility influence health. The antagonistic pleiotropy theory posits that genes enhancing fertility early in life may have detrimental effects later. In contrast, the disposable soma theory suggests that energy allocation involves a trade-off, where investment in fertility comes at the expense of somatic maintenance, potentially leading to poorer health in later life.

      To strengthen the manuscript, a discussion section should be added to clarify the overlap and distinctions between these two evolutionary theories and suggest directions for future research in disentangling their specific mechanisms.

      Thank you for your suggestions to clarify the overlap and distinctions between the antagonistic pleiotropy and disposable soma theories. While our primary focus is on the antagonistic pleiotropy framework, we acknowledge that the disposable soma theory also provides a relevant perspective on the trade-offs between reproduction and somatic maintenance.

      To address this, we have expanded the discussion section to highlight how both theories contribute to our understanding of the relationship between fertility-related traits and aging-related health outcomes. We also suggested potential future research directions, such as integrating genetic data with biomarkers of somatic maintenance to further explore the mechanisms underlying these trade-offs (lines 213-223).

      (2) In response to the question why the authors did not include age at menopause in addition to the already included age at first child and age at menarche the following explanation was provided: "Our manuscript focuses on the antagonistic pleiotropy theory, which posits that inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research."

      It remains, however, unclear why genes beneficial for early survival and reproduction would be reflected only in age at menarche and age at first childbirth, but not in age at menopause. While age at menarche marks the onset of fertility, age at menopause signifies its end. Since evolutionary selection acts directly until reproduction is no longer possible (though indirect evolutionary pressures persist beyond this point), the inclusion of additional fertility-related measures could have strengthened the analysis. A more detailed justification for focusing exclusively on age at menarche and first childbirth would enhance the clarity and rigor of the manuscript.

      Thank you for your question regarding the age at menopause in our analysis. Our decision was based on the theoretical framework of antagonistic pleiotropy, which emphasizes early-life reproductive advantages that may have trade-offs later in life. Age at menarche and age at first childbirth are direct markers of early reproductive investment, which align closely with this framework.

      While age at menopause marks the cessation of reproductive capability, its evolutionary role is distinct. The selective pressures acting on menopause are complex and may involve post-reproductive contributions rather than direct reproductive fitness benefits. Moreover, the genetic architecture of menopause may be influenced by different biological pathways compared to early reproductive traits.

      Nonetheless, we acknowledge that including age at menopause could provide additional insights into reproductive aging. Several papers (1, 2) have already been published regarding age at menopause and age-related outcomes, including diabetes, AD, osteoporosis, cancers, and cardiovascular diseases.

      Reviewing Editor (Recommendations for the authors):

      Above/below you will find the remaining comments from the reviewers. One of the main issues remaining is that some of the data seems to be incorrectly analysed and some of the findings may not be correct. To clarify this a lot more, I asked the reviewer for some details and received the following:

      - In Figure 1B one of their main outcomes is "age of menopause", but they report the data as an odds ratio. This is not correct and should be fixed (it seems the authors can run the right analysis, but just reported it with the wrong heading in the figure). This likely also applies to the outcome "facial aging". Also the heading in Figure 1A should be Beta instead of OR.

      We have updated the figures to ensure that the beta values of continuous outcomes and odds ratio values of categorical outcomes are presented in Figure 1.
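
      The reporting convention discussed here (beta for continuous outcomes, odds ratio for binary outcomes) can be sketched with a small helper. It assumes the MR estimate for a binary outcome is on the log-odds scale, which is the usual convention but is an assumption in this sketch; the function name is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def summarize_effect(estimate, std_error, outcome_is_binary):
    """Return (label, point_estimate, ci_low, ci_high) for a figure or table.

    Continuous outcomes are reported as Beta on the original scale.
    Binary outcomes are assumed to be on the log-odds scale and are
    exponentiated so they can be reported as an odds ratio (OR).
    """
    low = estimate - Z_95 * std_error
    high = estimate + Z_95 * std_error
    if outcome_is_binary:
        return ("OR", math.exp(estimate), math.exp(low), math.exp(high))
    return ("Beta", estimate, low, high)
```

      For example, a continuous outcome such as age at menopause would be passed with `outcome_is_binary=False` and labelled "Beta", whereas a disease outcome would be exponentiated and labelled "OR".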

      - With essential hypertension, GAD and cervical cancer, the estimates are so small that they need to re-review their results. The current MR analysis is not sufficiently powered to have such small confidence intervals. Essential hypertension was based on data from UK Biobank; although I was unable to find what program was used to generate the GWAS results, I strongly suspect this was also BOLT-LMM. Same for cervical cancer. Both datasets used familial-related samples, so they are very likely derived with BOLT-LMM.

      I hope this will help to solve this issue.

      According to the published paper, the GWAS for gastrointestinal or abdominal disease (GAD) (GWAS ID: ebi-a-GCST90038597) was generated with BOLT-LMM. According to the MRC IEU UK Biobank GWAS pipeline, versions 1 and 2, the GWAS for essential hypertension (GWAS ID: ukb-b-12493) and cervical cancer (GWAS ID: ukb-b-8777) were also generated with BOLT-LMM. We have updated the MR analysis results and figures (lines 65-89, Figure 1, Tables S3-S4) as well as the following IPA analysis (lines 106-162 and 255-280, Figures 2-3).

      (1) Magnus, M. C., Borges, M. C., Fraser, A. & Lawlor, D. A. Identifying potential causal effects of age at menopause: a Mendelian randomization phenome-wide association study. Eur J Epidemiol 37, 971-982 (2022). https://doi.org/10.1007/s10654-022-00903-3

      (2) Zhang, X., Huangfu, Z. & Wang, S. Review of mendelian randomization studies on age at natural menopause. Front Endocrinol (Lausanne) 14, 1234324 (2023). https://doi.org/10.3389/fendo.2023.1234324

    1. eLife Assessment

      This study examines the effect of the trophic factor BDNF on dental cells, an understudied subject that is relevant to dental regeneration and repair. Given that the topic is new and has not been covered previously, the report is a useful foray into a new area of investigation, although several experimental results could be strengthened. The connection of BDNF and dental health is a solid attempt at translating trophic factor signaling clinically, which has been stymied in past efforts.

    2. Joint Public Review:

      This work employs both in vitro and in vivo methods to investigate the contribution of BDNF/TrkB signaling to enhancing differentiation and dentin-repair capabilities of dental pulp stem cells in the context of exposure to a variety of inflammatory cytokines. A particular emphasis of the approach is employment of dental pulp stem cells in which BDNF expression has been enhanced using CRISPR technology. Transplantation of such cells is proposed to improve dentin regeneration in a mouse model of tooth decay. The study provides several interesting findings, including demonstrating that exposure to several cytokines/inflammatory agents increases the quantity of activated phospho-TrkB in dental pulp stem cells. One issue that was not covered is the involvement of the p75 neurotrophin receptor, which is also highly sensitive to inflammation and injury. The conclusions could be further augmented by demonstrating the specificity of the antibodies via immunoblot methods, both in the presence and absence of BDNF and other neurotrophins, NT-3 and NT-4, which can also bind to the TrkB receptor.

    1. eLife Assessment

      A combination of molecular dynamics simulation and state-of-the-art statistical post-processing techniques provided valuable insight into GPCR-ligand dynamics. This manuscript provides solid evidence for differences in the binding/unbinding of classical cannabinoid drugs from new psychoactive substances. The results could aid in mitigating the public health threat these drugs pose.

    2. Reviewer #1 (Public review):

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.

      The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are.

      For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.

      It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022).

      What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it's not clear what distributions are being compared.

      I suggest being more careful with the language of universality. It can be "supported", but "showing" or "proving" it's universal would require looking at all possible chemicals in the class.

      Comments on revisions:

      The authors provided appropriate responses to the comments above.

    3. Reviewer #2 (Public review):

      Summary:

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.

      Strengths:

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. MSMs have usually been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations for extracting the same quantities, provides a more appropriate way forward.
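
      To make the detailed-balance point concrete, the following is a minimal sketch, not the authors' workflow, of how a transition matrix can be estimated from transition counts and then checked for detailed balance, i.e. whether pi_i * T_ij equals pi_j * T_ji for every pair of states. The count matrices are hypothetical, and the simple row normalization used here is only one of several possible estimators.

```python
def transition_matrix(counts):
    """Row-normalize a count matrix (every row must contain at least one count)."""
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(T, n_iter=1000):
    """Approximate the stationary distribution by repeated left-multiplication."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def detailed_balance_violation(T, pi):
    """Largest |pi_i * T_ij - pi_j * T_ji| over all state pairs (0 means reversible)."""
    n = len(T)
    return max(abs(pi[i] * T[i][j] - pi[j] * T[j][i])
               for i in range(n) for j in range(n))


# Hypothetical counts: a two-state chain (always reversible) and a
# three-state chain with a net cyclic flux (not reversible).
two_state = transition_matrix([[90, 10], [10, 40]])
cyclic = transition_matrix([[5, 4, 1], [1, 5, 4], [4, 1, 5]])

violation_two_state = detailed_balance_violation(two_state, stationary_distribution(two_state))
violation_cyclic = detailed_balance_violation(cyclic, stationary_distribution(cyclic))
```

      A violation near zero indicates that the estimated dynamics are reversible with respect to the stationary distribution; a clearly non-zero value, as in the cyclic example, indicates a net probability flux.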

      Weaknesses:

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case.

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues.

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw Research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much shorter biased simulation times, would fare against the experimental kinetics or even the unbiased simulated kinetics of the previous report.

      (4) The Methods section of the manuscript seems to suggest that all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations, which were spawned from a docked structure instead of a crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of crystallographic structures.

      (5) The last part of using a machine learning-based approach to analyse allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job.

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of the two ligand binding mechanisms are.

      Comments on revisions:

      The authors have addressed most of the queries of the reviewer in an adequate manner. However, the current code availability section just provides a link to Python files that generate the plots. It is not very useful in its current form. The code availability section should provide a proper GitHub page that shows the usage of TRAM for readers to execute. While PyEMMA has been cited for TRAM, a Python notebook to reproduce the TRAM analysis would be very instructive.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.

      We thank the reviewer for the comments. We have provided a point-by-point response to the reviewer’s comments below and incorporated the suggestions in our revised manuscript. Modified parts of the manuscript are highlighted in yellow.

      Comments:

      (1) The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are. 

      We thank the reviewer for pointing this out. We have added the color scheme to the figure caption.

      (2) For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.  

      We thank the reviewer for the suggestion. We agree with the reviewer that the Gaussian height/width may affect the unbinding pathway. However, we would like to point out that we used the well-tempered version of metadynamics. In well-tempered metadynamics, the effective Gaussian height decreases as bias deposition progresses. Therefore, we believe that the Gaussian height/width should have minimal impact on the unbinding pathway. To address the reviewer's suggestion, we conducted additional well-tempered metadynamics simulations varying key parameters such as bias height, bias factor, and the deposition rate, all of which can influence the sampling space. The parameter values for bias height, bias factor, and deposition rate that we originally used in the paper are 0.4 kcal/mol, 15, and 1/5 ps⁻¹, respectively. We explored different values for these parameters and projected the sampled space on top of the previously sampled region (Figure S4). We observed that the new simulations sample a similar unbinding pathway in the extracellular direction and discover a similar space in the binding pocket as well.

      Results and Discussion (Page 10)

      “We also performed unbinding simulations using different well-tempered metadynamics parameters (bias height, bias deposition rate and bias factor) to check for the existence of alternative pathways (Figure S4). However, the simulations show that the ligands follow a similar pathway in all metadynamics runs.”
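
      The decay of the Gaussian height in well-tempered metadynamics mentioned in the response above can be illustrated with a minimal sketch. It uses the standard well-tempered rule w = w0 · exp(−V(s) / (kB · ΔT)) with ΔT = (γ − 1) · T, where γ is the bias factor. The initial height (0.4 kcal/mol) and bias factor (15) are taken from the response; the temperature of 300 K, the function name, and the example bias values are assumptions made only for illustration and are not taken from the manuscript.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def wt_gaussian_height(initial_height, accumulated_bias, bias_factor, temperature=300.0):
    """Height of the next Gaussian deposited in well-tempered metadynamics.

    initial_height and accumulated_bias are in kcal/mol. The temperature
    default (300 K) is an assumption for this sketch.
    """
    # Delta T follows from the bias factor: gamma = (T + dT) / T
    delta_t = (bias_factor - 1.0) * temperature
    return initial_height * math.exp(-accumulated_bias / (KB_KCAL_PER_MOL_K * delta_t))


# Heights after 0, 2, 5 and 10 kcal/mol of bias have already accumulated
# at the current value of the collective variable (hypothetical values).
heights = [wt_gaussian_height(0.4, v, bias_factor=15) for v in (0.0, 2.0, 5.0, 10.0)]
```

      The deposited height shrinks monotonically as bias builds up at a given point, which is why the initial height matters less in the well-tempered scheme than in standard metadynamics.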

      (3) It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022). 

      We appreciate the reviewer's feedback. We have incorporated additional citations of studies demonstrating the use of TRAM as an estimator for both kinetics and thermodynamics (e.g. Ligand binding: Ge, Y. and Voelz, V.A., JCP, 2022[1]; Peptide-protein binding kinetics: Paul, F. et al., Nat. Commun., 2017[2], Ge, Y. et al., JCIM, 2021[3]). Additionally, we have included references to studies where biased simulations were initially used to explore the conformational space, and the results were then employed to seed unbiased simulations for building a Markov state model (Metadynamics: Sun, X. et al., eLife, 2018[4]; Umbrella Sampling: Abella, J. R. et al., PNAS, 2020[5]; Replica Exchange: Paul, F. et al., Nat. Commun., 2017[2]).

      (4) What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared. 

      We apologize for the confusion. The K-L divergence analysis was performed on the probability distributions of the inverse distances between residue pairs from any two macrostates. Each macrostate was represented by 1000 frames that were selected proportional to the TRAM stationary density. All possible pairwise inverse distances were calculated per frame for the purpose of these calculations. Although K-L divergence is inherently asymmetric, we symmetrized the measurement by averaging the two directions. Per-residue K-L divergence, which is shown in the main figures as a color and thickness gradient, was calculated by taking the sum over all pairs involving the residue. We have included a detailed discussion of K-L divergence in the Methods section. We have also modified the Results section to add a brief discussion of the K-L divergence methodology.

      Results and Discussion (Page 15)

      “We further performed Kullback-Leibler divergence (K-L divergence) analysis between inverse distance of residue pairs of two macrostates to highlight the protein region that undergoes high conformational change with ligand movement.”

      Methods (Page 33)

      “Kullback–Leibler divergence (K-L divergence) analysis was performed to show the structural differences in protein conformations in different macrostates[4,114]. In this study, this technique was used to calculate the difference in the pairwise inverse distance distributions between macrostates. Each macrostate was represented by 1000 frames that were selected proportional to their TRAM weighted probabilities. Although K-L divergence is an asymmetric measurement, for this study, we used a symmetric version of the K-L divergence by taking the average between two macrostates. Per-residue contribution of K-L divergence was calculated by summing the K-L divergences of all the pairwise distances involving that residue. This analysis was performed with in-house Python code.”
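
      The in-house code is not shown, so the following is only a minimal sketch of the procedure as described: histogram each pairwise inverse-distance distribution for two macrostates on shared bins, average the two directions of the K-L divergence, and sum the pairwise values onto each residue. The function names, bin count, pseudocount, and synthetic data are all assumptions for illustration; the sketch also assumes the frames have already been drawn in proportion to their TRAM weights.

```python
import numpy as np


def symmetric_kl(samples_a, samples_b, bins=50, pseudocount=1e-6):
    """Average of KL(A||B) and KL(B||A) for two 1-D samples on shared bins."""
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    # A small pseudocount avoids log(0) in empty bins (an assumption of this sketch).
    p = np.histogram(samples_a, bins=edges)[0] + pseudocount
    q = np.histogram(samples_b, bins=edges)[0] + pseudocount
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def per_residue_kl(inv_dist_a, inv_dist_b, pairs, n_residues):
    """Sum the symmetric K-L value of each residue pair onto both residues.

    inv_dist_a, inv_dist_b: arrays of shape (n_frames, n_pairs).
    pairs: list of (i, j) residue indices, one per column.
    """
    totals = np.zeros(n_residues)
    for k, (i, j) in enumerate(pairs):
        d = symmetric_kl(inv_dist_a[:, k], inv_dist_b[:, k])
        totals[i] += d
        totals[j] += d
    return totals


# Synthetic demonstration with three residues and 1000 frames per macrostate.
rng = np.random.default_rng(0)
pairs = [(0, 1), (0, 2), (1, 2)]
state_a = rng.normal(0.20, 0.02, size=(1000, 3))
state_b = state_a.copy()
state_b[:, 1] = rng.normal(0.35, 0.02, size=1000)  # only pair (0, 2) differs
residue_kl = per_residue_kl(state_a, state_b, pairs, n_residues=3)
```

      In this toy case only the pair (0, 2) changes between the two states, so residues 0 and 2 receive the same non-zero score and residue 1 receives none.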

      (5) I suggest being more careful with the language of universality. It can be "supported" but "showing" or "proving" its universal would require looking at all possible chemicals in the class. 

      We thank the reviewer for the suggestion. In response, we have revised the manuscript to ensure that the language reflects that our findings are based on observations from a limited set of ligands, namely one NPS and one classical cannabinoid. We have replaced references to ligand groups (such as NPS or classical cannabinoid) with the specific ligand names (such as MDMB-FUBINACA or HU-210) to avoid claims of universality and prevent any potential confusion.

      Results and Discussion (Page 19)

      “In this work, we trained the network with the NPS (MDMB-FUBINACA), and classical cannabinoid (HU-210) bound unbiased trajectories (Method Section). Here, we compared the allosteric interaction weights between the binding pocket and the NPxxY motif, which is involved in triad interaction formation. Results show that each binding pocket residue in the MDMB-FUBINACA bound ensemble shows higher allosteric weights with the NPxxY motif, indicating larger dynamic interactions between the NPxxY motif and binding pocket residues (Figure S9). The probability of triad formation was estimated to observe the effect of the difference in allosteric control. TRAM weighted probability calculation showed that MDMB-FUBINACA bound CB1 has a higher probability of triad formation (Figure 8A). Comparison of the pairwise interactions of the triad residues shows that the interaction between Y397<sup>7.53</sup>-T210<sup>3.46</sup> is relatively more stable in the case of MDMB-FUBINACA bound CB1, while the other two interactions have similar behavior for both systems (Figures S10A, S10B, and S10C). Therefore, higher interaction between Y397<sup>7.53</sup> and T210<sup>3.46</sup> in the MDMB-FUBINACA bound receptor causes the triad interaction to be more probable.

      Furthermore, we also compared TM6 movement, which is another activation metric involved in both G-protein and β-arrestin binding, for both ligand-bound ensembles. Comparison of TM6 distance from the DRY motif of TM3 shows a similar distribution for HU-210 and MDMB-FUBINACA (Figure 8B). These observations support that NPS binding causes higher β-arrestin signaling by allosterically controlling triad interaction formation.”

      Reviewer #2 (Public Review): 

      Summary: 

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.

      Strengths: 

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations for extracting the same, provides a more appropriate way out.

      Weaknesses: 

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case. 

      We thank the reviewer for the comment. While we agree that the thermodynamic comparisons between MSM and TRAM provide similar values in this instance, we would like to emphasize the underlying reasoning behind our choice of TRAM.

      MSM can struggle to accurately estimate thermodynamic and kinetic properties in cases where local state reversibility (detailed balance) is not easily achieved with unbiased sampling. This is especially relevant in ligand unbinding processes, which often involve overcoming high free energy barriers. TRAM, by incorporating biased simulation data (such as umbrella sampling) in addition to unbiased data, can better achieve local reversibility and provide more robust estimates when unbiased sampling is insufficient.

      The similarity in thermodynamic estimates between MSM and TRAM in our study can be attributed to the relatively long unbiased sampling period (> 100 µs) employed. With sufficient sampling, MSM can approach detailed balance, leading to results comparable to those from TRAM. However, as we demonstrated in our manuscript (Figure 4D), when the amount of unbiased sampling is reduced, the uncertainties in both the thermodynamics and kinetics estimates increase significantly for MSM compared to TRAM. Thus, while MSM and TRAM perform similarly under the conditions of extensive sampling, TRAM's advantage lies in its robustness when unbiased sampling is limited or difficult to achieve. 

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues. 

      We thank the reviewer for the comment. We acknowledge that biased simulations could potentially introduce hysteresis or result in the identification of unphysical pathways. However, we believe this issue is mitigated by using well-tempered metadynamics, which gradually deposits a decaying bias. This approach enables the simulation to explore orthogonal directions of collective variable (CV) space, reducing the likelihood of hysteresis effects (Invernizzi, M. and Parrinello, M., JCTC, 2019[6]).

      Furthermore, there is precedent for using metadynamics-derived pathways to initiate unbiased simulations for constructing Markov State Models (MSMs). This methodology has been successfully applied in studying G-protein activation (Sun, X. et al., eLife, 2018[4]).

      Additional support for our observation can be found in two independent binding/unbinding studies of ligands from cannabinoid receptors, which discovered a similar pathway using different CVs (Saleh, N. et al., Angew. Chem., 2018[7]; Hua, T. et al., Cell, 2020[8]).

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much less biased simulation time, would fare against the experimental kinetics or even unbiased simulated kinetics of the previous report.

      We would like to address the reviewer's concerns regarding the choice of ligands, lack of direct experimental comparison, and the use of TRAM, and clarify our rationale point by point:

      Ligand Choice: The ligands selected for this study were chosen due to their relevance and well-characterized binding properties. MDMB-FUBINACA is a well-known NPS ligand with documented binding properties. This ligand is still the only NPS ligand with an experimentally determined CB1-bound structure (Krishna Kumar, K. et al., Cell, 2019[9]). Similarly, the classical cannabinoid (HU-210) used in this study has established binding characteristics and is one of the earliest known synthetic classical cannabinoids. Therefore, these ligands serve as representative compounds within their respective categories, making them suitable for our comparative analysis.

      Experimental Comparison: We have indeed compared our simulation results to experimental data, particularly focusing on binding free energies. In the Results section, we have shown that the relative binding free energy estimated from our simulation aligns closely with the experimentally measured values. Additionally, absolute binding energy estimates are also within ~3 kcal/mol of the experimentally predicted value.

      TRAM Performance: TRAM-estimated free energies and rates have been benchmarked against experimental predictions in various studies in addition to ours (Peptide-protein binding: Paul, F. et al., Nat. Commun., 2017[2]; Ligand unbinding: Wu, H. et al., PNAS, 2016[10]). As the primary goal of this study is to compare ligand unbinding mechanisms, we believe benchmarking against other datasets, such as the D.E. Shaw GPCR/ligand binding paper, is not essential for this work.

      (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures. 

      We thank the reviewer for the comment. We would like to clarify that we indeed used an experimentally derived pose for one of the ligands (MDMB-FUBINACA) as the cryo-EM structure of MDMB-FUBINACA bound to the protein was available (PDB ID: 6N4B) (Krishna Kumar, K. et al., Cell, 2019[9]). However, as the cryo-EM structure had missing loops, we modeled these regions using Rosetta. We apologize for this confusion and have modified our Methods section to make this point clearer.

      Regarding HU-210, we acknowledge that a crystallographic or cryo-EM structure for this specific ligand was not available. We selected HU-210 because it is the most commonly used example of a classical cannabinoid in the literature, with extensively studied thermodynamic properties. Importantly, our docking results for HU-210 align closely with previously experimentally determined poses for other classical cannabinoids (Figure S11) and replicate key polar interactions, such as those with S383<sup>7.39</sup>, which are characteristic of this class of compounds.

      System Preparation (Page 22)

      “Modeling of this membrane proximal region was also performed using the Remodel protocol of Rosetta loop modeling. A distance constraint is added during this modeling step between C98<sup>N-term</sup> and C107<sup>N-term</sup> to create the disulfide bond between the residues. [74,76]

      As the cryo-EM structure of MDMB-FUBINACA was known, the ligand coordinates of MDMB-FUBINACA were added to the modeled PDB structure. The “Ligand Reader & Modeler” module of CHARMM-GUI was used for ligand (e.g., MDMB-FUBINACA) parameterization using CHARMM General Force Field (CGenFF).[77]”

      (5) The last part of using a machine learning-based approach to analyze allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job. 

      We thank the reviewer for the valuable comment. The neural relational inference (NRI) method, which leverages a VAE (Variational Autoencoder) architecture, attempts to reconstruct the conformation (X) at time t + τ based on the conformation at time t. In doing so, it captures the non-linear dynamic correlations between residues in the VAE latent space. We chose this method because it is not reliant on specific metrics such as distance or angle, making it potentially more robust in predicting allosteric effects between the binding pocket residues and the NPxxY motif.

      In response to the reviewer's suggestion, we have also performed a more traditional allosteric analysis by calculating the mutual information between the binding pocket residues and the NPxxY motif. Mutual information was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. Our results indicate that the mutual information between the binding pocket residues and the NPxxY motif is indeed higher for the NPS binding simulation (Figure S11).

      Method

      Mutual information calculation

      Mutual information was calculated on the same trajectory data as the NRI analysis. The Python package MDEntropy was used for estimating mutual information between backbone dihedral angles of two residues.

      Results and Discussion (Page 21)

      “To further validate our observations, we estimated allosteric weights between the binding pocket and the NPxxY motif by calculating mutual information between residue movements. Mutual information analysis reaffirms that allosteric weights between these residues are indeed higher for the MDMB-FUBINACA bound ensemble (Figure S11).”

      Mutual Information Estimation (Page 37)

      “Mutual information between dynamics of residue pairs was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. The calculations were done on the same trajectory data as the NRI analysis. The Python package MDEntropy was used for estimating mutual information between backbone dihedral angles of two residues.[124]”
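
      For readers unfamiliar with the quantity, a minimal histogram-based estimate of the mutual information between two dihedral-angle time series is sketched below. This is not the MDEntropy API and not the authors' code; the function name, the bin count, and the synthetic angle data are assumptions chosen only to illustrate the idea.

```python
import numpy as np


def dihedral_mutual_information(angles_x, angles_y, bins=24):
    """Plug-in histogram estimate of I(X;Y) in nats for angles in radians."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    joint, _, _ = np.histogram2d(angles_x, angles_y, bins=[edges, edges])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, bins)
    nonzero = p_xy > 0                     # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))


# Synthetic demonstration: one angle coupled to phi, one independent of it.
rng = np.random.default_rng(1)
phi = rng.uniform(-np.pi, np.pi, size=5000)
# np.angle wraps the noisy copy back into (-pi, pi]
coupled = np.angle(np.exp(1j * (phi + rng.normal(0.0, 0.2, size=5000))))
independent = rng.uniform(-np.pi, np.pi, size=5000)
mi_coupled = dihedral_mutual_information(phi, coupled)
mi_independent = dihedral_mutual_information(phi, independent)
```

      The coupled pair yields a much larger value than the independent pair. The plug-in estimator has a small positive bias for finite samples, so values for uncorrelated angles are close to, but not exactly, zero.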

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of two ligand binding mechanisms are. 

      We thank the reviewer for the insightful comment. In the manuscript, we discussed that the overall ligand (un)binding pathways are indeed similar for both ligands. Therefore, they interact with similar residues during the unbinding process. However, we have focused on two key differences in unbinding mechanism between the two ligands:

      (1) MDMB-FUBINACA exhibits two distinct unbinding mechanisms. In one, the linked portion of the ligand exits the receptor first. In the other mechanism, the ligand rotates within the pocket, allowing the tail portion to exit first. By contrast, for HU-210, we observe only a single unbinding mechanism, where the benzopyran ring leads the ligand out of the receptor. We have highlighted these differences in Figures 6 and 7 and discussed the intermediate states that appear along these different unbinding mechanisms. For further clarification of these differences, we have added arrows in the free energy landscapes to highlight these distinct pathways.

      (2) In the bound state, a significant difference is observed in the interaction profiles. HU-210, a classical cannabinoid, forms strong polar interactions with TM7, while MDMB-FUBINACA shows weaker polar interactions with this region.

      We have discussed these differences in the Results and Discussion section (Pages 13-18) and the Conclusion section (Pages 23-24).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should choose at least one case where the ligand's crystallographic pose is known and show how TRAM works in comparison to MSM or experimental report. 

      We thank the reviewer for the comment. We have used the experimentally determined cryo-EM pose for one of the ligands (i.e. MDMB-FUBINACA).  We have modified the manuscript to avoid confusion. (Please refer to the response of comment 4 of reviewer 2)

      (2) The authors should consider existing traditional methods that are used to detect allostery and compare their machine-learning-based approach to show its relevance. 

      We appreciate the reviewer’s comment. We have performed the traditional analysis by calculating mutual information between residue dynamics. We have shown that the traditional analysis agrees with the machine-learning-based NRI calculation. (Please refer to the response to comment 5 of reviewer 2)

      (3) Figure 3 doesn't provide a guide on the pathway of ligand. Without a proper arrow, it is difficult to surmise what is the start and end of the pathway. The figures should be improved. 

      We appreciate the reviewer’s suggestion. In response, we have revised Figure 3 to clearly indicate the ligand’s unbinding pathway by adding directional arrows and labeling the bound pose. Additionally, we have updated the figure caption to better clarify the color scheme used in the illustration. 

      (4) The Figure 5 presentation of free energetics has a very similar shape for the two ligands. More clarity is required on how these two ligands are different. 

      We thank the reviewer for the comment. While the overall shapes of the free energy profiles for the two ligands are indeed similar, this is expected as both ligands dissociate from the same pocket and follow a comparable pathway. However, key differences in their unbinding mechanisms arise due to variations in the ligand motion within the pocket. Specifically, the intermediate metastable minima in the free energy landscapes reflect these differences. For instance, in the NPS unbinding free energy landscape, the intermediate metastable state I1 corresponds to a conformation where the NPS ligand maintains a polar interaction with TM7, while the tail of the ligand has shifted away from TM5. This intermediate state is absent in the classical cannabinoid unbinding pathway, where no equivalent conformation appears in the landscape.  

      (6) Page 30: TICA is wrongly expressed as 'Time-independent component analysis'. It is not a time-independent process. Rather it is 'Time structured independent component analysis'. 

      We thank the reviewer for pointing this out. TICA should be expressed as Time-lagged independent component analysis or Time-structure independent component analysis. We have used the first expression and modified the manuscript accordingly.  

      (7) The manuscript's MSM theory part is quite well-known which can be removed and appropriate papers can be cited. 

      We thank the reviewer for the comment. We have removed the theory discussion of MSM and cited relevant papers.

      “Markov State Model

      The Markov state model (MSM) is used to estimate the thermodynamics and kinetics from the unbiased simulation.[56,91] MSM characterizes a dynamic process using the transition probability matrix and estimates its relevant thermodynamic and kinetic properties from the eigendecomposition of this matrix. This matrix is usually calculated using either a maximum likelihood or a Bayesian approach.[56,97] The prevalence of MSM as a post-processing technique for MD simulations is due to its reliance on only local equilibration of MD trajectories to predict the global equilibrium properties.[92,93] Hence, MSM can combine information from distinct short trajectories, which can only attain the local equilibrium.[94–96]

      The following steps are taken for the practical implementation of the MSM from the MD data. [4,17,98–100]”
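
      The core estimation step described in the quoted passage can be sketched as follows: count transitions at a fixed lag time across several short discrete trajectories, row-normalise the counts to obtain a transition probability matrix, and read the stationary distribution off its leading left eigenvector. This is a bare-bones, non-reversible maximum-likelihood estimate written for illustration; it is not the authors' pipeline. The function name and toy trajectories are assumptions, and the sketch assumes every state has at least one outgoing transition.

```python
import numpy as np


def estimate_msm(dtrajs, n_states, lag):
    """Row-normalised transition matrix and stationary distribution.

    dtrajs: list of 1-D integer arrays of state indices (discrete trajectories).
    """
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        # Count i -> j transitions separated by `lag` frames within each trajectory.
        for i, j in zip(dtraj[:-lag], dtraj[lag:]):
            counts[i, j] += 1
    transition = counts / counts.sum(axis=1, keepdims=True)
    # The stationary distribution is the left eigenvector with eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(transition.T)
    stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    stationary = stationary / stationary.sum()
    return transition, stationary


# Two short toy trajectories over three states (hypothetical data).
dtrajs = [
    np.array([0, 0, 1, 1, 2, 2, 1, 0, 0, 1]),
    np.array([2, 2, 1, 1, 0, 0, 1, 2, 2, 1]),
]
T, pi = estimate_msm(dtrajs, n_states=3, lag=1)
```

      Because counts from separate trajectories are pooled, no single trajectory needs to sample the full equilibrium distribution, which is the property the passage above refers to.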

      (8) A proper VAMP score-based analysis should be provided to show confidence in MSM's clustering metric and other hyperparameters. 

      We thank the reviewer for the recommendation. A VAMP-2 score-based analysis is discussed in the Methods section. We estimated the VAMP-2 scores of MSMs built with different cluster numbers and input TIC dimensions (Figure S15). The model with the best VAMP-2 score was selected for comparison with the TRAM result.

    1. eLife Assessment

      This fundamental study investigates the role of polyunsaturated fatty acids (PUFAs) in physiology and membrane biology, using a unique model to perform a thorough genetic screen that demonstrates that PUFA synthesis defects cannot be compensated for by mutations in other pathways. These findings are supported by compelling evidence from a high-quality genetic screen, functional validation of their hits, and lipid analyses. This study will appeal to researchers in membrane biology, lipid metabolism, and C. elegans genetics.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the roles of polyunsaturated fatty acids (PUFAs) in animal physiology and membrane function. A C. elegans strain carrying the fat-2(wa17) mutation possesses a very limited ability to synthesize PUFAs and there is no dietary input because the E. coli diet consumed by lab-grown C. elegans does not contain any PUFAs. The fat-2 mutant strain was characterized to confirm that the worms grow slowly, have rigid membranes, and have a constitutive mitochondrial stress response. The authors showed that chemical treatments or mutations known to increase membrane fluidity did not rescue growth defects. A thorough genetic screen was performed to identify genetic changes to compensate for the lack of PUFAs. The newly isolated suppressor mutations that compensated for FAT-2 growth defects included intragenic suppressors in the fat-2 gene, as well as constitutive mutations in the hypoxia sensing pathway components EGL-9 and HIF-1, and loss of function mutations in ftn-2, a gene encoding the iron storage protein ferritin. Taken together, these mutations lead to the model that increased intracellular iron, an essential cofactor for fatty acid desaturases, allows the minimally functional FAT-2(wa17) enzyme to be more active, resulting in increased desaturation and increased PUFA synthesis.

      Strengths:

      (1) This study provides new information further characterizing fat-2 mutants. The authors measured increased rigidity of membranes compared to wild-type worms; however, this rigidity could not be rescued with other fluidity treatments such as detergent or mutants. Rescue was only achieved with polyunsaturated fatty acid supplementation.<br /> (2) A very thorough genetic suppressor screen was performed. In addition to some internal fat-2 compensatory mutations, the only pathway changes identified that are capable of compensating for deficient PUFA synthesis were in the hypoxia pathway and the iron storage protein ferritin. Suppressor mutations included an egl-9 mutation that constitutively activates HIF-1, and gain-of-function mutations in hif-1 that are dominant. The increased HIF activity conferred by specific egl-9 and hif-1 mutations leads to decreased expression of ftn-2. Indeed, loss of ftn-2 leads to higher intracellular iron. The increased iron apparently makes the FAT-2 fatty acid desaturase enzyme more active, allowing for the production of more PUFAs.<br /> (3) The mutations isolated in the suppressor screen show that the only mutations able to compensate for lack of PUFAs were ones that increased PUFA synthesis by the defective FAT-2 desaturase, thus demonstrating the essential need for PUFAs that cannot be overcome by changes in other pathways. This is a very novel study, taking advantage of genetic analysis of C. elegans, and it confirms the observations in humans that certain essential PUFAs are required for growth and development.<br /> (4) Overall, the paper is well written, and the experiments were carried out carefully and thoroughly. The conclusions are well supported by the results.

      Weaknesses:

      Overall, there are not many weaknesses. The main one I noticed concerns the lipidomic analysis shown in Figs 3C, 7C, S1 and S3. While these data are an essential part of the analysis and provide strong evidence for the conclusions of the study, it is unfortunate that the methods used did not enable the distinction between two 18:1 isomers. These two isomers of 18:1 are important in C. elegans biology, because one is a substrate for FAT-2 (18:1n-9, oleic acid) and the other is not (18:1n-7, cis-vaccenic acid). Although rarer in mammals, cis-vaccenic acid is the most abundant fatty acid in C. elegans and is likely the most important structural MUFA. The measurement of these two isomers is not essential for the conclusions of the study, but the manuscript should include a comment about the abundance of oleic vs vaccenic acid in C. elegans (authors can find this information, even in the fat-2 mutant, in other publications of C. elegans fatty acid composition). Otherwise, readers who are not familiar with C. elegans might assume the 18:1 that is reported is likely to be mainly oleic acid, as is common in mammals.

      Other suggestions to authors to improve the paper:<br /> (1) The title could be less specific; it might be confusing to readers to include the allele name in the title.<br /> (2) There are two errors in the pathway depicted in Figure 1A. The 16:0-16:1 desaturation can be performed by FAT-5, FAT-6, and FAT-7. The 18:0-18:1 desaturation can only be performed by FAT-6 and FAT-7.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a genetic screen in C. elegans to investigate the physiological roles of polyunsaturated fatty acids (PUFAs). They screen for mutations that rescue fat-2 mutants, which have strong reductions in PUFAs. As a result, either mutations in fat-2 itself, or mutations in genes involved in the HIF-1 pathway, were found to rescue fat-2 mutants. Mutants in the HIF-1 pathway rescue fat-2 mutants by boosting its catalytic activity (via upregulated Fe2+). Thus, the authors show that in the context of fat-2 mutation, the sole genetic means to rescue PUFA insufficiency is to restore PUFA levels.

      Strengths:

      As C. elegans can produce PUFAs de novo as essential lipids, the genetic model is well suited to study the fundamental roles of PUFAs. The genetic screen finds mutations in convergent pathways, suggesting that it has reached near-saturation. The authors extensively validate the results of the screening and provide sufficient mechanistic insights to show how PUFA levels are restored in HIF-1 pathway mutants. As many of the mutations found to rescue fat-2 mutants are gain-of-function mutations, it is unlikely that similar discoveries could have been made with other approaches like genome-wide CRISPR screens, making the current study distinctive. Consequently, the study provides important messages. First, it shows that PUFAs are essential for life. The inability to genetically rescue PUFA deficiency, except for mutations that restore PUFA levels, suggests that they have pleiotropic essential functions. In addition, the results suggest that the most essential functions of PUFAs are not in fluidity regulation, which is consistent with recent reviews proposing that the importance of unsaturation goes beyond fluidity (doi: 10.1016/j.tibs.2023.08.004 and doi: 10.1101/cshperspect.a041409). Thus, the study provides fundamental insights about how membrane lipid composition can be linked to biological functions.

      Weaknesses:

      The authors made considerable efforts to answer the questions that arose through peer review, and now all the claims seem to be supported by experimental data. Thus, I do not see obvious weaknesses. Of course, it still remains unclear what PUFAs do beyond fluidity regulation, but this is something that cannot be answered from a single study. I just have one final proposition to make.

      I still do not agree with the answer to my previous comment 6 regarding Figure S2E. The authors claim that hif-1(et69) suppresses fat-2(wa17) in a ftn-2 null background (in Figure S2 legend for example). To claim so, they would need to compare the triple mutant with fat-2(wa17);ftn-2(ok404) and show some rescue. However, we see in Figure 5H that ftn-2(ok404) alone rescues fat-2(wa17). Thus, by comparing both figures, I see no additional effect of hif-1(et69) in an ftn-2(ok404) background. I actually think that this makes more sense, since the authors claim that hif-1(et69) is a gain-of-function mutation that acts through suppression of ftn-2 expression. Thus, I would expect that without ftn-2 from the beginning, hif-1(et69) does not have an additional effect, and this seems to be what we see from the data. Thus, I would suggest that the authors reformulate their claims regarding the effect of hif-1(et69) in the ftn-2(ok404) background, which seems to be absent (consistent with what one would expect).

    1. eLife Assessment

      This paper provides a useful new theory of the hallucinatory effects of psychedelics. The authors present convincing evidence that a computational model trained with the Wake-Sleep algorithm can reproduce some features of hallucinations by varying the strength of top-down connections in the model, but discussion of the model's relationships to the psychedelics and sleep literatures is incomplete. The work will be of interest to researchers studying hallucinations or offline activity and learning more broadly.

    2. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that go against certain mainstream ideas in psychedelic neuroscience (that I largely agree with). I cannot speak to the math in this manuscript, but it seems like quite a conceptual leap to set a parameter of the model in between wake and sleep and state that this is a proxy to acute psychedelic effects (point #20). My other concerns below are related to the review of the psychedelic literature:

      (1) Page 1, Introduction, "...they are agonists for the 5-HT2a serotonin receptor commonly expressed on the apical dendrites of cortical pyramidal neurons..." It is a bit redundant to say "5-HT2A serotonin receptor," as serotonin is already captured by its abbreviation (i.e., 5-HT).

      While psychedelic research has focused on 5-HT2A expression on cortical pyramidal cells, note that the 5-HT2A receptor is also expressed on interneurons in the medial temporal lobe (entorhinal cortex, hippocampus, and amygdala) with some estimates being >50% of these neurons (https://doi.org/10.1016/j.brainresbull.2011.11.006, https://doi.org/10.1007/s00221-013-3512-6, https://doi.org/10.7554/eLife.66960, https://doi.org/10.1016/j.mcn.2008.07.005, https://doi.org/10.1038/npp.2008.71, https://doi.org/10.1038/s41386-023-01744-8, https://doi.org/10.1016/j.brainres.2004.03.016, https://doi.org/10.1016/S0022-3565(24)37472-5, https://doi.org/10.1002/hipo.22611, https://doi.org/10.1016/j.neuron.2024.08.016). However, with a ~1:4 ratio of inhibitory to excitatory neurons in the brain (https://doi.org/10.1101/2024.09.24.614724), this can make it seem as if 5-HT2A expression is negligible in the MTL. I think it might be important to mention these receptors, as this manuscript discusses replay.

      I see now that Figure 1 mentions that PV cells also express 5-HT2A receptors. This should probably be mentioned earlier.

      (2) Page 1, Introduction, "They have further been used for millennia as medicine and in religious rituals..." This might be a romanticization of psychedelics and indigenous groups, as anthropological evidence suggests that intentional psychedelic use might actually be more recent (see work by Manvir Singh and Andy Letcher).

      (3) When discussing oneirogens, it could be worth differentiating psychedelics from kappa opioid agonists such as ibogaine and salvinorin A, another class of hallucinogens that some refer to as "oneirogens" (similar to how "psychedelic" is the colloquial term for 5-HT2A agonists). Note that studies have found the effects of Salvia divinorum (which contains salvinorin A) to be described more similarly to dreams than psychedelics (https://doi.org/10.1007/s00213-011-2470-6). This makes me wonder why the present study is more applicable to 5-HT2A psychedelics than other kappa opioid agonists or other classes of hallucinogens (e.g., NMDA antagonists, muscarinic antagonists, GABAA agonists).

      (4) Page 2, Introduction, "Replay sequences have been shown to be important for learning during sleep [14, 15, 16, 17, 18]: we propose that mechanisms supporting replay-dependent learning during sleep are key to explaining the increases in plasticity caused by psychedelic drug administration." I'm not sure I follow the logic of this point. Dreams happen during REM sleep, whereas replay is most prominent during non-REM sleep. Moreover, while it's not clear what psychedelics do to hippocampal function, most evidence would suggest they impair it. As mentioned, most 5-HT2A receptors in the hippocampus seem to be on inhibitory neurons, and human and animal work finds that psychedelics impair hippocampal-dependent memory encoding (https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455, https://doi.org/10.3389/fnbeh.2014.00180, https://doi.org/10.1002/hipo.22712). One study even found that psilocin impairs hippocampal-dependent memory retrieval (https://doi.org/10.3389/fnbeh.2014.00180). Note that this is all in reference to the acute effects (psychedelics may post-acutely enhance hippocampal-dependent memory, https://doi.org/10.1007/s40265-024-02106-4).

      (5) Page 2, Introduction, "In total, our model of the functional effect of psychedelics on pyramidal neurons could provide a explanation for the perceptual psychedelic experience in terms of learning mechanisms for consolidation during sleep..." In contrast to my previous point, I think this could be possible. Three datasets have found that psychedelics may enhance cortical-dependent memory encoding (i.e., familiarity; https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455), and two studies found that post-encoding administration of psychedelics retroactively enhanced memory that may be less hippocampal-dependent/more cortical-dependent (https://doi.org/10.1016/j.neuropharm.2012.06.007, https://doi.org/10.1016/j.euroneuro.2022.01.114). Moreover, and as mentioned below, 5 studies have found decoupling between the hippocampus and the cortex (https://doi.org/10.3389/fnhum.2014.00020, https://doi.org/10.1002/hbm.22833, https://doi.org/10.1016/j.celrep.2021.109714, https://doi.org/10.1162/netn_a_00349, https://doi.org/10.1038/s41586-024-07624-5), something potentially also observed during REM sleep that is thought to support consolidation (https://doi.org/10.1073/pnas.2123432119). These findings should probably be discussed.

      (6) Page 2, Introduction, "In this work, we show that within a neural network trained via Wake-Sleep, it is possible to model the action of classical psychedelics (i.e. 5-HT2a receptor agonism)..." Note that 5-HT2A agonism alone is not sufficient to explain the effects of psychedelics, given that there are 5-HT2A agonists that are non-hallucinogenic (e.g., lisuride).

      (7) Page 2, Introduction, "...by shifting the balance during the wake state from the bottom-up pathways to the top-down pathways, thereby making the 'wake' network states more 'dream-like'." I could have included this in the previous point, but I felt that this idea deserved its own point. There has been a rather dogmatic assertion that psychedelics diminish top-down processing and/or enhance bottom-up processing, and I appreciate that the authors have not accepted this as fact. However, because this is an unfortunately prominent idea, I think it ought to be fleshed out more by first mentioning that it's one of the tenets of REBUS. REBUS has become a popular model of psychedelic drug action, but it's largely unfalsifiable (it's based on two unfalsifiable models, predictive processing and integrated information theory), so the findings from this study could tighten it up a bit. Second, there have now been a handful of studies that have attempted to study directionality in information flow under psychedelics, and the findings are rather mixed including increased bottom-up/decreased top-down effects (https://doi.org/10.7554/eLife.59784, https://doi.org/10.1073/pnas.1815129116; note that the latter "bottom-up" effect involves subcortical-cortical connections in which it's less clear what's actually "higher-/lower-level"), increased top-down/decreased bottom-up effects (https://doi.org/10.1038/s41380-024-02632-3, https://doi.org/10.1016/j.euroneuro.2016.03.018), or both (https://doi.org/10.1016/j.neuroimage.2019.116462, https://doi.org/10.1016/j.neuropharm.2017.10.039), though most of these studies are aggregating across largely inhomogeneous states (i.e., resting-state). 
      Lastly, and somewhat problematically, facilitated top-down processing is also an idea proposed in psychosis that's based partially on findings with acute ketamine administration (note that all hallucinations to some degree might rely on top-down facilitation, as a hallucination involves a high-level concept that impinges on lower-level sensory areas; see work by Phil Corlett). While psychosis and the effects of ketamine have some similarities with psychedelics, there are certainly differences, and I think the goal of this manuscript is to uniquely describe 5-HT2A psychedelics (again, I'm left wondering why tweaking alpha in the Wake-Sleep algorithm is any more applicable to psychedelics than other hallucinogenic conditions).

      (8) Figure 2 equates alpha with a "psychedelic dose," but this is a bit misleading, as neither the algorithm nor an individual was administered a psychedelic. Alpha is instead a hypothetical proxy for a psychedelic dose. Moreover, if the model were recapitulating the effects of psychedelics, shouldn't these images look more psychedelic as alpha increases (e.g., they may look like images put through the DeepDream algorithm)?

      (9) Page 11, Methods, "...and the gate α ensures that learning only occurs during sleep mode... The (1 − α) gate in this case ensures that plasticity only occurs during the Wake mode." Much of the math escapes me, so perhaps I'm misunderstanding these statements, but learning and plasticity certainly happen during both wake and sleep, making me wonder what is meant by these statements. Moreover, if plasticity is simply neural changes, couldn't plasticity be synonymous with neural learning? Perhaps plasticity and learning are meant to refer to different types of neural changes. It might be worth clarifying this, as a general problem in psychedelic research is that psychedelics are described as facilitating plasticity when brains are changing at every moment (hence not experiencing every moment as the same), and psychedelics don't impact all forms of plasticity equally. For example, psychedelics may not necessarily enhance neurogenesis or the addition of certain receptor types, and they impair certain forms of learning (i.e., episodic memory encoding). What is typically meant by plasticity enhancements induced by psychedelics (and where there's the most evidence) is dendritic plasticity (i.e., the growth of dendrites and spines). Whatever is meant by "plasticity" should be clarified in its first instance in this manuscript.

      (10) Page 12, Methods, "During training, neural network activity is either dominated entirely by bottom-up inputs (Wake, α = 0) or by top-down inputs (Sleep, α = 1)." Again, I could be misunderstanding the mathematical formulation, but top-down inputs operate during wake, and bottom-up inputs can operate during sleep (people can wake up or even incorporate noise from their environments into sleep).

      (11) Page 4, Results, "Thus, we can capture the core idea behind the oneirogen hypothesis using the Wake-Sleep algorithm, by postulating that the bottom-up basal synapses are predominantly driving neural activity during the Wake phase (when α is low)." However, several pieces of evidence (and the first circuit model of psychedelic drug action) suggest that psychedelics enhance functional connectivity and potentially even effective connectivity from the thalamus to the cortex (https://doi.org/10.1093/brain/awab406). Note that psychedelics may not equally impact all subcortical structures. REBUS proposes the opposite of the current study, that psychedelics facilitate bottom-up information flow, with one of the few explicit predictions being that psychedelics should facilitate information flow from the hippocampus to the default mode network. However, as mentioned earlier, 5 studies have found that psychedelics diminish functional connectivity between the hippocampus and cortex (including the DMN but also V1).

      (12) Page 4, Results, "...and have an excitatory effect that positively modulates glutamatergic transmission..." Note that this may not be brainwide. While psychedelics were found to increase glutamatergic transmission in the cortex, they were also found to decrease hippocampal glutamate (consistent with inhibition of the hippocampus, https://doi.org/10.1038/s41386-020-0718-8).

      (13) Page 5, "...which are similar to the 'breathing' and 'rippling' phenomena reported by psychedelic drug users at low doses..." Although it's sometimes unclear what is meant by "low doses," the breathing/rippling effect of psychedelics occurs at moderate and high doses as well.

      (14) I watched the videos, and it's hard for me to say there was some stark resemblance to psychedelic imagery. In contrast, for example, when the DeepDream algorithm came out, it did seem to capture something quite psychedelic.

      (15) Page 5, "This form of strongly correlated tuning has been observed in both cortex and the hippocampus." If this has been observed under non-psychedelic conditions, what does this tell us about this supposed model of psychedelics?

      (16) Page 6, with regard to neural variability, "...but whether this phenomenon [increased variability] is general across tasks and cortical areas remains to be seen." First, is variability here measured as variance? In fMRI datasets that have been used to support the Entropic Brain Hypothesis (EBH), note that variance tends to decrease, though certain measures of entropy increase (e.g., Figure 4A here https://doi.org/10.1073/pnas.1518377113 shows global variance decreases, and this reanalysis of those data https://doi.org/10.1002/hbm.23234 finds some entropy increases). Thus, variance and entropy should not be confused (in theory, one could cycle through several more brain states that are, however, similar to each other, which would produce more entropy with decreased variance). Second, and perhaps more problematically for the EBH, is that the entropy effects of psychedelics completely disappear when one does a task, and unfortunately, the authors of these findings have misinterpreted them. What they'll say is that engaging in boring cognitive tasks or watching a video decreases entropy under psychedelics, but what you can see in Figure 1b of https://doi.org/10.1021/acschemneuro.3c00289 and Figure 4b of https://doi.org/10.1038/s41586-024-07624-5 is that entropy actually increases under sober conditions when you do a task. That is, it's a rather boring finding. Essentially, when resting in a scanner while sober, many may actually rest (including falling asleep, especially when subjects are asked to keep their eyes closed), and if you perform a task, brain activity should become more complex relative to doing nothing/falling asleep. When under a psychedelic, one can't fall asleep and thus, there's less change (though note that both of the above studies found numerical increases when performing tasks).
      Lastly, again I should note that the findings of the present study actually go against EBH/REBUS, given that the findings are increased top-down effects when EBH/REBUS predicts decreased top-down/increased bottom-up effects.
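
      The distinction between variance and entropy drawn in point (16) can be illustrated with a toy calculation. This is a minimal sketch and not an analysis from any of the cited studies: the two discrete "state" distributions, their values, and the helper names are hypothetical, chosen only to show that visiting more states that lie closer together can raise entropy while lowering variance.

```python
import math


def variance(values, probs):
    """Variance of a discrete distribution over the given values."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


def entropy_bits(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical baseline: two widely separated states, visited equally often.
few_states = [-1.0, 1.0]
few_probs = [0.5, 0.5]

# Hypothetical comparison: four states that are close to each other.
many_states = [-0.3, -0.1, 0.1, 0.3]
many_probs = [0.25, 0.25, 0.25, 0.25]

print("few states : variance", variance(few_states, few_probs),
      "entropy", entropy_bits(few_probs))
print("many states: variance", variance(many_states, many_probs),
      "entropy", entropy_bits(many_probs))
```

      In this toy case the four-state distribution has higher entropy but much lower variance than the two-state one, so the two measures can move in opposite directions.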

      (17) Page 6, "Because psychedelic drug administration increases influence of apical dendritic inputs on neural activity in our model, we found that silencing apical dendritic activity reduced across stimulus neural variability more as the psychedelic drug dose increases." I again want to point out that alpha is not the equivalent of a psychedelic dose here, but rather a parameter in the model that is being proposed as a proxy.

      (18) Page 8, "Experimentally, plasticity dynamics which could, theoretically, minimize such a prediction error have been observed in cortex [66, 67], and it has also been proposed that behavioral timescale plasticity in the hippocampus could subserve a similar function [68]. We found that plasticity rules of this kind induce strong correlations between inputs to the apical and basal dendritic compartments of pyramidal neurons, which have been observed in the hippocampus and cortex [55, 56]." Note that the plasticity effects of psychedelics are sometimes not observed in the hippocampus or are even observed as decreases (reviewed in https://doi.org/10.1038/s41386-022-01389-z).

      (19) Page 9, as is mentioned, REBUS proposes that there should be a decrease in top-down effects under psychedelics, which goes against what is found here, but as I describe above, the effects of psychedelics on various measures of directionality have been quite mixed.

      (20) Unless I'm misunderstanding something, it seems to be a bit of a jump to infer that simply changing alpha in your model is akin to psychedelic dosing. Perhaps if the model implemented biologically plausible 5-HT2A expression and/or its behavior were constrained by common features of a psychedelic experience (e.g., fractal-like visuals imposed onto perception, inability to fall asleep, etc.), I'd be more inclined to see the parallels between alpha and psychedelic dosing. However, it would still need to recapitulate unique effects of psychedelics (e.g., impairments in hippocampal-dependent memory with sparing/facilitation of cortical memory). At the moment, it seems like whatever the model is doing is applicable to any hallucinogenic drug or even psychosis.

    3. Reviewer #2 (Public review):

      This work is a nice contribution to the literature in articulating a specific, testable theory of how psychedelics act to generate hallucinations and plasticity. The connection to replay, however - including in the title, abstract, and framing throughout the paper - is not well fleshed out.

      In particular, the paper's framing seems to conflate replay, dreams, and top-down processing, but these are not one and the same. Picard-Deland et al. (TICS, 2023) provides a useful review of the differences between replay and dreams. One key point is that most replay has been observed during NREM sleep, but our canonically bizarre/vivid dreams occur during REM. Top-down connections have also been proposed to be used for many processes aside from replay. The paper would benefit from much more precision and nuance on these points.

      I believe the paper is missing demonstrations or speculation about how plasticity under various doses of psychedelics relates to changes in performance, which would be an important link to the replay-dependent learning literature.

      Are there renderings available for 'ripple' effects of psychedelics that could be included, to allow readers to compare the model's hallucinations to humans'? Short of this, it would be useful to have a more detailed description of what rippling is. (For those readers without firsthand knowledge!) It is currently difficult to assess how close the match is.

    4. Author response:

      We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.

      Common Concerns (Reviewer 1 & Reviewer 2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), and a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical both for synaptic consolidation of monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript; we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
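
      The ‘noise followed by re-learning’ intuition can be sketched on a toy one-dimensional landscape. This is an illustration under stated assumptions, not part of our model: the energy values, function names, and parameters are hypothetical, and the loop is a simple perturb-and-relearn scheme (a relative of simulated annealing without a temperature schedule) rather than simulated annealing proper.

```python
import random

# Hypothetical energy landscape: a shallow local minimum at index 1
# and the global minimum at index 6, separated by a barrier at index 3.
ENERGY = [3, 1, 2, 4, 3, 2, 0, 1, 3]


def greedy_descent(energy, i):
    """Move to the lower-energy neighbor until no neighbor is lower."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energy)]
        best = min(neighbors, key=lambda j: energy[j])
        if energy[best] >= energy[i]:
            return i
        i = best


def perturb_and_relearn(energy, start, kicks, kick_size, rng):
    """Alternate random kicks ('noise') with greedy descent ('re-learning').

    A kicked state is kept only if re-learning from it ends somewhere
    that is no worse than the current state.
    """
    current = greedy_descent(energy, start)
    for _ in range(kicks):
        kicked = current + rng.randint(-kick_size, kick_size)
        kicked = max(0, min(len(energy) - 1, kicked))  # stay on the landscape
        candidate = greedy_descent(energy, kicked)
        if energy[candidate] <= energy[current]:
            current = candidate
    return current


stuck = greedy_descent(ENERGY, 0)
escaped = perturb_and_relearn(ENERGY, 0, kicks=200, kick_size=5,
                              rng=random.Random(0))
print("greedy only:", stuck, "energy", ENERGY[stuck])
print("with kicks :", escaped, "energy", ENERGY[escaped])
```

      Note that in this sketch the random kicks are only useful because each is followed by descent and a comparison; on their own they carry no guarantee of improvement, in line with the caution above.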

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
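
      To make the role of ‘alpha’ concrete, the following is a minimal sketch, assuming that a unit's drive is a convex combination of its basal (bottom-up) and apical (top-down) inputs weighted by (1 − alpha) and alpha respectively. This mixing rule, the function name, and all numbers are illustrative assumptions rather than the exact equations of the manuscript; the sketch only shows how one scalar ‘alpha’ can stand in for the mean of heterogeneous per-neuron values.

```python
def mixed_drive(basal, apical, alpha):
    """Convex mix of bottom-up (basal) and top-down (apical) input.

    Assumed form for illustration: alpha = 0 is purely bottom-up (Wake),
    alpha = 1 is purely top-down (Sleep).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * basal + alpha * apical


# Hypothetical inputs to three neurons.
basal_inputs = [1.0, 0.5, 0.0]
apical_inputs = [0.0, 0.5, 1.0]

# Hypothetical per-neuron sensitivities and their population mean.
per_neuron_alpha = [0.1, 0.3, 0.5]
mean_alpha = sum(per_neuron_alpha) / len(per_neuron_alpha)

# Heterogeneous case: each neuron uses its own alpha.
heterogeneous = [
    mixed_drive(b, a, al)
    for b, a, al in zip(basal_inputs, apical_inputs, per_neuron_alpha)
]

# Simplified case: every neuron shares the mean alpha.
homogeneous = [
    mixed_drive(b, a, mean_alpha)
    for b, a in zip(basal_inputs, apical_inputs)
]

print("mean alpha   :", mean_alpha)
print("heterogeneous:", heterogeneous)
print("homogeneous  :", homogeneous)
```

      The two cases generally differ neuron by neuron, which is the heterogeneity that a single scalar ‘alpha’ disregards.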

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.

      You isolate 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      Two experimental results that appear to contradict our hypothesis deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), and we believe it would be best explained within our framework by a topographically connected recurrent network architecture trained on video data, a potentially fruitful direction for future research.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which, like sample entropy, is a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our best explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
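
      To make the third measure concrete, the following is a minimal sketch of Lempel-Ziv complexity computed on a binarized signal. It is illustrative only: the median threshold, the function names, and the two synthetic signals are assumptions, not the pipeline used by (Mediano et al., 2024). The phrase-counting rule is the standard LZ76 parsing, in which each new phrase is the shortest segment that cannot be copied from the preceding text.

```python
import random
import statistics


def binarize(signal):
    """Threshold a real-valued signal at its median (an assumed choice)."""
    threshold = statistics.median(signal)
    return [1 if x > threshold else 0 for x in signal]


def lempel_ziv_complexity(bits):
    """Count the phrases in the LZ76 parsing of a binary sequence."""
    s = "".join("1" if b else "0" for b in bits)
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase for as long as it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# Hypothetical signals: a square wave (highly regular) and Gaussian noise.
rng = random.Random(0)
regular = [1.0 if (t // 5) % 2 else -1.0 for t in range(200)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

lz_regular = lempel_ziv_complexity(binarize(regular))
lz_noisy = lempel_ziv_complexity(binarize(noisy))
print(lz_regular, lz_noisy)
```

      Because the count depends only on the binarized sequence, amplitude information is discarded, which is one reason this measure is not directly comparable to a variance-based quantification.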

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.

    1. eLife Assessment

      This study proposes a potentially useful improvement on a popular fMRI method for quantifying representational similarity in brain measurements by focusing on representational strength at the single trial level and adding linear mixed effects modeling for group-level inference. The manuscript demonstrates increased sensitivity with no loss of precision compared to more classic versions of the method. However, the framing of the work with respect to these prior approaches is incomplete, several assumptions are insufficiently motivated, and it is unclear to what extent the approach would generalize to other paradigms.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a novel method for RSA, called trial-level RSA (tRSA). The method first constructs a trial x trial representational dissimilarity matrix using correlation distances, assuming that (as in the empirical example) each trial has a unique stimulus. Whereas "classical RSA" correlates the entire upper triangular matrix of the RDM / RSM to a model RDM / RSM, tRSA first calculates the correlation to the model RDM per row, and then averages these values. The paper claims that tRSA has increased sensitivity and greater flexibility than classical RSA.
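
      The distinction between the two procedures can be sketched in a few lines. This is one reading of the description above, not the authors' code: the function names, the synthetic features and patterns, and the tie-free rank correlation are all assumptions made for illustration.

```python
import numpy as np


def spearman(a, b):
    """Spearman correlation via ranks; assumes no tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def classical_rsa(brain_rdm, model_rdm):
    """One value: correlate the full upper triangles of the two RDMs."""
    upper = np.triu_indices_from(brain_rdm, k=1)
    return spearman(brain_rdm[upper], model_rdm[upper])


def trial_level_rsa(brain_rdm, model_rdm):
    """One value per trial: correlate matching rows, skipping the diagonal."""
    n = brain_rdm.shape[0]
    keep = ~np.eye(n, dtype=bool)
    return np.array([
        spearman(brain_rdm[i, keep[i]], model_rdm[i, keep[i]])
        for i in range(n)
    ])


# Hypothetical data: 12 trials, each with a unique stimulus.
rng = np.random.default_rng(0)
n_trials, n_features, n_voxels = 12, 5, 50
features = rng.standard_normal((n_trials, n_features))
weights = rng.standard_normal((n_features, n_voxels))
patterns = features @ weights + rng.standard_normal((n_trials, n_voxels))

model_rdm = 1.0 - np.corrcoef(features)   # correlation distance
brain_rdm = 1.0 - np.corrcoef(patterns)

crsa_value = classical_rsa(brain_rdm, model_rdm)
trsa_values = trial_level_rsa(brain_rdm, model_rdm)
trsa_mean = float(trsa_values.mean())
print(crsa_value, trsa_mean)
```

      Note that each off-diagonal cell of the RDM enters two different rows, so the per-trial values share data with one another; this is relevant to the independence concerns raised later in both reviews.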

      Strengths & Weaknesses:

      I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      Main issues:

      (1) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is very related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      (2) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.

      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right in the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".

      (3) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ denotes the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as they seem to be done here).

      (4) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.

      (5) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      (6) The trial-level Spearman correlations between the brain RDM and the model were analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, and independence is an assumption of the mixed effects model. This does not seem to induce an increase in Type I errors in the conditions studied, but the procedure still needs a clear justification.

      (7) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      (8) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      (9) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli. Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here. One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.
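
      The estimator described in issue (2) above can be sketched numerically. With $y_{j,n} = u_j + e_{j,n}$ and noise that is independent across measurements, the product of two measurements of the same stimulus has expectation $\sigma^2_{u}$, whereas the variance within a single measurement has expectation $\sigma^2_{u} + \sigma^2_{e}$. The sketch below is a simplified illustration under assumed values (i.i.d. Gaussian signal and noise, zero mean, two measurements per stimulus); all variable names and parameter settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, n_voxels = 40, 200
sigma2_u, sigma2_e = 1.0, 4.0   # assumed signal and noise variances

# True stimulus patterns u_j and two independent noisy measurements of each.
u = rng.normal(0.0, np.sqrt(sigma2_u), size=(n_stimuli, n_voxels))
y1 = u + rng.normal(0.0, np.sqrt(sigma2_e), size=u.shape)   # measurement n = 1
y2 = u + rng.normal(0.0, np.sqrt(sigma2_e), size=u.shape)   # measurement n = 2

# Within one measurement, signal and noise variance are confounded:
# the expectation is sigma2_u + sigma2_e.
naive_estimate = float(np.mean(y1 * y1))

# Across independent measurements the noise terms cancel in expectation:
# the expectation is sigma2_u.
crossval_estimate = float(np.mean(y1 * y2))

print(naive_estimate, crossval_estimate)
```

      With only one measurement per stimulus, as in the design discussed here, the second estimate is unavailable, which is why signal and noise cannot be separated without assuming a model.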

      References:

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

    3. Reviewer #2 (Public review):

      Summary:

      This methods paper proposes two changes to classic RSA, a popular method to probe neural representation in neuroimaging experiments: computing RSA at the row/column level of the RDM, and using mixed linear modeling to compute second-level statistics, using the individual rows/columns to estimate a random effect of stimulus. The benefit of the new method is demonstrated using simulations and a re-analysis of a prior fMRI dataset on object perception and memory encoding.

      Strengths:

      (1) The paper is clearly written and features clear illustrations of the proposed method.

      (2) The combination of simulation and real data works well, with the same factors being examined in both simulations and real data, resulting in a convincing demonstration of the benefits of tRSA in realistic experimental scenarios.

      (3) I find quite appealing the authors' claim that tRSA is a promising approach not only to perform more complete modeling of cogneuro data, but also to conceptualize representation at the single trial/event level (cf Discussion section on P42).

      Weaknesses:

      (1) While I generally welcome the contribution (see above), I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor is it true that RSA has never been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2019).

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.
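
      As an illustration of the non-parametric alternatives mentioned in weakness (2), a condition-label permutation test for an RDM correlation can be sketched as follows. This is a generic sketch, not the rsatoolbox implementation: the function names, the use of Pearson correlation on the upper triangle, the number of permutations, and the synthetic RDMs are all assumptions.

```python
import numpy as np


def rdm_correlation(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1])


def permutation_p_value(brain_rdm, model_rdm, n_permutations=999, seed=0):
    """One-sided p-value obtained by relabeling the model RDM's conditions."""
    rng = np.random.default_rng(seed)
    observed = rdm_correlation(brain_rdm, model_rdm)
    n = model_rdm.shape[0]
    exceed = 0
    for _ in range(n_permutations):
        order = rng.permutation(n)
        # Permute rows and columns together so the matrix stays a valid RDM.
        shuffled = model_rdm[np.ix_(order, order)]
        if rdm_correlation(brain_rdm, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


def euclidean_rdm(points):
    """Pairwise Euclidean distances between rows of `points`."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


# Hypothetical example: 10 conditions embedded in a 3-dimensional space.
rng = np.random.default_rng(2)
points = rng.standard_normal((10, 3))
model_rdm = euclidean_rdm(points)
related_rdm = euclidean_rdm(points + 0.1 * rng.standard_normal(points.shape))
unrelated_rdm = euclidean_rdm(rng.standard_normal((10, 3)))

p_related = permutation_p_value(related_rdm, model_rdm)
p_unrelated = permutation_p_value(unrelated_rdm, model_rdm)
print(p_related, p_unrelated)
```

      Relabeling whole conditions, rather than shuffling individual cells, respects the dependence among RDM cells that share a condition, which is the concern raised in weakness (5).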

      References

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    1. eLife Assessment

      In this valuable study, Wandler et al. provide convincing theoretical evidence for alternate mechanisms of rhythm generation by CPGs. Their model shows that cell-type-specific connectivity and a dominant inhibitory drive could underlie rhythm generation. Excitatory input could act to enhance the frequency range of these rhythms. This modeling study could motivate further experimental investigation of these mechanisms to understand CPG rhythmogenesis.

    2. Reviewer #1 (Public review):

      This study explores the connectivity patterns that could lead to fast and slow undulating swim patterns in larval zebrafish using a simplified theoretical framework. The authors show that a pattern of connectivity based only on inhibition is sufficient to produce realistic patterns with a single frequency. Two such networks, coupled with inhibition but with distinct time constants, can produce a range of frequencies. Adding excitatory connections further increases the range of obtainable frequencies, albeit at the expense of sudden transitions in the mid-frequency range.

      Strengths:

      (1) This is an elegant way of answering the question of how spinal locomotor circuits generate coordinated activity, using a theoretical approach based on moving bump models of brain activity.

      (2) The models make specific predictions on patterns of connectivity while discounting the role of connectivity strength or neuronal intrinsic properties in shaping the pattern.

      (3) The models also propose that there is an important association between cell-type-specific intersegmental patterns and the recruitment of speed-selective subpopulations of interneurons.

      (4) Having a hierarchy of models creates a compelling argument for explaining rhythmicity at the network level. Each model builds on the last and reveals a new perspective on how network dynamics can control rhythmicity. I liked that each model can be used to probe questions in the next/previous model.

      Major Issues:

      (1) How is this simplified model representative of what is observed biologically? A bump model does not naturally produce oscillations. How would the dynamics of a rhythm generator interact with this simplistic model?

      (2) Would this theoretical construct survive being expressed in a biophysical model? It seems that it should, but even a simple biological model with the basic patterns of connectivity shown here would greatly increase confidence in the biological plausibility of the theory.

      (3) How stable is this model in its output patterns? Is it robust to noise? Does noise, in fact, smooth out the abrupt transitions in frequency in the middle range?

      (4) All figure captions are inadequate. They should have enough information for the reader to understand the figure and the point that was meant to be conveyed. For example, Figure 1 does not explain what the red dot is, what is black, what is white, or what the gradations of gray are. Or even if this is a representative connectivity of one node, or if this shows all the connections? The authors should not leave the reader guessing.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to show that connectivity patterns within spinal circuits composed of specific excitatory and inhibitory connectivity and with varying degrees of modularity could achieve tail beats at various frequencies as well as proper left-right coordination and rostrocaudal propagation speeds.

      Strengths:

      The model is simple, and the connectivity patterns explored are well supported by the literature.

      The conclusions are intuitive and support many experimental studies on zebrafish spinal circuits for swimming. The simulations provide strong support for the sufficiency of connectivity patterns to produce and control many hallmark features of swimming in zebrafish.

      Weaknesses:

      I only have two minor suggestions:

      (1) Figure 1A: if I interpret Figure 1B correctly, should there not also be long descending projections? These do not seem to be illustrated.

      (2) Page 5: it would be good to define what is meant by slow and fast here, as this definition changes with developmental age in zebrafish. Which age is being considered?

    4. Reviewer #3 (Public review):

      Summary:

      Central pattern generator (CPG) circuits underlie rhythmic motor behaviors. To date, it is thought that these CPG networks are rather local and multiple CPG circuits are serially connected to allow locomotion across the entire body. Distributed CPG networks that incorporate long-range connections have not been proposed, although such connectivity has been experimentally shown for several different spinal populations. In this manuscript, the authors use this existing literature on long-range spinal interneuron connectivity to build a new computational model that reproduces basic features of locomotion like left-right alternation, rostrocaudal propagation, and independent control of frequency and amplitude. Interestingly, the authors show that a model solely based on inhibitory neurons can recapitulate these basic locomotor features. Excitatory sources were then added that increased the dynamic range of frequencies generated. Finally, the authors were also able to reproduce experimentally observed consequences of cell-type-specific ablations, showing that local and long-range, cell-type-specific connectivity could be sufficient for generating locomotion.

      Strengths:

      This work is novel, offering distributed CPGs as an interesting alternative to the local networks traditionally predicted. It shows that cell-type-specific network connectivity is as important as, if not more important than, intrinsic cell properties for rhythmogenesis and that inhibition plays a crucial role in shaping locomotor features. Given the importance of local CPGs in understanding motor control, this alternative concept will be of broad interest to the larger motor control field, including invertebrate and vertebrate species.

      Weaknesses:

      I have the following minor concerns/clarifications:

      (1) The authors describe a single unit as a neuron, be it excitatory or inhibitory, and the output of the simulation is the firing rate of these neurons. Experimentally and in other modeling studies, motor neurons are incorporated in the model, and the output of the network is based on motor neuron firing rate, not the interneurons themselves. Why did the authors choose to build the model this way?

      (2) In the single population model (Figure 1), the authors use ipsilateral inhibitory connections that are long-range in an ascending direction. Experimentally, these connections have been shown to be local, while long-range ipsilateral connections have been shown to be descending. What were the reasons the authors chose this connectivity? Do the authors think local ascending inhibitions contribute to rostrocaudal propagation, and how?

      (3) In the two-population model, the authors show independent control of frequency and amplitude, as has been reported experimentally. In these previous experimental studies, however, frequency and amplitude are regulated by different neurons, suggesting different networks dedicated to frequency and amplitude control. In the current model, by contrast, the same population with the same connections can contribute to frequency or amplitude depending on relative tonic drive. Can the authors please address these differences either by changes in the model or by adding to the Discussion?

      (4) It would be helpful to add a paragraph in the Discussion on how these results could be applicable to other model systems beyond zebrafish. Cell intrinsic rhythmogenesis is a popular concept in the field, and these results show an interesting and novel alternative. It would help to know if there is any experimental evidence suggesting such network-based propagation in other systems, invertebrates, or vertebrates.

    1. eLife Assessment

      This important work compares the size of two brain areas, the amygdala and the hippocampus, across 12 species belonging to the Macaca genus. The authors find, using a convincing methodological approach, that amygdala - but not hippocampal - volume varies with social tolerance grade, with high tolerance species showing larger amygdala than low tolerance species of macaques. Interestingly, their findings also suggest an inverted developmental effect, with intolerant species showing an increase in amygdala volume across the lifespan, compared to tolerant species exhibiting the opposite trend. Overall, this paper offers new insights into the neural basis of social and emotional processing.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      Strengths:

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings.

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      Weaknesses:

      (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure (typically after age 1 or 1.5), so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.

      (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.

      (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements; however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.

      (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner.

      (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes, particularly in future studies.

    3. Reviewer #2 (Public review):

      Summary:

      This comparative study of macaque species and the type of social interaction is ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications, again to assist with any generalisations and power. They have focussed on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous, and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next). However, as the number of variables and potential comparisons would only increase further, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, in which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the relative amygdala volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one were to sample a brain at age 20 in all the grades according to the model-predicted volumes, the amygdala would not seem to differ much across grades: the difference is driven mostly by Grade 1 being smaller (in line with the main result), but Grade 2 is then bigger than Grade 3, and Grade 4 is bigger once again, though not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1) Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state at L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather encourage the authors to discuss the amygdala as a multipurpose center that includes social cognition and emotion.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

    1. eLife Assessment

      This is a useful tool for code-less analysis of patterns in cell migratory behaviours in vivo using intravital microscopy data and allows correlation with spatial features of the tumour microenvironment. There is a need for these tools to make quantitative analysis, comparison and interpretation of complex cell tracking data more accessible and evidence is provided of its applicability to tracks generated by both proprietary and open tracking software. However, it is incomplete due to limitations imposed by the assumptions that apply to the statistical tests employed.

    2. Reviewer #1 (Public review):

      Summary:

      Intravital microscopy (IVM) is a powerful tool that facilitates live imaging of individual cells over time in vivo in their native 3D tissue environment. Extracting and analysing multi-parametric data from IVM images is, however, challenging, particularly for researchers with limited programming and image analysis skills. In this work, Rios-Jimenez and Zomer et al. have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of IVM data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). It is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. Demo datasets are also available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour-bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge), which were used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment. When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

      Strengths:

      - Figures are clearly presented, and the manuscript is easy to follow.

      - The pipeline appears to be intuitive and user-friendly for researchers with limited computational expertise. A detailed step-by-step video and demo datasets are also included to support its uptake.

      - The different computational modules have been tested using relevant datasets, including imaging data of normal and tumour cells in vivo.

      - All code is open source, and the pipeline can be implemented with Google Colab.

      - The tool combines multiple dynamic parameters extracted from timelapse IVM images to identify single-cell behavioural patterns and to cluster cells into distinct groups sharing similar behaviours, and provides avenues to map these onto in vivo or ex vivo imaging data of the tumour microenvironment.

      Weaknesses:

      - The tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images. To use the tool, researchers must first extract dynamic cellular parameters from their IVM datasets using other software, including Imaris, which is expensive and therefore not available to all. Nonetheless, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking), which will help ensure its accessibility to a broader range of researchers.

      - The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. The authors acknowledge this, however, and conclusions are appropriately tempered in the absence of additional experiments and controls.

    3. Reviewer #2 (Public review):

      Summary:

      The authors produce a new tool, BEHAV3D, to analyse tracking data and to integrate these analyses with large- and small-scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data; however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths:

      - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.

      - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.

      Weaknesses:

      - The revision has dealt with many concerns; however, the statistics generated by the process are still flawed. While the statistics have been clarified within the legends, and this is a great improvement in terms of clarity, the underlying assumptions of the tests used are violated. The problem is that individual imaging positions or tracks are treated as independent and then analysed by ANOVA. As separate imaging positions within the same mouse are not independent, nor are individual cells within a single mouse, this makes the statistical analyses inappropriate. For a deeper analysis of this than is feasible within a review, please see Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. Ultimately, while this is a neat piece of software facilitating the analysis of complex data, the fact that it will produce flawed statistical analysis is a major problem. This problem is compounded by the fact that much imaging data has been analysed in this inappropriate manner in the past, leading to issues of interpretation and ultimately reproducibility.
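
      One common remedy, in the spirit of the SuperPlots paper cited above, is to collapse track-level values to a single summary per animal before running any group comparison, so that the sample size is the number of mice rather than the number of tracks. The sketch below is only an illustration of that idea: the function name and the data values are hypothetical and are not taken from BEHAV3D-TP or from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(tracks):
    """Collapse track-level measurements to one mean per (condition, mouse).

    `tracks` is an iterable of (condition, mouse_id, value) tuples.
    Returns {condition: [mean value of each mouse, ordered by mouse_id]}.
    """
    grouped = defaultdict(list)
    for condition, mouse_id, value in tracks:
        grouped[(condition, mouse_id)].append(value)

    summary = defaultdict(list)
    for (condition, _mouse_id), values in sorted(grouped.items()):
        summary[condition].append(mean(values))
    return dict(summary)


# Hypothetical example: (condition, mouse, track speed in arbitrary units)
tracks = [
    ("control", "m1", 1.0), ("control", "m1", 3.0),
    ("control", "m2", 2.0), ("control", "m2", 4.0),
    ("treated", "m3", 5.0), ("treated", "m3", 7.0),
    ("treated", "m4", 6.0), ("treated", "m4", 8.0),
]
summary = per_mouse_means(tracks)
print(summary)  # {'control': [2.0, 3.0], 'treated': [6.0, 7.0]}
```

      Any subsequent test would then be run on the per-mouse values (two per group in this toy example) rather than on the eight individual tracks. A mixed model with mouse as a random effect is an alternative that keeps the track-level data while still respecting the dependence within each animal.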

    4. Reviewer #3 (Public review):

      In this manuscript, Rios-Jimenez et al. developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.

      Strengths:

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.

      Weaknesses:

      Motility is only one tumor cell feature and is probably not sufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on only motility feature analysis, and cannot be applied to analyze other tumor cell dynamic features or cell-cell interaction dynamics.

    5. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their positive and constructive comments on the manuscript. In the revised manuscript we addressed these comments, which we believe have improved the quality of our work.

      In summary:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work, which is to provide an analytical framework for IVM data after segmentation and tracking. Developing open-source segmentation and tracking tools represents a substantial undertaking in its own right, which has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592-024-02295-6 - now cited in our revised manuscript).

      In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, recognizing the need for compatibility with tracking data from various pipelines, we have modified our tool to accept other data formats, such as those generated by open-source Fiji plugins like TrackMate, MTrackJ, ManualTracking (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository and are described in the revised manuscript. 

      (2) We appreciate reviewer #3's suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we highlighted this new functionality and provided examples using alternative datasets to demonstrate the application of these features.

      (3)  We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we addressed the concerns raised in the revised version of the manuscript.

      (4) We appreciate reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demos to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we reference this addition and present new figures with examples of these demos processing different IVM datasets (2D/3D, different tumors and healthy tissues). Additionally, we have provided processed DMG IVM movie samples in an imaging repository.

      (5) Finally, we made some small changes to the manuscript based on the reviewers’ feedback.

      Below we provide a point-by-point response to the reviewers’ comments

      Reviewer #1 (Public review):

      Comment #1: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field. 

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. Several studies (e.g., Diego Ulisse Pizzagalli et al., J Immunol (2022); Aby Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)) have comprehensively addressed these topics, and we now reference them in the revised manuscript to provide readers with relevant background.

      The objective of our manuscript is not to develop a complete segmentation or tracking pipeline but rather to introduce an analytical framework capable of extracting enhanced insights from the data generated by existing tools. This goal arises from our observations of the field: despite significant investment in image processing, researchers often rely on simplistic approaches, such as averaging single parameters across conditions, which can obscure tumor heterogeneity and spatial behavioral dynamics within the tumor microenvironment.

      Our current tool focuses on providing this much-needed analytical capability. For our analysis we used Imaris, a widely utilized software in the intravital microscopy (IVM) community, known for its intuitive 3D visualization and analysis platform despite certain limitations. 

      In our own literature search of recent IVM studies published by leading laboratories in high-impact journals, we found that close to half used Imaris, while the remainder primarily relied on manual workflows with Fiji plugins. Thus, we consider it valuable to offer a pipeline compatible with such commonly used software, given its prevalence in the field.

      However, following the suggestion of the reviewer, and to enhance the tool’s flexibility and compatibility, we have expanded the pipeline to accept data formats generated by open-source Fiji plugins, such as TrackMate, MTrackJ, and ManualTracking. These updates are detailed in the revised manuscript and are implemented in our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ), where we also provide several demos using TrackMate and Imaris processed data. This addition demonstrates our tool's capability to integrate with segmented and tracked datasets from diverse platforms, increasing its applicability to a broader range of researchers using both commercial and open-source pipelines.

      Comment #2: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we now state the number of independent mice represented in each figure in the corresponding legend to ensure transparency. Regarding the number of cells, we have indicated the total number of processed cells in the legend of Figure 2b (953 cells). Additionally, we have now included figures (Sup Fig 4c, Sup Fig 5e-g, Fig 5c,e, Sup Fig 6 c,d) for each cluster, in which each dot represents an individual cell track, with color indicating the position and shape indicating the mouse.

      Comment #3: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing it out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/main/demo_datasets) and referenced this addition in the Data availability section of the revised manuscript. Because the pipeline now also accepts data processed with Fiji, we provide four demo datasets: one processed with Imaris in 3D, one processed with CellPose 2.0 and TrackMate in 2D, one processed with µSAM and TrackMate in 3D, and one manually processed with MTrackJ in 2D. Moreover, we now provide Imaris-processed DMG IVM movie samples in an open-source repository.

      Comment #4: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and in the revised manuscript we have now provided details in the methods section “Tumor large-scale spatial phenotyping with Cytomap” to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies:

      “To map the assigned regions onto IVM movies, a 3D image of the cluster distribution within the tumor was generated and exported for each sample (Figure Supplement 5a). Next, regions within the IVM movies were visually matched to the corresponding regions identified by the Large-Scale Phenotyping module of Cytomap (Figure 3c). For each mouse, at least one or two representative positions per matched region type were selected, cropped, and analyzed to assess tumor cell behavior, following the previously described cell tracking methodology (Imaris Cell tracking).”

      Moreover, we updated Figure 3 c to further clarify these steps.

      Comment #5: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls. 

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. In the revised version of the manuscript we have revised our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      In discussion: “While our findings suggest that microenvironmental factors may influence tumor cell migration, further studies will be necessary to establish causal relationships. Additional experimental validation, such as macrophage ablation experiments, could help clarify the specific contributions of these factors.”

      Reviewer #1 (Recommendations for the authors): 

      (1) To test the ability of the pipeline to identify relevant patterns of migratory behaviours additional 'control' experiments would be helpful e.g. comparing non-invasive vs invasive tumour cell lines, artificially controlling migratory behaviours of cells such as implanting beads soaked in factors that would attract/repel cells? 

      (2) Does the pipeline work well for a variety of cell types/contexts? e.g. can it identify and cluster more subtle migratory behaviours such as non-tumour cells during tissue development or regeneration conditions? 

      We appreciate the reviewer’s valuable suggestions. In the revised manuscript, we have included additional examples demonstrating the capability of our pipeline to investigate heterogeneous cell behavior across two additional experimental setups:

      (1) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from breast cancer cell lines with varying migratory capacities (DOI: 10.1016/j.yexcr.2019.04.009). In these datasets, our pipeline extends beyond predefined characteristics based solely on speed, enabling the identification of distinct cell populations. Notably, our analysis reveals that the breast cancer lines differ in their proportions of migratory behaviors such as Fast, Intermediate, Very slow and Static (Supplementary Fig 1).

      (2) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from healthy breast epithelial cells (DOI: 10.1016/j.celrep.2024.115073), where we identify distinct morphodynamic epithelial cell populations in the terminal end bud of the mammary gland that are distributed differently among hormone receptor-positive (HR+) and HR- terminal end bud cells.

      (3) To support biological conclusions could the authors show that ablating tumour-associated macrophages or vasculature alters the migratory patterns of nearby tumour cells?

      We appreciate the reviewer's suggestion regarding the potential effects of ablating tumor-associated macrophages or vasculature on the migratory patterns of nearby tumor cells. While these experiments would functionally validate the observations made by our method, we would like to clarify that the primary focus of our study was on the development and application of computational tools for behavioral analysis and thus we consider that delving deeper in understanding the biology behind our observation is out of the scope of the current study. However, as mentioned previously, we have carefully tempered our conclusions to acknowledge the limitations of our current study. In the revised manuscript, we explicitly highlight that experiments involving the ablation of tumor-associated macrophages or vasculature would be crucial for further understanding the biological relevance of our findings.

      Minor corrections to text: 

      (4) Line 63 - are references formatted correctly?

      Thank you for pointing out this error. We have corrected it in the revised manuscript.

      (5) Lines 161 -162 - 'intravitally imaged' used twice in a sentence.

      Thank you for pointing out the typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Public review):

      Comment#1: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy community due to its user-friendly interface. We conducted a literature review to evaluate this aspect and below we include references from leading laboratories in the IVM field that utilize Imaris. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support 2D and 3D data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we describe the new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module across various IVM datasets, processed in both 2D and 3D with different processing pipelines (Supplementary Fig 1-3). This includes CellPose 2.0 and the novel 'Segment Anything' model, followed by TrackMate tracking, applied to both tumor and healthy IVM data. Moreover, we have developed a new web application that integrates morphological and tracking information from Segment Anything segmentation and TrackMate tracking, depicted in Supplementary Fig 3 a (https://morphotrack-merger.streamlit.app/). Additionally, we have updated the introduction to better clarify the scope of our study and include references to existing image processing solutions.

      Comment#2: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. 

      To clarify, each imaged position is considered an independent biological replicate (n = 18 from a total of 6 mice). We acknowledge that the description of the statistical methods and the experimental units was not sufficiently clear in the previous version. In our original submission, we used an ANOVA to test whether the proportion of each behavioral cluster differed across the tumor microenvironment regions. Post hoc pairwise comparisons were performed using Tukey’s test, with the results shown in Supplementary Figure 2d (currently Fig 3d). However, we agree with the reviewer that this approach may be misleading when paired with stacked bar plots that lack error bars, as it can obscure individual variability and does not explicitly represent statistical uncertainty.

      In the revised manuscript, we present the data as boxplots with individual data points, where each dot represents an imaged position, and the shape corresponds to a specific mouse. In Figure 3 d the y-axis displays the normalized percentage of each cluster across TME regions, expressed as z-scores. This normalization corrects for inter-mouse variability and facilitates a comparison of the relative distribution of clusters across TME regions, independent of the overall abundance differences between mice. We performed an ANOVA with Tukey's post hoc test for each individual behavioral cluster to assess differences across TME regions. Additionally, for transparency, in Supplementary Figure 5 d we provide the raw percentage values. The legends provide the number of positions and mice included in the analysis. 
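
      As an illustration of the normalization described above, the sketch below z-scores cluster percentages within each mouse, so that positions are compared relative to that mouse's own baseline. It assumes that normalization is done per mouse and per behavioral cluster and uses the population standard deviation; the response does not specify either choice. The row keys, the region labels, and the function name `zscore_within_mouse` are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_mouse(rows):
    """Add a per-mouse, per-cluster z-score to each imaged position.

    rows: list of dicts with the (assumed) keys 'mouse', 'cluster',
    'region' and 'pct', where 'pct' is the percentage of cells in that
    cluster at one imaged position.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["mouse"], row["cluster"])].append(row["pct"])

    scored = []
    for row in rows:
        values = groups[(row["mouse"], row["cluster"])]
        mu, sd = mean(values), pstdev(values)
        # A mouse with identical values everywhere carries no contrast.
        z = 0.0 if sd == 0 else (row["pct"] - mu) / sd
        scored.append({**row, "z": z})
    return scored


# Two mice with very different overall abundance of the same cluster.
rows = [
    {"mouse": "m1", "cluster": "invading", "region": "region_A", "pct": 10.0},
    {"mouse": "m1", "cluster": "invading", "region": "region_B", "pct": 30.0},
    {"mouse": "m2", "cluster": "invading", "region": "region_A", "pct": 40.0},
    {"mouse": "m2", "cluster": "invading", "region": "region_B", "pct": 60.0},
]
scored = zscore_within_mouse(rows)
z_values = [round(r["z"], 6) for r in scored]
```

      In this toy example both mice yield the same z-scores for the two regions even though their raw percentages differ by 30 points, which is the inter-mouse correction the normalization is meant to provide. Note that the positions within a mouse remain statistically dependent after z-scoring.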

      Comment#3: Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent, which would be inappropriate. The best practice for this type of data would be the use of SuperPlots, as outlined in Lord et al. (2020), J Cell Biol: "SuperPlots: Communicating reproducibility and variability in cell biology".

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In this case, because we are comparing the overall features of the behavioral clusters, each individual cell is treated as a unit. In the revised manuscript, we have clarified this point in the figure legend and incorporated plots in Figure 4c and 4e that indicate the mouse and imaging position from which each data point originates. This enhances the visualization of reproducibility and variability in our data, demonstrating that the results are consistent across multiple mice and positions and are not driven by a single mouse or imaging position.

      Comment#4: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, although we are unclear about the specific concern being raised. To clarify, in our large-scale phenotyping analysis, each position is assigned to a TME niche based on the CytoMAP analysis and the workflow outlined in Figure 3c. Multiple positions are imaged per mouse. For each position, we measure the proportion of tumor cells exhibiting a specific behavioral phenotype, and these proportions are subsequently used for statistical analysis (Figure 3 d). 
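
      The per-position quantity described above can be sketched as follows: for every imaged position, count the tracks assigned to each behavioral cluster and divide by the number of tracks at that position. The input layout and the function name `cluster_proportions` are assumptions made for illustration and do not reflect the actual BEHAV3D-TP code.

```python
from collections import Counter, defaultdict


def cluster_proportions(tracks):
    """Return {position: {cluster: fraction of tracks}}.

    tracks: iterable of (position_id, cluster_label) pairs, one per cell
    track (a hypothetical layout).
    """
    counts = defaultdict(Counter)
    for position_id, cluster in tracks:
        counts[position_id][cluster] += 1

    proportions = {}
    for position_id, counter in counts.items():
        total = sum(counter.values())
        proportions[position_id] = {
            cluster: n / total for cluster, n in counter.items()
        }
    return proportions


tracks = [
    ("pos1", "invading"), ("pos1", "invading"),
    ("pos1", "static"), ("pos1", "static"),
    ("pos2", "static"),
    ("pos2", "invading"), ("pos2", "invading"), ("pos2", "invading"),
]
props = cluster_proportions(tracks)
```

      Each position then contributes one proportion per cluster to the statistical comparison, regardless of how many tracks it contains.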

      In contrast, in Supplementary Fig. 5e-g, we treat each cell track as an individual unit, grouping them by their assigned large-scale region. Here, we assess whether differences between regions can be detected using a conventional single-feature analysis—a more traditional approach. However, we find that this method loses important behavioral patterns and distinctions that BEHAV3D-TP captures.

      We hope that this explanation, along with the modifications made to the figures and figure legends, provides greater clarity.  

      Reviewer #3 (Public review):

      Comment #1: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously (please refer to comment #1 to reviewer #1), our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. However, to enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In Supplementary Figures 1, 2, and 3, we present IVM data from different sources, processed using three distinct methods: MTrackJ (Supplementary Fig. 1), Cellpose + TrackMate (Supplementary Fig. 2), and µSAM + TrackMate (Supplementary Fig. 3). The latter two represent state-of-the-art deep learning approaches.

      On the other hand, while we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we initially utilized Imaris for its ability to allow manual correction of faulty tracks, ensuring the reliability of our results. This approach was not only widely used (see above) but was also the best available option when we began our analysis, allowing us to obtain accurate results efficiently.

      In the revised manuscript, we clarify the scope of our study and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings:

      In introduction: “While significant efforts have been made to develop open-source segmentation and tracking tools for live imaging data, including IVM[22–27], fewer tools exist for the unbiased analysis of tumor dynamics. One major barrier is that implementing such analytical methods often requires substantial computational expertise, limiting accessibility for many biomedical researchers conducting IVM experiments. To bridge this gap, we present BEHAV3D Tumor Profiler (BEHAV3D-TP), a robust, user-friendly tool that allows researchers to extract meaningful insights from dynamic cellular behaviors without requiring advanced programming skills.”

      In the Methods, we now describe not only the Imaris processing pipeline but also the µSAM segmentation pipeline, and we refer to CellPose IVM processing; both are combined with TrackMate for tracking. Additionally, to integrate morphological information from µSAM with tracking data from TrackMate, we developed a web tool to merge the outputs from both processing steps: https://morphotrack-merger.streamlit.app/

      Comment #2: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.  

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities (Figure 2 Invading and Retreating cells). This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. 
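
      To illustrate the kind of per-track descriptors discussed here (speed, persistence, and direction relative to the tumor core), the sketch below derives a few of them from a single 3D track. It is not the BEHAV3D-TP feature extraction: the feature names, the constant frame interval `dt`, and the use of one fixed `core` coordinate are assumptions made for the example.

```python
import math


def track_features(points, dt, core):
    """Compute simple kinetic descriptors for one cell track.

    points: ordered list of (x, y, z) positions, one per frame.
    dt:     time between consecutive frames (assumed constant).
    core:   (x, y, z) reference point standing in for the tumor core
            (a hypothetical simplification).
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two positions")

    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)
    net_displacement = math.dist(points[0], points[-1])
    duration = dt * len(steps)

    return {
        "mean_speed": path_length / duration,
        "net_displacement": net_displacement,
        # 1.0 for a perfectly straight path, near 0 for a confined cell.
        "straightness": net_displacement / path_length if path_length else 0.0,
        # Positive: the cell ends farther from the core than it started.
        "radial_change": math.dist(points[-1], core) - math.dist(points[0], core),
    }


feats = track_features([(0, 0, 0), (3, 4, 0), (3, 4, 12)], dt=2.0, core=(0, 0, 0))
```

      Descriptors of this kind, computed for every track, form the feature table on which a dimensionality reduction such as PCA and a subsequent clustering step can operate.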

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states (Dekkers, Alieva et al., Nat Biotech, 2023), immune cell types (Crainiciuc et al. (Nature, 2022)), tumor metastatic potential, and drug resistance states (Freckmann et al. (Nat Comm, 2022)). In the revised manuscript, we have referenced relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research:

      In discussion: “While our current study does not provide direct functional validation of the distinct motility clusters identified, existing literature strongly supports the notion that cell dynamics can serve as a proxy for functional states and phenotypic heterogeneity. Prior work, including studies by our group[19,66]  as well as Crainiciuc et al.[35] and Freckmann et al.[20], has demonstrated that variations in cell motility patterns can reflect underlying functional characteristics. Specifically, cell morpho-dynamic features have been shown to correlate with differences in cell type identity, T-cell engagement, metastatic potential, and drug resistance states. This growing body of evidence suggests that tumor cell behavior, as captured by BEHAV3D-TP, may serve as a predictive tool for deciphering functional tumor heterogeneity. Future studies integrating transcriptomic or proteomic profiling of motility-defined subpopulations could further elucidate the biological significance of these behavioral phenotypes.”

      Comment #3: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with any dynamic, morphologic or spatial features present in the data. In the revised manuscript we showcase this new addition with the analyses of two new datasets: 2D IVM data from healthy epithelial breast cells (Supplementary Fig 2) and 3D IVM data from adult gliomas (Supplementary Fig 3). These analyses identified cells with specific morphodynamic characteristics, which exhibited distinct kinetic behaviors or spatial distributions.

      However, we would like to point out that not all features may provide informative insights and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy, the z-axis resolution is typically lower, which can lead to artifacts like elongation in that direction. Adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively:

      In discussion: “In addition to motility-based classification, features such as tumor cell morphology, proliferation state, and interactions with the tumor microenvironment can further refine tumor phenotyping. BEHAV3D-TP allows for the selection of diverse feature types, supporting datasets that include both dynamic, morphological and spatial parameters. However, we recognize that expanding the feature set may introduce biologically irrelevant noise, particularly in 3D microscopy data where limited z-axis resolution can lead to morphological artifacts. This highlights the potential need in the future to include unbiased feature selection strategies, such as bootstrapping methods67, to ensure the identification of meaningful and biologically relevant parameters. Careful consideration of these aspects is key to maximizing the interpretability and predictive value of analyses performed with BEHAV3D-TP.”

      Comment #4: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We ensure that these points are clearly articulated in the revised manuscript:

      In introduction: “In line with this concept of characterizing cellular dynamic properties for cell classification, we have previously developed an analytical platform termed BEHAV3D 19,21 allowing to perform behavioral phenotyping of engineered T cells targeting cancer. While BEHAV3D was initially developed to analyze T cell migratory behavior under controlled in vitro conditions, we sought to expand its application to investigate tumor cell behaviors in IVM data, where the complexity of the TME presents distinct analytical challenges. This manuscript builds on our foundational work but represents a significant advancement by adapting the pipeline specifically for IVM datasets.”

      Reviewer #3 (Recommendations for the authors): 

      (1) If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We thank the reviewer for this recommendation and as stated above we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). In the revised manuscript, we detail this new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module using an example dataset of glioma tumors.

      Additionally, we have updated the introduction to better clarify the scope of our study (See comment #1 from Reviewer #3) and include references to existing image processing solutions.

      (2) For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells. 

      As noted in the comment above, the revised manuscript now incorporates references to relevant literature that support our understanding that behavioral differences among cells are driven by their underlying functional differences (See comment #2 from Reviewer #3). Additionally, we would like to point to Figure 2d and Supplementary Fig 4 c that provide evidence of the functional distinctions between the identified clusters.

      (3) The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have added the flexibility to incorporate a wide range of features, including morphological ones, and enabled users to select the specific features they wish to include in their analysis. To illustrate this functionality, we have included two example datasets analyzed using this approach (See comment #3 from Reviewer #3). Additionally, as indicated above, we emphasize the importance of careful selection and interpretation of features, as improper choices may lead to biologically irrelevant results. This clarification is intended to ensure that users apply the tool thoughtfully and derive meaningful insights.

    1. eLife Assessment

      This important study investigates the role of Drp1 in early embryo development. The authors have addressed most of the original comments and the work now presents convincing evidence on how this protein influences mitochondrial localization and partitioning during the first embryonic divisions. The research employs the Trim-Away technique to eliminate Drp1 in zygotes, revealing critical insights into mitochondrial clustering, spindle formation, and embryonic development.

    2. Reviewer #1 (Public review):

      Summary:

      Gekko, Nomura et al. show that Drp1 elimination in zygotes using the Trim-Away technique leads to mitochondrial clustering and uneven mitochondrial partitioning during the first embryonic cleavage, resulting in embryonic arrest. They monitor organellar localization and partitioning using specific targeted fluorophores. They also describe the effects of mitochondrial clustering in spindle formation and the detrimental effect of uneven mitochondrial partitioning to daughter cells.

      Strengths:

      The authors have gathered solid evidence for the uneven segregation of mitochondria upon Drp1 depletion through different means: mitochondrial labelling, ATP labelling and mtDNA copy number assessment in each daughter cell. The authors have also characterised the defects in cleavage mitotic spindles upon Drp1 loss.

      Weaknesses:

      This study convincingly describes the phenotype seen upon Drp1 loss. However, it remains descriptive. Further studies should be conducted to elucidate the mechanism by which Drp1 ensures even mitochondrial partitioning during the first embryonic cleavage.

    3. Reviewer #2 (Public review):

      Gekko et al. investigate the impact of perturbing mitochondrial dynamics during early embryo development, through modulation of the mitochondrial fission protein Drp1 using Trim-Away technology. They aimed to validate a role for mitochondrial dynamics in modulating chromosomal segregation, mitochondrial inheritance and embryo development and achieve this through the examination of mitochondrial and endoplasmic reticulum distribution, as well as actin filament involvement, using targeted plasmids, molecular probes and TEM in pronuclear stage embryos through the first cleavage divisions. Drp1 deletion perturbed mitochondrial distribution, leading to asymmetric partitioning of mitochondria to the 2-cell stage embryo, prevented appropriate chromosomal segregation and culminated in embryo arrest. Resultant 2-cell embryos displayed altered ATP, mtDNA and calcium levels. Microinjection of Drp1 mRNA partially rescued embryo development. A role for actin filaments in mitochondrial inheritance is described, however the actin-based motor Myo19 does not appear to contribute.

      Overall, this study builds upon their previous work and provides further support for a role of mitochondrial dynamics in mediating chromosomal segregation and mitochondrial inheritance. In particular, Drp1 is required for redistribution of mitochondria to support symmetric partitioning and support ongoing development.

      Strengths:

      The study is well designed, the methods appropriate and the results clearly presented. The findings are nicely summarised in a schematic.

      The addition of further quantification, including mitochondrial cluster size, elongation/aspect ratio and ROS, as requested by the reviewers, has provided further evidence for the impact of Drp1 depletion on mitochondrial morphology and function.

      Understanding the role of mitochondria in binucleation and mitochondrial inheritance is of clinical relevance for patients undergoing infertility treatment, particularly those undergoing mitochondrial replacement therapy.

      Weaknesses (original manuscript):

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      Weaknesses (revised manuscript):

      The only remaining weakness is that the authors have not undertaken additional experiments to clarify any role for mitochondrial transport following Drp1 depletion.

    4. Reviewer #3 (Public review):

      Why are mitochondria finely maintained in the female germ cell (oocyte), zygotes, and preimplantation embryos? Mitochondrial fusion seems beneficial in somatic cells to compensate for unhealthy mitochondria, for example, mitochondria with mutated mtDNA that potentially defuel the respiratory activity if accumulated above a certain threshold. However, in the germ cells, it may rather increase the risk of transmitting mutated mtDNA to the next generation. Finely maintained mitochondria would also be beneficial for efficient removal when damaged, as the authors briefly discussed. Due in part to the limited availability of suitable models, the physiological role of mitochondrial fission in embryos has been obscure. In this study, the authors demonstrated that mitochondrial fission prevents multiple adverse outcomes, especially the aberrant demixing of the parental genomes (a clinical phenotype of human embryos) at the zygotic stage. Thus, this study is also of clinical importance and could contribute by proposing a novel mechanism.

      After reading through the comments of the other reviewers, the ways in which the authors could improve their manuscript can be largely summarized in the following three points.

      (1) Authors would better clarify whether a loss of Drp1 contributes to the chromosome segregation defects directly (e.g. checking SAC-like activity) or indirectly (aggregated mitochondria became physically obstacle; maybe in part getting the cytoskeleton involved).

      (2) Although the level of Myo19 may not be so high (given the low level of TRAK2 in oocytes: Lee et al. PNAS 2024, PMID 38917013), authors would better further clarify the effect of Myo19-Trim with timelapse (e.g. EB3-GFP/Mt-DsRed) and EM analysis (detailed mitochondrial architecture).

      (3) Authors would better clarify phenotypic heterogeneity/variety regarding the degree of alteration in mitochondrial morphology/ architecture dependent on the levels of Drp1 loss with detailed quantification of EM images to address why aggregation of mitochondria in Drp1-/- parthenote (possibly, more likely Drp1 protein-free) looks different/weaker than Trim-awayed one. Employment of the parthenotes of Trim-awayed MII oocytes might also complement the further discussion.

      The revised preprint has addressed all the points described above. The authors have also adequately indicated the limitations at each of the specific points. The revisions the authors made have consolidated their conclusions, making this study an excellent one.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images in Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may be the cause of insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, while in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
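
      The cluster size and number comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes the mitochondrial image has already been thresholded into a binary mask, uses 4-connectivity, and runs on made-up toy masks. The function name `cluster_sizes` is hypothetical.

```python
from collections import deque


def cluster_sizes(mask):
    """Return the pixel area of every 4-connected cluster in a binary 2D mask."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(area)
    return sizes


# Toy masks (illustrative only, not data from the study).
control_mask = [
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
depleted_mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

control_sizes = cluster_sizes(control_mask)    # many small clusters
depleted_sizes = cluster_sizes(depleted_mask)  # one large cluster
```

      In this toy case the same total mitochondrial area is split into six single-pixel clusters in the control mask and one six-pixel cluster in the depleted mask, which is the kind of shift in cluster number and size the response describes.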

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes has been reorganized into the supplemental figure, due to lack of space in figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, it is clear that the phenotypic severity of Drp1 KO oocytes is dependent on the age of the female. Indeed, oocytes collected from 8-week-old females arrested meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569), and thus Drp1 KO parthenotes were obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H), and appears to be similar to Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion in these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904), as adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes have developmental competence comparable to the wild type. Remarkably, the SMC3 protein levels in juvenile Smc3<sup>Δ/Δ</sup> oocytes were also comparable to Smc3<sup>fl/fl</sup>. The authors surmised that the decline in maternal SMC3 between juvenile age and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos) such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies. Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that changes at the EM level occurred as seen in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.

      Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, as higher correlation was observed (Figure 1E). Although it was not observed that actin and the Myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated in mitochondria may be indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission as an upstream function of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images has been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. As the reviewer noted, we also assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).
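
      Because both axes can increase while the shape still changes, the aspect ratio (long axis divided by short axis) is the relevant measure of elongation. The sketch below illustrates that arithmetic only. The axis lengths are hypothetical values chosen for the example, not measurements from the study, and the helper names are assumptions.

```python
def aspect_ratio(axis_a, axis_b):
    """Long-axis / short-axis ratio; 1.0 means round, larger means elongated."""
    if axis_a <= 0 or axis_b <= 0:
        raise ValueError("axis lengths must be positive")
    return max(axis_a, axis_b) / min(axis_a, axis_b)


def mean(values):
    return sum(values) / len(values)


# Hypothetical (major, minor) axis lengths in micrometres; illustrative only.
control = [(0.50, 0.50), (0.60, 0.50)]
depleted = [(1.50, 1.00), (1.80, 1.00)]

control_ar = mean([aspect_ratio(a, b) for a, b in control])
depleted_ar = mean([aspect_ratio(a, b) for a, b in depleted])

# Both axes are larger in the hypothetical depleted group, yet the ratio
# also rises, i.e. the organelles are enlarged AND elongated.
print(f"control AR = {control_ar:.2f}, depleted AR = {depleted_ar:.2f}")
```

      A real analysis would compare the per-mitochondrion distributions with an appropriate statistical test rather than the two means alone.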

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but has been deleted as it is not very accurate.
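
      The mask-based measurement can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure in the manuscript's methods: the mask is made with a single fixed threshold, the images are tiny nested lists with invented intensities, and `binarize` and `mean_inside_outside` are hypothetical helper names.

```python
def binarize(image, threshold):
    """Binary mask: 1 where the mitochondrial signal is at or above threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def mean_inside_outside(signal, mask):
    """Mean of `signal` over masked pixels and over unmasked pixels."""
    inside, outside = [], []
    for signal_row, mask_row in zip(signal, mask):
        for value, in_mask in zip(signal_row, mask_row):
            (inside if in_mask else outside).append(value)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(inside), mean(outside)


# Invented intensities for a 2 x 2 field; illustrative only.
mito_image = [[10, 200],
              [220, 15]]
ros_image = [[5, 80],
             [100, 7]]

mito_mask = binarize(mito_image, threshold=100)
ros_inside, ros_outside = mean_inside_outside(ros_image, mito_mask)
enrichment = ros_inside / ros_outside  # > 1 means ROS is enriched in the mask
```

      Reporting the inside/outside ratio per embryo, rather than raw intensity, makes the comparison less sensitive to differences in dye loading between embryos.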

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      Probably a faulty filter/shutter control failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).
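
      A line scan of this kind reduces each channel to an intensity profile along the same line and asks whether the peaks coincide. The sketch below shows the idea with nearest-pixel sampling on invented single-row images. The function names and all values are assumptions for illustration, not the authors' implementation.

```python
def line_profile(image, start, end, n_samples):
    """Sample `image` at n_samples evenly spaced points from start to end.

    Points are (row, col) pairs; nearest-pixel sampling, n_samples >= 2.
    """
    (y0, x0), (y1, x1) = start, end
    profile = []
    for i in range(n_samples):
        y = round(y0 + (y1 - y0) * i / (n_samples - 1))
        x = round(x0 + (x1 - x0) * i / (n_samples - 1))
        profile.append(image[y][x])
    return profile


def peak_position(profile):
    """Index of the maximum value along the profile."""
    return max(range(len(profile)), key=profile.__getitem__)


# Invented single-row images, one per channel; illustrative only.
mito_gfp = [[0, 1, 3, 9, 3, 1, 0]]
er_mcherry = [[0, 2, 4, 8, 4, 2, 0]]
golgi_mcherry = [[7, 3, 1, 0, 0, 1, 2]]

start, end = (0, 0), (0, 6)
mito_peak = peak_position(line_profile(mito_gfp, start, end, 7))
er_peak = peak_position(line_profile(er_mcherry, start, end, 7))
golgi_peak = peak_position(line_profile(golgi_mcherry, start, end, 7))

er_overlaps = er_peak == mito_peak        # peaks coincide
golgi_overlaps = golgi_peak == mito_peak  # peaks do not coincide
```

      With real images the profiles would usually be averaged over a line width of several pixels and normalized to each channel's maximum before the peaks are compared.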

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.

      Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, the overexpression of mCh-Drp1 may explain the failure of complete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, the comparison with internal controls is more important, and we have therefore described it as follows. This is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-away insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer this reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria appear not only to be enlarged but also to have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). In subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside the incubator during the microinjection of mRNA.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes have been reorganized into the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, the band intensities in the Western blot analyses were quantified, and the quantification validated the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is described in the figure legends (pooled samples of 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).
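
      For illustration only, a masked quantification of this kind could be sketched as below. This is not the authors' actual pipeline: the function name `ros_inside_outside`, the global threshold at the mean mitochondrial intensity, and the toy arrays are all assumptions, since the response does not state how the mask was derived.

```python
import numpy as np

def ros_inside_outside(mito_img, ros_img, threshold=None):
    """Return mean ROS intensity inside and outside a binary mitochondrial mask.

    Hypothetical helper: assumes two registered single-channel images
    of identical shape (mitochondrial marker and ROS indicator).
    """
    mito = np.asarray(mito_img, dtype=float)
    ros = np.asarray(ros_img, dtype=float)
    if mito.shape != ros.shape:
        raise ValueError("mitochondrial and ROS images must have the same shape")

    # Assumption: a simple global threshold at the mean mitochondrial intensity.
    if threshold is None:
        threshold = mito.mean()

    # Binary mask: True where the pixel is classified as mitochondrial.
    mask = mito > threshold
    if not mask.any() or mask.all():
        raise ValueError("mask must contain both mitochondrial and background pixels")

    inside = float(ros[mask].mean())    # ROS signal within the mask
    outside = float(ros[~mask].mean())  # ROS signal in the rest of the image
    return inside, outside


# Toy example: the right half of the image is "mitochondrial".
mito = np.array([[0, 0, 10, 10],
                 [0, 0, 10, 10]])
ros = np.array([[1, 1, 4, 4],
                [1, 1, 4, 4]])
inside, outside = ros_inside_outside(mito, ros)
```

      Comparing the inside and outside means per embryo, rather than whole-image intensity, is what allows ROS accumulation to be attributed to the mitochondrial clusters specifically.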

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol., 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre is dynein-dependent (Scheffler et al., Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 36325364), but since overexpression might disrupt the regulatory balance maintained by other motor/adaptor complexes, further investigation using TRAK2-deficient embryos is needed.

      As noted by the reviewer, Myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not obtained EM images of Myo19-depleted embryos, but we examined their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to control embryos (data not shown). The loss of Myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but eventually more detailed analysis of proteins and their phosphorylation modifications, etc. is needed for accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully checked the manuscript again and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the surrounding text, the following sentence has been added: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank reviewer 3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      There appear to be few shortcomings. The following specific comments are offered to stimulate further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, the quantification of aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope rather than a confocal laser microscope to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere). We therefore cannot currently provide the images that the reviewer requested. As shown in Figure-figure supplement 1C, the ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not exhibit the behavior of preferentially concentrating near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca<sup>2+</sup> store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as whether daughter blastomeres that inherit more mitochondria have higher Ca<sup>2+</sup> stores, or whether blastomeres with more aggregated mitochondria have lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is still expected to occur, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data are not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data are not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes in pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Normal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. eLife Assessment

      This valuable study provides a novel framework for leveraging longitudinal field observations to examine the effects of aging on stone tool use behaviour in wild chimpanzees. The methods and results are robust, providing solid evidence of the effects of old age on nut cracking behaviour at this field site. Despite the low sample size of five individuals, this study is of broad interest to ethologists, primatologists, archaeologists, and psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      Howard-Spink et al. investigated how older chimpanzees changed their behavior regarding stone tool use for nutcracking over a period of 17 years, from late adulthood to old age. This behavior is cognitively demanding, and it is a good target for understanding aging in wild primates. They used several factors to follow the aging process of five individuals, from attendance at the nut-cracking outdoor laboratory site to time taken to select tools and efficiency in nut-cracking, to check whether older chimpanzees changed their behavior.

      Indeed, older chimpanzees reduced their visits to the outdoor lab, which was not observed in the younger adults. The authors discuss several reasons for that; the main ones being physiological changes, cognitive and physical constraints, and changes in social associations. Much of the discussion is hypothetical, but a good starting point, as there is not much information about senescence in wild chimpanzees.

      The efficiency for nut-cracking was variable, with some individuals taking a long time to crack nuts while others showed little variance. As this is not compared with the younger individuals and the sample is small (only five individuals), it is difficult to be sure if this is also partly a normal variance caused by other factors (ecology) or is only related to senescence.

      Strengths:

      (1) 17 years of longitudinal data in the same setting, following the same individuals.

      (2) Using stone tool use, a cognitively demanding behavior, to understand the aging process.

      Weaknesses:

      A lack of comparison of the stone tool use behavior with younger individuals in the same period, to check if the changes observed are only related to age or if it is an overall variance. The comparison with younger chimpanzees was only done for one of the variables (attendance).

      Comments on Revised Version (from BRE):

      The authors have now added to the manuscript that they did not have sufficient data to compare additional variables to younger chimpanzees, and therefore compared intra-individual variation across field seasons. They have also explained that nut hardness, although not measured, was largely controlled for due to the experimental nature of the 'outdoor laboratory' whereby only nuts of a suitable maturity (and hardness) are provided to the chimpanzees. The discussion now also includes mention of other ecological variables and their potential influence on the results.

    3. Reviewer #2 (Public review):

      Summary:

      Primates are a particularly important and oft-applied model for understanding the evolution of, e.g., life history and senescence in humans. Although there is a growing body of work on aging in primates, there are three components of primate senescence research that have been underutilized or understudied: (1) longitudinal datasets, (2) wild populations, and (3) (stone) tool-use behaviors. Therefore, the goal of this study was to (1) use a 17-year longitudinal dataset (2) of wild chimpanzees in the Bossou forest, (3) visiting a site for field experiments on nut-cracking. They sampled and analyzed data from five field seasons for five chimpanzees of old age. From this sample, Howard-Spink and colleagues noted a decline in tool-use and tool-use efficiency in some individuals, but not in others. The authors then conclude that there is a measurable effect of senescence on chimpanzee behavior, but that it varies individually. The study has major intellectual value as a building block for future research, but there are several major caveats.

      Strengths:

      With this study, Howard-Spink and colleagues make a foray into a neglected topic of research: the impact of the physiological and cognitive changes due to senescence on stone tool use in chimpanzees. Based on novelty alone, this is a valuable study. The authors cleverly make use of a longitudinal record covering 17 years of field data, which provides a window into long-term changes in the behavior of wild chimpanzees, which I agree cannot be understood through cross-sectional comparisons.

      The metrics of 'efficiency' (see caveats below) are suitable for measuring changes in technological behavior over time, as specifically tailored to the nut-cracking (e.g., time, number of actions, number of strikes, tool changes). The ethogram and the coding protocol are also suitable for studying the target questions and objectives. I would recommend, however, the inclusion of further variables that will assist in improving the amount of valid data that can be extrapolated (see also below).

      With this pilot, Howard-Spink and colleagues have established a foundation upon which future research can be designed, including further investigation with the Bossou dataset and other existing video archives, but especially future targeted data collection, which can be designed to overcome some of the limits and confounds that can be identified in the current study.

      Weaknesses:

      Although I agree with the reasoning behind conducting this research and understand that, as the authors state, there are logistical considerations that have to be made when planning and executing such a study, there are a number of methodological and theoretical shortcomings that either need to be more explicitly stated by the authors or would require additional data collection and analysis.

      One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years?

      With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.

      For the study of the physiological impact of senescence on tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated.

      Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this).

      Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival).

      Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors.

      As far as attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems rather to exacerbate signs of aging rather than be an indicator of it itself. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript.

      Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions.

      Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022).

      Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022).

      Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016).

      Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022).

      Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015).

      Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023).

      Comments on Revised Version (from BRE):

      The authors have revised their methods to clarify why certain field seasons were chosen and have clarified aspects of their analysis relevant to this reviewer's concerns. The coula nut cracking data and results, which came from a single season, have now been moved to the Supplementary. The revised discussion now includes a much more detailed limitations section covering both ecological factors and the effects of social aging. Stone tool size, grip and other factors are also acknowledged as being potentially important for measuring efficiency, but the authors were unable to include them in this study due to the nature of the dataset.

    4. Author response:

      The following is the authors’ response to the original reviews

      The main criticisms levied by both reviewers can be traced to our use of a long-term video archive to assess the effects of aging on individual chimpanzees over extended time periods. Specifically, the reviewers raised several points about whether we could exclude ecological variation across years, rather than aging itself, as the explanation for the changes we observed. Whilst we acknowledge there are limitations to our approach, we provide a comprehensive response to these points highlighting:

      (1) Where ecological variables have been accounted for using controls (including the behaviors of other individuals, or an aging individual’s behavior at younger ages).

      (2) Where ecological data may be missing, thus a potential limitation to our study, and further data would be beneficial.

      (3) Whether, in light of these limitations, interannual ecological variation offers a likely explanation for the behavioral changes we have identified. We provide an argument that whilst ecological data would be desirable for our study, interannual changes in ecology are unlikely to explain the trends in our data. Additionally, we explain why age-related changes, such as senescence, are more likely to underpin the patterns described in our manuscript.

      Across 1-3, we have made substantial changes to the reporting of our manuscript to ensure that our results are communicated transparently, and conclusions are made with appropriate care. We have also moved all discussion of coula-nut cracking to the supplementary materials, given the points raised by reviewers about the lack of data describing coula-nut cracking in earlier field seasons.

      We hope that these modifications will enhance both the editors’ and reviewers’ assessment of our manuscript, where we have aimed to make careful conclusions that are supported by our available data. Similarly, we have aimed to communicate the importance of our results across fields of research including primatology, evolutionary anthropology, and comparative gerontology, and hope that our research will be of use to further studies within these subfields.

      Reviewer 1 (Recommendations for the authors):

      (1) If possible, include results or a summary of the behaviour of younger adults using stone tools during the same period. It would be helpful to know if they had the same or different pattern to exclude other factors that may influence the tool use (harder nuts in a particular season, diseases, motivation for other foods, etc). 

      We include data for other individuals when analyzing attendance. However, we did not collect comparable long-term efficiency data on younger adult individuals for this study. This is, in part, due to the time constraints imposed by long-term behavior coding. Additionally, only one adult was both present at Bossou throughout the 1999-2016 period and younger than the threshold for our old-age category across these years; a baseline of a single younger adult would not have been useful for characterizing the normal variation of younger adults over time. However, given the longitudinal data we present, we can use data from the earlier field seasons for each elderly focal individual as a personalized baseline control. Previous studies at Bossou find that across the majority of adulthood, efficiency varies between individuals, but is stable within individuals over time (e.g., Berdugo et al. 2024, cited). We detected similar stability in individuals’ efficiency over the first three field seasons sampled in our analysis, where there was very little intra-individual variation in tool-using efficiency. However, in later years, two individuals (Velu & Yo) began to exhibit relatively large reductions in efficiency.

      These results are unlikely to be explained by ecological variation. If a change in ecology underpinned our results, we would expect it [1] to also introduce variation in earlier field seasons, and [2] to influence all individuals in our study similarly. As such, if the changes observed in later field seasons were due to ecological changes, they should have caused reduced efficiency across individuals, and to a similar degree. We did not observe this result: large reductions in efficiency were confined to two individuals. Moreover, for Yo (the individual who exhibited the largest reduction in efficiency) we found some additional evidence that changes in oil-palm-nut cracking efficiency extended beyond the period we sampled, i.e. they were evident even in 2018, reflecting a long-term, directional reduction in efficiency as compared to earlier years of her life. This consistent reduction in tool-using efficiency over multiple years adds further weight to the hypothesis that changes at the level of the individual were causing reduced tool-using efficiency, rather than our results being underpinned by interseasonal variation in ecology.

      Whilst we agree that our study is limited in the extent to which we can analytically assess ecological explanations for changes in nut-cracking efficiency, we believe that hypothetical ecological changes across field seasons do not predict our results. We now raise both sides of this debate in our discussion, where we outline our limitations (see lines 535-593).

      (2) The data from 2011 was scarce, with only one individual having 10 encounters. It would be better to be cautious with this season's results. 

      We appreciate this limitation raised by the reviewer. Velu and Yo were only encountered a few times in 2011; however, both were encountered more frequently in 2016. For 2011, we did not collect oil-palm nut cracking data for either Yo or Velu. Thus, their change in efficiency was detected by models using data from all other years, regardless of the few encounters in 2011. This sparsity of data may still have influenced our metrics for the proportion of time chimpanzees spent engaging in different behaviors when present at the outdoor laboratory in 2011, particularly for Velu, who was one of the two individuals who exhibited a change in behavior in this year (along with Fana, N = 10 for 2011). We have therefore added a line in our results and discussion highlighting the sparsity of data for Velu when estimating these proportions for 2011 (see lines 255-256 & 410).

      Minor corrections 

      (1) The last paragraph of the introduction presents many results, which should be in the results section. 

      We would like to keep this section of the introduction. Our paper investigates the effect of aging on many different aspects of nut cracking, which could become confusing for readers unless laid out clearly. We believe that having a short summary early on in the paper assists readers with following the methods and arguments presented within our paper.

      (2) The first section (Sampled data) of the results contains much information that belongs in the methods section. 

      We appreciate that there is some overlap between our methods and results section. However, as the results section comes before the methods in our manuscript, we wanted to ensure that there is suitable information in our results that allows our results to be interpreted clearly by readers, and that the methods used to generate these results are transparently communicated. For these reasons, we will leave this information in the results, as we believe it increases our paper’s readability.

      Reviewer 2 (Public review):

      One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years? 

      We note that our sample size is limited to 5 individuals. This is an inevitable constraint of analyzing aging longitudinally in long-lived species, as only a few individuals will live to old age. We argue that 17 years is a long enough period of study, as in the initially sampled field season (1999) focal individuals are reaching a mature age of adulthood (39-44 years) and begin to age progressively up to ages that are typically considered to be on the extreme side for chimpanzees’ lifespans in the wild (56-61 years). We raise in our methods that whilst it is difficult to determine precisely when chimpanzees become ‘old aged’, previous studies use the age of around 40 years, as from this age survivorship begins to decrease more rapidly (see Wood et al., Science 2023). Indeed, one focal individual (Tua) disappeared during the period of our study (presumed dead), and one other individual died in 2017 (Velu), the year after our final sampled field season. As of 2025, two other focal females have since died, and only one focal individual is still alive at Bossou (Jire, the individual exhibiting the least evidence for senescence over our study period). These observations suggest that we successfully captured data from chimpanzees during the oldest ages of their lives for most individuals in the community. Moreover, the period of 1999-2016 contains the majority of data available within the Bossou Archive, with years before and after this window containing comparably less data. This information is included within our results and methods (see sections 2.1 and 4.1).

      For our earliest field season (1999), it is unlikely that senescence had already had an effect on stone-tool use, as we measured efficiency to be high across all efficiency metrics for all individuals. For example, in 1999, the median number of hammer strikes performed by focal chimpanzees ranged from 2-4 strikes, and this was comparable to the efficiency reported across all adults observed in previous studies at Bossou (Biro et al. 2003, Anim. Cogn.). This finding suggests that senescence effects had not yet taken place, allowing us to evaluate whether aging affects efficiency over subsequent field seasons. This point is now included in the manuscript on lines 449-452.

      We sampled at 4-to-5-year intervals to balance the time-intensive nature of fine-scale behavior coding against the need to sample data across the extended 17-year time window available in our study. We limited the final year to 2016 as, in following years, data were collected using different sampling protocols (though, see limited data from 2018 in the supplementary materials). We aimed to keep the intervals between years as consistent as possible (approx. 4 years); however, for some years data were not collected at Bossou, due to disease outbreaks in the region. In these instances, we selected the closest field season where suitable data were available for study (always +/- 1 year). We have provided further clarification surrounding our sampling regime in the methods (see amendments in section 4.1).

      With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.

      We agree that knowing precisely how chimpanzees perceive their own efficiency during tool use is unlikely to be available through observation alone. However, under optimal foraging theory, it is reasonable to assume that animals aim to economize foraging behaviors such that they maximize their rate of energy intake. Moreover, a wealth of studies demonstrate that adult chimpanzees acquire and refine tool-using skill efficiency throughout their lives. For example, during nut cracking, adults often select tools with specific properties that aid efficient nut cracking (Braun et al. 2025, J. Hum. Evol.; Carvalho et al. 2008, J. Hum. Evol.; Sirianni et al. 2015, Anim. Behav.); perform nut cracking using more streamlined combinations of actions than less experienced individuals (Howard-Spink et al. 2024, PeerJ; Inoue-Nakamura & Matsuzawa 1997, J. Comp. Psychol.), and as a result end up cracking nuts using fewer hammer strikes, indicating a higher level of skill (Biro et al. 2003, Anim. Cogn.; Boesch et al. 2019, Sci. Rep.). Ultimately, these factors suggest that across adulthood, experienced chimpanzees perform nut cracking with a level of efficiency which exceeds that of novice individuals, including across the whole behavioral sequence for tool use, even if they are not aware of doing so or do not intend to. Previous studies at Bossou have also highlighted that there are stable inter-individual differences in efficiency of individuals over time (Berdugo et al. 2024, Nat. Hum. Behav.). This pattern of findings allows us to ask whether this acquired level of skill is stable across the oldest years of an individual’s life, or whether some individuals experience decreased efficiency with age. In addition, our selection of efficiency metrics is in keeping with a wealth of studies which examine the efficiency of stone-tool use in apes; thus, we argue that this is not problematic for our study.

      As we stated in our initial responses to reviewers, it is unlikely that tool switching is a valid strategy for tool use, as it is so rarely performed by proficient adult nut crackers (including earlier in life for our focal individuals). Nevertheless, we did not find a significant change in tool switching for oil-palm nut cracking, and this behavioral change was only observed when Yo was cracking coula nuts. As we have now moved discussion of coula nut cracking to the supplementary materials (and tempered discussion of coula nut cracking to emphasize the need for more data) this behavioral variable does not influence our reported results. 

      In our discussion, we also highlight how seemingly less efficient actions may reflect a valid strategy for nut cracking. E.g. a greater number of tool strikes may reflect a strategy of compensation for progressive tool wear. This would still reflect a reduced efficiency (e.g. in terms of the rate at which kernels can be consumed), but may be borne of the necessity to accommodate changes in an individual’s physical affordances with aging. Thus, we do take the Reviewer’s point into account, but by using an alternative, more likely, example given the available data. We have now emphasized this point in lines 521-527.

      We have also clarified these matters by adding more information into our methods (see lines 798-802 and 828-829), highlighting that we take a perspective on efficiency that reflects the speed of nut processing and kernel consumption, and the number of different behavioral elements required to do so. Our phrasing now explicitly avoids using language that assumes that individuals have some perception of their own efficiency during tool use.

      For the study of the physiological impact of senescence of tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated. 

      We did not collect these data as part of our study. Whilst grip type could be a useful variable to measure for future studies, it is not necessary to demonstrate senescence per se. However, we agree that this could be a fruitful avenue to understand changes in behavior at greater granularity, and have added this as a recommendation for further study. We also now provide a discussion on stone dimensions and materials as part of our limitations (see lines 581-589 for both points).

      Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this).

      We have moved all discussion of coula nuts to the supplementary materials so as to avoid any confusion with oil-palm nuts (see comments from Reviewer 2, and our response). Nut hardness may influence the difficulty with which nuts are cracked, with one of the most likely factors influencing nut hardness being its age: young nuts are relatively harder to crack, whereas older nuts, which are often worm-eaten or can be empty, crack more easily, yet are not worth cracking (Sakura & Matsuzawa, 1991; Ethology). We largely controlled for this in our study, as the nuts provided at outdoor laboratories were inspected to ensure that the majority of them were of suitable maturity for cracking, and we now clarify this control in our methods (see lines 678-680) and when discussing our study limitations (see lines 551-558). In these sections, we also highlight a previous study at Bossou that shows chimpanzees select nuts which can be readily cracked, based on their age (Sakura & Matsuzawa, 1991; Ethology).

      We acknowledge that we are limited in the extent to which we can control for interannual variation in ecology with our available data. However, we highlight why interannual variability is unlikely to fully explain our results (see lines 551-580 and response to comments from Reviewer 1). We also highlight in our limitations section that future studies should (where possible) aim to collect more ecological data to account for possible confounds more rigorously.

      Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival). 

      In our discussion, we highlight that multiple aging factors may influence apes’ dietary preferences and motivations to attend experimental (and perhaps also naturally-occurring) nut cracking sites (see lines 397-443 and 542-550). We do not believe that neophobia is a likely driver underlying our results, given that the outdoor laboratory has been used to collect data for many decades, including over a decade prior to the first field season in which data were sampled for our study (now highlighted in lines 692-694). In addition, previous studies at Bossou have determined that the outdoor laboratory is visited with comparable frequency to naturally-occurring nut cracking sites, which makes any form of novelty bias unlikely (this information is now included in our methods, see lines 397-400, and also 687-689).

      We agree that further information is required about foraging behaviours across the home range to understand changes in attendance at the outdoor laboratory, and have now provided more clarity on this within the limitations section of our discussion (see lines 542-550). In our discussion of individual survivability, we state clearly that we cannot make a conclusion about how changes in tool use influence survival with the available data, and assert that this would require data across the home range (see lines 627-638). We agree that future research is needed to assess whether changes in tool use would influence survivability, and also suggest that it may not be survival-relevant; instead changes in tool use with aging may simply be a litmus test for detecting more generalized senescence.

      Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors. 

      As the reviewer points out, when examining the attendance rates of older individuals over sampled field seasons, we used the attendance rates of younger individuals as a control. However, we do not run this analysis using start and end points only. Attendance rates were included in our model across the full range of sampled field seasons. However, as the key result here is an interaction term between age cohort (old) and the field season (scaled about the mean), we supplement this significant statistical result with a digestible comparison of attendance rates between the first and last field season, to give a general sense of effect size. We have clarified that all data were used in our model (see line 229, and also the legend for Table 2), and in this section we also provide all key model outputs and signpost where the full model output can be found in the supplementary materials.

      As far as attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems rather to exacerbate signs of aging rather than be an indicator of it itself. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript. 

      We have adapted this part of the discussion to make it clear that social aging is not necessarily equivalent to physiological and cognitive aging. We have also clarified in this section the changes in demography at Bossou during our study, which may have further impacted social behaviors (see lines 423-443). 

      Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions. 

      We thank the reviewer for their comments. We have adapted our manuscript to highlight that we agree that it serves as a starting point for answering these valuable questions; however, we do feel that we can contribute meaningful evidence that aging effects likely underlie the findings in our data (see responses above). We agree with the reviewer that further study is needed to understand these questions in more detail, and have tried to ensure that our conclusions are suitably tempered and that further research building on our findings is strongly encouraged.

      Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022). 

      This has now been cited.

      Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022). 

      We do not cite this – see above.

      Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016). 

      This has now been cited.

      Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022). 

      This has now been cited.

      Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015). 

      We do not cite this, as we instead cite studies which highlight chimpanzees’ ability to become more efficient in tool use with repeated practice (see above). 

      Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023). 

      We do not cite this – see above.

      Reviewer #2 (Recommendations for the authors):

      Minor Comments: 

      (1) Line 494: Citation #53 is listed twice. 

      This has been amended.

      (2) Line 501: The term 'culturally-dependent' as used here is, at best, controversial, and at worst, misapplied. I would recommend replacing it with simply the term 'cultural'. 

      This has been changed to ‘cultural’.

      Major Comments: 

      For the Introduction, in the paragraph starting on Line 91, and the Discussion, starting on Line 369, I would recommend some simple re-structuring of the argumentation. As mentioned in the Public Review, the changes in social standing according to age are not necessarily a case of senescence in the very sense of physiological or cognitive changes of the individual. This seems to have had an effect on attendance rates, which then could have been a driver of behavioral changes and even cognitive decline as ostensibly measured by the other variables. The social impact of aging should be mentioned in the Introduction (it is not currently) and the social and physiological/cognitive effects of aging should be separated in the Discussion. You can then discuss more clearly how the former via other behavioral changes can accelerate the latter (or not).

      We take the point raised about social aging. Integrating information about social aging into the introduction was challenging without disrupting the flow of the paper; however, we have included these valuable points in the discussion (see lines 423-443). We now structure this section to clearly distinguish social aging, and discuss how, in tandem with changes in demography at Bossou, it may have influenced rates of attendance to the outdoor laboratory over the years. We do not go into detail about how social aging may interact with physiological or cognitive effects of aging, as we cannot support this with the available data; however, we highlight at the end of this paragraph how all of these possible factors require further investigation.

      For the present study, it will either be impossible or impractical to gather data on the yearly ecological conditions, contextualized dietary preferences, individual personalities, etc., so I would not ask that you do so. It is important, however, to temper some of the claims being made in the manuscript about what you have 'determined' about the nature of senescence in chimpanzees and to be more transparent about the limitations and potential confounds when interpreting the data. To avoid repetition, the key points can be found in the Public Review under 'Weaknesses'. 

      We appreciate the reviewer’s understanding of the limitations of our study. Some of these factors – such as individual personalities and dietary preferences – are addressed somewhat by our use of long-term data at the level of the individual, particularly in the analyses of efficiency, where modeling individuals’ behaviors relative to those in earlier years offers an individually bespoke control. However, there are other ecological variables of possible importance that we cannot evaluate. We now address several of these points raised by reviewers in the discussion, to ensure transparency of reporting (see limitations section of our discussion, our responses to the comments provided by Reviewer 1, and our responses to points raised in the Public Review). We have also tempered some of the phrasing surrounding our conclusions: where we say that this is the first evidence that aging can impact chimpanzee tool use, we also highlight the need for an assortment of further studies.

      Finally, the integration of the coula nut-cracking data is not well-executed as it stands. I would recommend that they collect and analyze equivalent behavioral data from the other years where coula nuts were provided. By examining only one season of coula nut-cracking, we cannot contextualize the data to past seasons; there is no sense in comparing one season of coula nut-cracking (i.e., in a sense of efficiency) to roughly contemporary seasons of palm-nut cracking due to, as you describe, differences in physical properties of the nuts. If you are not able to collect the additional data and carry out the requisite analysis, then I would recommend that the coula nut-related sections be removed from the manuscript, so that it does not detract from the logical flow of arguments and distract from the other data, which is more logically-attuned to your research questions. 

      We have removed this from the main manuscript. We have decided to include the information surrounding coula nut cracking in the supplementary materials, as this information is still relevant to the findings of our study, and may interest some readers. However, we have phrased this information to make it clear that further data is needed to compare coula nut cracking across years.

      These criticisms do not subtract from the (potential) value or importance of the work for the field. This is, of course, an important contribution to an understudied topic. As such, I would gladly advocate for the manuscript, assuming the authors would reflect on the listed caveats and make changes in response to the 'Major Comments'. 

      We thank the reviewer for their comments.

    1. eLife Assessment

      This important study explores the mechanisms underlying the maintenance of cell surface protein levels. The authors present solid evidence to support their claims, though the addition of certain validation experiments could have further strengthened the conclusions. This work will be of particular interest to cell biologists focused on membrane trafficking.

    2. Reviewer #1 (Public review):

      G. Squiers et al. analyzed a previously reported CRISPR genetic screening dataset of engineered GLUT4 cell-surface presentation and identified the Commander complex subunit COMMD3 as being required for endosomal recycling of specific cargo protein, transferrin receptor (TfR), to the cell surface. Through comparison of COMMD3-KO and other Commander subunit-KO cells, they demonstrated that the role of COMMD3 in mediating TfR recycling is independent of the Commander complex. Structural analysis and co-immunoprecipitation followed by mass spectrometry revealed that TfR recycling by COMMD3 relies on ARF1. COMMD3 interacts with ARF1 through its N-terminal domain (NTD) to stabilize ARF1. A mutation in the NTD of COMMD3 failed to rescue cell surface TfR in COMMD3-KO cells. In conclusion, the authors assert that COMMD3 stabilizes ARF1 in a Commander complex-independent manner, which is essential for recycling specific cargo proteins from endosomes to the plasma membrane.

      The conclusions of this paper are generally supported by data, but some validation experiments should be included to strengthen the study.

      (1) Specific role of ARF1 to COMMD3: The authors don't think KO/KD of ARF1 is appropriate to address its specificity to COMMD3 cargo selection, so they focused on the COMMD3 NTD mutant. Though the mutant failed to rescue COMMD3 cargo TfR recycling, they did not examine the Commander cargo ITGA6. In addition, they cannot validate that the mutant interrupts the interaction between NTD and ARF1. These missing results and validation make their claim that ARF1 is specific to the COMMD3's Commander-independent function less convincing.

    3. Reviewer #2 (Public review):

      Summary:

      The Commander complex is a key player in endosomal recycling which recruits cargo proteins and facilitates the formation of tubulo-vesicular carriers. Squiers et al found COMMD3, a subunit of the Commander complex, could interact directly with ARF1 and regulate endosomal recycling.

      Strengths:

      Overall, this is a nice study that provides some interesting knowledge on the function of the Commander complex.

      Comments on revisions:

      The authors have addressed all my previous concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Squiers and colleagues reveals a novel, Commander-independent role for COMMD3 in endosomal recycling. Through unbiased genetic screens, the authors identified COMMD3 as a regulator of GLUT4-SPR trafficking and validated its function using knockout experiments, which demonstrated its impact on endosomal morphology and trafficking independent of the Commander complex. Importantly, they mapped the interaction between the N-terminal domain (NTD) of COMMD3 and the GTPase Arf1, and through structure-guided mutagenesis, established that this interaction is essential for COMMD3's Commander-independent activity. The manuscript provides compelling evidence supporting this newly identified function of COMMD3, and I find the authors' interpretations well-justified. This is an excellent and intriguing study.

      Comments on revisions:

      The authors addressed all comments. Congratulations on this exciting work.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Commander-Independent Role of COMMD3: While the authors provided evidence to support the Commander-independent role of COMMD3 (such as the absence of other Commander subunits in the CRISPR screen and the lack of a decrease in COMMD3 levels in other subunit-KO cells), direct evidence is lacking. The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question.

      The Reviewer raised an excellent point. We fully agree with the Reviewer that multiple lines of evidence are needed to support the novel Commander-independent function of COMMD3.

      Comparative genetic analyses in Figures 4 and 5 indicate that COMMD3 regulates endosomal retrieval independently of the Commander complex. In Figure 8 of the revised manuscript, we show that point mutations introduced into the COMMD3:ARF1 interface impair this Commander-independent function. Moreover, Figure 6 demonstrates that ARF1 upregulation fully rescues the KO phenotype of COMMD3. In addition, Figure S2 further supports that COMMD3 levels, but not those of other Commander subunits, correspond to its Commander-independent function in endosomal trafficking. We have also revised the Discussion section to elaborate on the implications of these findings. We appreciate the Reviewer’s advice.

      (2) Role of ARF1 in Cargo Selection: The Commander-independent function of COMMD3 appears cargo-dependent and relies on ARF1's role in cargo selection. The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR.

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations in the NTD that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). As these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this recycling pathway. We note that the discovery of a novel trafficking pathway inevitably opens many research directions. One such direction is to systematically identify cargoes that rely on COMMD3 but not the Commander complex for endosomal retrieval.

      (3) Impact on TfR Stability: Figure 7D suggests that TfR protein levels are reduced in COMMD3-KO cells, potentially due to degradation caused by disrupted recycling. This raises the question of whether the observed reduction in cell surface TfR is due to impaired endosomal recycling or decreased total protein levels. The authors should quantify the ratio of cell surface protein to total protein for TfR, GLUT-SPR, and ITGA6 in COMMD3-KO cells.

      Based on the Reviewer's suggestion, we quantified both the total levels and the surface-to-total ratio of TfR, as shown in Figure S1 of the revised manuscript. These new data further support the conclusion that defects in TfR retrieval lead to its lysosomal degradation. The GLUT-SPR data presented in the main figures represent the surface-to-total ratio of the GLUT-SPR reporter. We thank the Reviewer for the important suggestion.

      Reviewer #1 (Recommendations for the authors):

      (1) Commander-Independent Role of COMMD3: The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question. The authors should evaluate whether the full-length mutant of COMMD3 can rescue decreased levels of CCDC93 and VPS35L, as well as cell surface ITGA6, TfR, and GLUT4 in COMMD3-KO cells.

      This is an excellent point. In our mechanistic experiments, we focused on the NTD of COMMD3 because this domain mediates its Commander-independent function and is not involved in forming the Commander holo-complex. This approach allowed us to draw unambiguous conclusions. Nevertheless, we anticipate that full-length COMMD3 carrying these point mutations would also be defective in regulating Commander-independent cargo.

      (2) Role of ARF1 in Cargo Selection: The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR. Was ARF1 identified in the initial CRISPR screen? If so, this should be explicitly noted. Alternatively, does ARF1 overexpression rescue ITGA6 levels in COMMD3-KO cells? Furthermore, does ARF1 overexpression rescue TfR levels in COMMD3 and CCDC93 double-KO cells?

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). Since these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this novel recycling pathway. Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, a key research direction we are currently pursuing is systematically determining how surface protein levels are affected by COMMD3 KO and ARF1 overexpression using surface proteomics.

      (3) Inconsistency in COMMD3 Rescue Levels (Figure 5A): Figure 5A shows comparable or higher levels of COMMD3 in rescued cells than in CCDC93-KO and VPS35L-KO cells. However, COMMD3 rescue did not increase cell surface TfR as much as in CCDC93-KO and VPS35L-KO cells. This inconsistency should be discussed or validated.

      To address the Reviewer’s inquiry, we quantified COMMD3 expression levels in these cell lines using multiple independent experiments. The new data are presented in Figure S2 of the revised manuscript. These expanded datasets allowed us to more accurately determine the relationship between COMMD3 expression and our genetic data. Since the Commander complex remains intact in the COMMD3 rescue cells, a significant portion of COMMD3 proteins are expected to be incorporated into the Commander complex, which does not regulate TfR recycling. In contrast, because the Commander complex is disrupted in Ccdc93 and Vps35l KO cells, all COMMD3 proteins are available to regulate TfR recycling in a Commander-independent manner. These findings are fully consistent with the similar surface TfR levels observed in Ccdc93/Vps35l KO cells and COMMD3 overexpressing cells. We thank the Reviewer for this excellent suggestion.

      (4) Significance of NTD in COMMD3 Function: The conclusion that "the NTD of COMMD3 mediates its Commander-independent function and interacts with ARF1" (Page 12) is not fully supported without a side-by-side comparison of NTD, CTD, and FL COMMD3 in the same experiment (e.g., Figures 6B and 6G). Additional data is needed to strengthen this claim.

      We conducted the experiment suggested by the Reviewer and included the data in Figure S3. Our results indicate that the COMMD3 CTD cannot mediate the Commander-independent function of COMMD3 in endosomal retrieval. We appreciate the Reviewer’s suggestion.

      (5) ARF1 Stabilization Experiments: To substantiate the claim that COMMD3 binds and stabilizes the GTP-form of ARF1, the authors should include a comparative experiment showing GTP-form, GDP-form, and wild-type ARF1 (e.g., Figures 6G and 7C).

      We fully agree with the Reviewer that it would be important to compare how the ARF1:COMMD3 interaction is influenced by the nucleotide-binding state. However, trapping ARF1 in its GDP-bound state remains unfeasible, and nucleotide-free small GTPases are inherently unstable. In addition, WT ARF1 likely exists as a mixture of GTP- and GDP-bound forms, further complicating the analysis. To address the Reviewer’s comment, we used AlphaFold3 predictions. Interestingly, we found that the ipTM score of GTP-ARF1:COMMD3 is significantly higher than that of GDP-ARF1:COMMD3 or apo-ARF1:COMMD3, supporting our conclusion that COMMD3 recognizes and stabilizes the active form of ARF1.

      (6) Validation of NTD Mutation (Figure 8): Co-immunoprecipitation or cellular co-localization experiments should be performed to confirm that the NTD mutation disrupts the interaction between COMMD3 and ARF1, as depicted in Figure 8.

      This is an important question, and the best approach to address it would be to measure the binding affinity of the WT and mutant proteins using ITC or SPR. However, this is currently unfeasible, as we have not yet obtained pure recombinant COMMD3 and GTP-ARF1 proteins. Co-IP, by nature, is a crude assay that often fails to detect changes in binding affinity. A previous study on other proteins showed that mutations in protein-binding interfaces strongly reduced binding affinity as measured by SPR, but these changes would have been missed by co-IP assays (PMID: 25500532). In agreement with this limitation, our co-IP experiments did not yield conclusive results. Instead, we focused on structure-guided genetic experiments, which unequivocally demonstrated the effects of targeted mutations on the Commander-independent function of COMMD3. 

      Reviewer #2 (Public review):

      (1) All existing data suggest that COMMD3 is a subunit of the Commander complex. Is there any evidence that COMMD3 can exist as a monomer?

      The Reviewer raised an intriguing point. Indeed, COMMD proteins, including COMMD3, can exist outside the Commander holo-complex and form homo- or hetero-oligomers, as monomeric COMMD proteins are likely unstable. These observations align well with the Commander-independent function identified in this study. We have revised the Discussion section of the manuscript to further elaborate on this point and thank the Reviewer for the suggestion.

      (2) In Figure 9, the author emphasizes COMMD3-dependent cargo and Commander-dependent cargo. Can the authors speculate what distinguishes these two types of cargo? Do they contain sequence-specific motifs?

      This is another important question. Our data clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point and thank the Reviewer for this important suggestion.

      (3) What could be the possible mechanism underlying the observation that the knockout of COMMD3 results in larger early endosomes? How is the disruption of cargo retrieval related to the increase in endosome size?

      The endosomal retrieval process is critical for recycling membrane proteins and lipids back to the plasma membrane or the trans-Golgi network. When this process is disrupted, cargo that should be recycled accumulates within endosomes, leading to their enlargement. For example, defects in retromer function can cause endosomal swelling due to cargo accumulation (PMID: 33380435). We added this citation to the revised manuscript and thank the Reviewer for the advice. 

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 4: How do the authors define Commander-dependent vs. Commander-independent cargos?

      In Figure 4, the surface expression of ITGA6 is reduced to approximately 0.75 across all knockouts. However, there is a similar level of reduction for GLUT4-SPR in the commd5 knockout and for LAMP1 in the commd5 and commd1 knockouts. Are GLUT4-SPR and LAMP1 Commander-dependent or Commander-independent cargos? Additionally, how does COMMD3 specifically identify/distinguish these cargos?

      This is an excellent point. Our data suggest that TfR is a COMMD3-dependent but Commander-independent cargo, whereas ITGA6 is a Commander-dependent cargo that does not involve COMMD3-specific functions. The other two cargoes we examined—GLUT-SPR and LAMP1—primarily rely on COMMD3, with the Commander complex playing a minor role. Together, these observations clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point. We thank the Reviewer for this important suggestion.

      (2) There is an increase in the surface expression of GLUT4-SPR in the commd1 knockout. Is this increase significant? The figure suggests a significant increase, but the text states it remains unchanged. Clarification is needed.

      We found that surface levels of GLUT-SPR were slightly increased in Commd1 KO cells, in stark contrast to the strong reduction observed in Commd3 KO cells (Fig. 4B). This finding is consistent with our conclusion that COMMD3 has a distinct role from other Commander subunits. We have revised the Results section to more clearly describe these data and thank the Reviewer for the advice.

      (3) Figure 5A: To support the claim that COMMD3 is upregulated in the vps35l KO/Ccdc93 KO, the authors should quantify COMMD3 expression. Also, why is there a Vps35l band present in the Vps35l knockout cells?

      Based on the Reviewer’s suggestion, we quantified the total levels of COMMD3 and included these new data in Figure S2. In this study, gene deletion was achieved through the simultaneous introduction of two independent gRNAs. Based on our previous experience, this strategy typically results in the complete loss of gene expression. We posit that the residual band observed in Vps35l KO cells originates from background signals, such as nonspecific staining by the antibody.

      (4) Figure 7: It is intriguing that COMMD3 stabilizes Arf1-GTP and can compensate for COMMD3 in knockout cells. However, is this stabilization specific to TfR cargo only? The authors should test additional Commander-dependent and Commander-independent cargos to clarify this point.

      Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, an important direction we are pursuing is the use of surface proteomics to systematically determine how surface protein levels are affected by COMMD3 KO and ARF1 overexpression.

      (5) Is Arf1 interaction specific to COMMD3? The authors should investigate the effects of Arf1 knockout on COMMD3 expression and test its role in regulating Commander-dependent and Commander-independent cargos.

      The Reviewer raised an excellent point. Since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would interfere with multiple trafficking routes and the data would be difficult to interpret. Thus, in this work, we focused on the function and mechanism of the COMMD3:ARF1 complex on the endosome. Based on the suggestion of the Reviewer, we used AlphaFold3 to predict ARF1 binding to COMMD proteins. Interestingly, the complex with the highest predicted ipTM score is COMMD3:ARF1, while other COMMD proteins have much lower predicted binding scores. These results are consistent with the results of our unbiased CRISPR screens and targeted gene KO, and further support the conclusion that the COMMD3:ARF1 binding is specific and physiologically important in endosomal trafficking.

    1. eLife Assessment

      This valuable study uses AlphaFold2 to guide the structural modelling of different states of the human voltage-gated potassium channel KV11.1, a key pharmacological drug target. Follow-up molecular dynamics and drug-docking simulations, combined with experimental characterization, offer convincing evidence supporting the models. The work shows potential for improving drug potency predictions in ion channel pharmacology.

    2. Reviewer #1 (Public review):

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed and inactivated states. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction for structural models of the open but not the inactivated state. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated one being preferentially bound by many of them. Docking results are then combined with a Markov model to get state-weighted binding free energies that are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the-art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      Weaknesses:

      (1) Selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their initial selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This somewhat resembles the "streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that more accurately represent the inactivated state. In addition, I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      (2) The comparison of predicted and experimentally measured binding affinities lacks appropriate controls. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Importantly, labels for the open, closed and inactivated states should be randomized to check the robustness of the findings. Such a control would strengthen the overall findings significantly.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      The authors have successfully achieved their goal of providing new insights into the structural details of the three major conformational states sampled by the human voltage-gated potassium channel hERG, and linking these states to changes in drug-binding affinities. However, the study would benefit from more robust controls and orthogonal validation. Additionally, the generalizability of the approach remains to be demonstrated.

    3. Reviewer #2 (Public review):

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiment support the plausibility of their models.

      Strengths:

      Ngo et al. employ various computational methods to enhance AlphaFold2's prediction capabilities for the human voltage-gated potassium channel hERG. They guide AlphaFold2 to explore different protein conformations and states, including its open, closed, and inactivated forms, using targeted templates. Additionally, they applied the Rosetta FastRelax protocol with an implicit membrane to refine the conformation of each residue in the predictions and address steric clashes, along with molecular dynamics (MD) simulations to account for membrane-pore flexibility. The methodology is well-described, and the figures are clear and descriptive.

      The authors have addressed some of the concerns raised during the first round of reviews. For instance, to mitigate potential bias in selecting the inactivated conformation, they evaluated conformational variability via backbone dihedral angles at specific residues in the selectivity filter and the drug binding sites. They also evaluated the top representative model from inactivated-state-sampling Cluster 3 (termed "AF ic3"), which was initially excluded. This model is now included in the revised manuscript as Figure S9a, b. MD simulations confirmed that this state could be a potential alternative open-state conformation. The authors also acknowledged the limitation of their study by not incorporating other enhanced sampling methods and AF3.

      In the revised manuscript, the authors provided more extensive explanations of their methods. For example, they explained that their approach to template selection was guided by their experience: AlphaFold2 with larger templates often overly constrains predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. They also noted that pLDDT scores are not always reliable for selecting new or alternative conformations, citing proper references. To illustrate this further, they included a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores.

      Another point raised by the reviewers was the exclusion of the N-terminal PAS domain due to GPU memory limitations and its impact on the study. This omission may overlook the PAS domain's potential roles in gating kinetics and allosteric effects on drug binding. The authors acknowledged these limitations in the main text and highlighted the need for future studies to explore these regions in greater detail. They also alluded to potential future research to address these points. Additionally, they have made some of their analysis scripts and tools available on GitHub as a community resource.

      Weakness:

      The primary issue with the study is the lack of a general pipeline or strategy that can be universally applied to any system, even if limited to ion channels or membrane proteins. A related paper assessed the conformational variability in voltage-sensing domains (VSDs) by applying both the default MSA depth and a range of reduced MSA depths to enhance conformational diversity (please see https://doi.org/10.1101/2025.03.12.642934). They generated 600 models for 32 members of the voltage-gated cation channel superfamily and demonstrated that AlphaFold2 can predict a range of diverse structures of the VSDs, representing activated, deactivated, and intermediate conformations, with more diversity observed for some VSDs compared to others.

      The authors have addressed one of the reviewer's concerns about generalizability by including an example in Figure S14 of the modified text, showing how their approach can be applied to model another ion channel system. However, some outstanding questions remain: Is this method better suited for ion channels or membrane proteins with already solved structures and extensive research available? Can this pipeline be applied to other systems as well? Additionally, how does this method compare to other methods using MSA subsampling and other enhanced AF-based techniques to generate alternative conformations of proteins?

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.

      Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state configuration observed in the PDB 5VA2 SF, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined backbone dihedral angles (phi φ and psi ψ) at key residues in the selectivity filter (S624 – G628) and drug-binding region on the pore-lining S6 segment (Y652, F656) of all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold within the scope of this study, particularly at functionally critical positions.
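
      As an illustration of the dihedral analysis described above, the following minimal sketch computes φ and ψ from backbone atom coordinates using only the Python standard library. It is not the authors' script: the function names (`dihedral_deg`, `phi_deg`, `psi_deg`) are hypothetical, and the coordinates would in practice be read from each AlphaFold model with a structure parser of the user's choice. The only conventions assumed are the standard definitions, φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_deg(c_prev, n, ca, c):
    """Backbone phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral_deg(c_prev, n, ca, c)

def psi_deg(n, ca, c, n_next):
    """Backbone psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral_deg(n, ca, c, n_next)
```

      Collecting these angles for the residues of interest in every sampled model, per subunit, gives the distributions onto which reference structures can be overlaid.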

      Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. This model demonstrated SF residue G626 carbonyl oxygen flipped away from the conduction pathway, hinting at potential impact on ion conduction, yet its pore region structurally resembled the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (2 runs, 1 μs long each, with applied 750 mV voltage) with varied initial ion/water configurations in the SF, finding it consistently open and conducting throughout (Figure S9c, d), consistent with our previous observations in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those for the PDB 5VA2-based open-state structure. These findings combined led us to classify it as a possible alternative open-state conformation.

      Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.

      We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the putative inactivated (AlphaFold Cluster 2) model’s non-conductivity and improved affinity for drugs targeting the inactivated state observed in our study suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities in more depth in a future study and are grateful for the reviewer’s feedback.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. In the follow-up analysis for comment #1, this model was shown to be open and conducting, so we call it Open (AF ic3) to differentiate it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.

      Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:

      (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)

      (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions

      (c) Single-state docking using an alternative AlphaFold-predicted open-state model (inactivated-state-sampling cluster 3, AF ic3)

      (d) Multi-state docking combining the AlphaFold-predicted open-state model (inactivated-state-sampling cluster 3, AF ic3) with the inactivated- and closed-state conformations
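
      As a rough illustration of what weighting by state distributions could look like in conditions (b) and (d), the sketch below combines per-state docking scores into one value with an occupancy-weighted average. The function name, the linear averaging scheme, and all numbers are hypothetical placeholders; the weighting actually used in the manuscript may differ.

```python
def multi_state_score(scores, state_probs):
    """Occupancy-weighted average of per-state docking scores.

    scores      -- dict mapping state name to a docking score (e.g., in R.E.U.)
    state_probs -- dict mapping state name to its occupancy (need not sum to 1)
    """
    if set(scores) != set(state_probs):
        raise ValueError("scores and state_probs must cover the same states")
    total = sum(state_probs.values())
    if total <= 0:
        raise ValueError("state occupancies must sum to a positive value")
    # Normalize occupancies so the weights sum to 1, then average.
    return sum(scores[s] * state_probs[s] / total for s in scores)


# Hypothetical values for illustration only (not taken from the study).
example_scores = {"open": -10.0, "inactivated": -14.0, "closed": -8.0}
example_probs = {"open": 0.2, "inactivated": 0.7, "closed": 0.1}
combined = multi_state_score(example_scores, example_probs)
```

      With a single state and any positive occupancy, the function simply returns that state's score, which corresponds to the single-state conditions (a) and (c).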

      Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R<sup>2</sup> = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by their experimental distributions) improved the correlation substantially (R<sup>2</sup> = 0.63, r = 0.79, Figure 7b), boosting predictive power by 47% and underscoring the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on the hERG channel conformation and gating, which will be addressed in future work.

      Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R<sup>2</sup> = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R<sup>2</sup> = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R<sup>2</sup> and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
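
      The percentage gains quoted for the two ensembles follow from the reported R<sup>2</sup> values as a relative change. A minimal check, assuming the gain is defined as (multi-state − single-state) / single-state; the function name is ours:

```python
def relative_gain_percent(baseline, improved):
    """Relative change from baseline to improved, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# R^2 values as reported for Figure 7a-d.
gain_5va2 = relative_gain_percent(0.43, 0.63)    # PDB 5VA2-based ensemble
gain_af_ic3 = relative_gain_percent(0.44, 0.64)  # AF ic3-based ensemble
```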

      These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).

      We appreciate the reviewer’s comment on the statistical significance assessment in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state, rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between the cluster 1 and cluster 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. The RMSD values reported in the manuscript text demonstrate good convergence and by themselves serve as a statistical significance assessment for those models. This trend extends to the open-state and inactivated-state AlphaFold models, which show similarly limited differences in their VSD regions. This convergence suggests that population-based statistical analysis may not reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained from comparing representative structures.
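
      For readers who want to reproduce this kind of comparison, the sketch below computes an RMSD between two matched coordinate sets after optimal rigid-body superposition (the Kabsch algorithm). It assumes NumPy and that the atoms of both models are already paired in the same order; it is a generic illustration, not the script used in the study.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the covariance matrix yields the optimal rotation of P onto Q.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

      Applied to the VSD atoms of two predicted models, a value well below 1 Å would indicate the kind of convergence described above.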

      Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allows for much more robust comparisons of structural differences.

      We have explored these conformational state dynamics through MD simulations for the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, and S11. These figures provide detailed insights: Figures S7 and S8 analyze SF and pore conformation dynamics, including averaged pore radii with and without voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting residue F627 carbonyl flipping and SF expansion. We appreciate the opportunity to clarify our approach.
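
      One simple way to quantify SF dilation from a trajectory frame is to measure the distance between equivalent carbonyl oxygens on diagonally opposite subunits of the tetramer. The sketch below assumes NumPy, one oxygen position per subunit, and a subunit ordering in which indices 0/2 and 1/3 face each other across the pore; both the function name and this ordering are assumptions of the sketch, and the analysis behind Figure S10 may define the distances differently.

```python
import numpy as np


def opposing_subunit_distances(coords):
    """Distances between diagonally opposite subunits of a tetramer.

    coords -- array of shape (4, 3): one atom position per subunit, ordered
              around the pore so that subunits 0/2 and 1/3 face each other
              (an assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    if coords.shape != (4, 3):
        raise ValueError("expected one (x, y, z) position for each of 4 subunits")
    d02 = float(np.linalg.norm(coords[0] - coords[2]))
    d13 = float(np.linalg.norm(coords[1] - coords[3]))
    return d02, d13
```

      Evaluating this per frame for a given residue gives a time series in which an increase of either distance would indicate widening of the filter at that position.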

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?

      We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.

      For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K<sup>+</sup> channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.

      For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where the inactivation occurs via its distortion. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, such as the full membrane-spanning domains or additional cytosolic regions from the open-state structure, might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.

      It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.

      With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.

      To demonstrate the broader feasibility of this approach, we applied it to another ion channel system, the voltage-gated sodium channel Na<sub>V</sub>1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel Na<sub>V</sub>1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a Na<sub>V</sub>1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared to the cryo-EM open-state Na<sub>V</sub>1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.

      (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.

      We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In the study, we conducted a control experiment supplying parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modeled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops as expected, with the pore and selectivity filter (our areas of focus) remaining nearly identical. We chose the Rosetta-refined cryo-EM structure as this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), to ensure that our results are more directly comparable to prior work in the field. Nonetheless, as both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.

      (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).

      We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we did not rely solely on pLDDT; we also performed protein backbone dihedral angle analysis of the protein regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we tested a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores, and included these results in our revised analysis. We included a note in the revised manuscript’s Discussion section: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
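
      As background for the dihedral angle analysis, the sketch below shows how a single backbone dihedral can be computed from four atom positions. For the backbone, φ is defined by the atoms C(i−1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The function assumes NumPy and is a generic illustration, not the analysis script from the study.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

      Computing φ and ψ for each SF residue in every predicted model gives a compact per-residue description that can be compared or clustered across models independently of pLDDT.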

      (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.

      We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.

      Building on these efforts, our approach not only enhances the prediction of conformational diversity but also introduces a twist by incorporating structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study advances beyond structural modeling alone: we rigorously validate these conformations with multiple simulations tested against experimental data, revealing that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data. This has been a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on the cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.

      We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, Na<sub>V</sub>1.5, in Figure S14.

      Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), ensuring they are adaptable for a wide range of scenarios with extensive documentation. This enables users, even those not focused on ion channels, to effectively apply our tools to analyze AlphaFold predictions for their own projects and produce publication-ready figures.

      While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems and, more importantly, in rigorously testing and validating in silico the biological significance of the conformation-specific structural models they generate.

      Minor concerns:

      (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to the NMR studies of domains of the hERG channel, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints, as including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, leading to failures during the prediction step.

      The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI:10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”

      (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.

      We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.

      (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.

      The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC<sub>50</sub> values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
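
      To make the IC<sub>50</sub>-to-free-energy conversion concrete, the sketch below uses the common approximation ΔG ≈ RT ln(IC<sub>50</sub>), with IC<sub>50</sub> expressed in mol/L relative to a 1 M standard state. It treats IC<sub>50</sub> as a stand-in for the dissociation constant and assumes T = 298.15 K; the function name and these choices are assumptions of the sketch and may differ from the conversion used in the manuscript.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_free_energy(ic50_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an IC50 in mol/L.

    Uses dG = R * T * ln(IC50 / 1 M), treating IC50 as an estimate of Kd.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ic50_molar)


# Example: a drug with IC50 = 1 micromolar.
dg_1um = ic50_to_free_energy(1e-6)
```

      More potent drugs (lower IC<sub>50</sub>) map to more negative free energies, which is the direction in which docking scores are expected to correlate.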

      As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.

      (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.

      We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”

      (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).

      a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol can be applicable there or whether they have plans to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.

      We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).

      Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”

      b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K<sup>+</sup> channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.

      We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.

      The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential effect of drug modulation of the hERG channel gating, and vice versa, particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein - ligand interactions.

      Similarly, in our 1 μs long MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond time scale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.

      (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.

      We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”

      Reviewer #3 (Recommendations for the authors):

      The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).

      We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state models (PDB 5VA2-derived and an additional AlphaFold-predicted model from Cluster 3 of the inactivated-state-sampling attempt, named “AF ic3”) and inactivated-state model, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded as the pore architecture deviates too much from those in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.

      Astemizole (Figure S13a):

      - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.

      - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.

      - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.

      - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).

      - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than a true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.

      E-4031 (Figure S13b):

      - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.

      - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.

      - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.

      - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).

      - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.

      In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble those in the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).

      The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture such local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).

      In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.

      Discussion of the original Wang and Mackinnon finding, DOI: 10.1016/j.cell.2017.03.048 regarding C-inactivation, pore mutation S631A and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the CryoEM study) please discuss how this might affect interpretations of starting with this structure as a template for models presented here, perhaps as part of Figure S1.

      We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K<sup>+</sup> conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that inactivated-like traits that PDB 5VA2 may carry at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure was also found to conduct ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.

      Page 8, the significance of 750 and 500 mV in terms of physiological role?

      We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500 to 750 mV) to study hERG K<sup>+</sup> conduction. The hERG single-channel conductance is notably small under physiological conditions at ~2 pS (DOI: 10.1161/01.CIR.94.10.2572), necessitating amplification to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K<sup>+</sup> ion channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on the small-conductance calcium-activated K<sup>+</sup> channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on the Shaker K<sup>+</sup> channel, have used elevated voltages (250 to 750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects such as those leading to inactivation or deactivation, which occur over milliseconds under physiological conditions.
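
      The scale of the problem can be illustrated with a rough calculation. The sketch below is not part of the authors' analysis: it assumes a purely ohmic (linear) single-channel current-voltage relation, which real hERG channels do not strictly follow, and it uses a 100 mV driving force as a hypothetical reference point. The function name and parameters are illustrative only.

```python
# Rough estimate of how many K+ ions cross one channel during a simulation,
# assuming an ohmic relation I = g * V (an idealization, not measured data).
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge carried by one monovalent ion


def expected_ion_crossings(conductance_pS, voltage_mV, duration_us):
    """Return the expected number of monovalent ions conducted."""
    current_A = (conductance_pS * 1e-12) * (voltage_mV * 1e-3)
    ions_per_second = current_A / ELEMENTARY_CHARGE_C
    return ions_per_second * duration_us * 1e-6


if __name__ == "__main__":
    # ~2 pS is the conductance quoted in the response; 100 mV is a
    # hypothetical reference voltage chosen here for comparison.
    for voltage_mV in (100, 500, 750):
        n = expected_ion_crossings(2.0, voltage_mV, 1.0)
        print(f"{voltage_mV} mV over 1 us: about {n:.1f} ions")
```

      Under these assumptions, a 1 μs trajectory at 100 mV would be expected to show roughly one permeation event, whereas 500 to 750 mV would yield several, which is consistent with the rationale given for the elevated voltages.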

      The abstract could be edited a bit to more clearly state the novel findings in this study.

      We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is K<sub>V</sub>11.1 (hERG), comprising the primary cardiac repolarizing current, I<sub>kr</sub>. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies. 
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”

      Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.

      We agree with the reviewer and have made the suggested changes. We moved Figure S2 into the main text as a new figure.

      Additionally, we revised the Discussion section to improve focus and clarity.

    1. eLife Assessment

      In this important manuscript, the authors reveal novel findings on the role of exosomes in regulating filopodia formation. Filopodia are crucial for various cellular processes, including migration, polarization, directional sensing, and the formation of neuronal synapses. The authors convincingly demonstrate that exosomes, particularly those enriched with the protein THSD7A, play a significant role in promoting filopodia formation in both cancer cells and neurons.

    2. Joint Public Review:

      Summary:

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present on filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells and/or primary rat neurons, they find that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is down regulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths:

      Through proteomic analysis, the authors revealed that endoglin is an important player in the effective trafficking of THSD7A within exosomes. This study offers interesting insights into the dynamic interplay between exosome-mediated protein trafficking and essential cellular processes, emphasizing its significant relevance in both cancer progression and neural function. The authors communicated their findings clearly and effectively.

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.

      Weaknesses:

      While the authors showed the important role of the exosomal cargo protein THSD7A in neurons, it would be interesting to conduct in vivo studies to determine whether THSD7A plays a similar role in promoting filopodia and synapse formation in vivo. Some of the reviewers' comments were not fully addressed, such as the request for rigorous analysis and quantification by live-cell TIRF microscopy tracking labeled THSD7A and filopodia formation, which would clarify the timing and strengthen the causality of this relationship. The authors need to consider fully characterizing the role of Cdc42. If the authors would like to fully elaborate on the role of Cdc42 in another manuscript, it would be better not to mention the role of Cdc42 in filopodia formation in this paper at all.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development.

      Weaknesses:

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be stated more precisely and in accordance with the observed data.

      We appreciate the reviewer's recognition of the impact of our study. We will address the concerns about data analysis and the statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review):

      Summary:

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42.

      Strengths:

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance.

      Weaknesses:

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study. Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A. We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review):

      Summary:

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths:

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.

      Weaknesses:

      (1) A better characterization of the nature of the small EV population is missing:

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations.

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a colloidal Coomassie-stained gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent four bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor:

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to quantify the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10 and 30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
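
      The 10 to 30 second range follows from simple interval reasoning, sketched below for a single secretion and filopodium pair. This is an illustration rather than the authors' analysis code: it assumes that each event is first seen in the frame immediately following its true occurrence, and the function name is hypothetical.

```python
def latency_bounds(observed_latency_s, frame_interval_s):
    """Bound the true delay between two events seen in time-lapse frames.

    Assumption: an event first visible in the frame at time t actually
    happened somewhere in the window (t - frame_interval, t]. With both
    events uncertain by up to one frame, the true delay can differ from
    the observed delay by up to one frame interval in either direction.
    """
    lower = max(0.0, observed_latency_s - frame_interval_s)
    upper = observed_latency_s + frame_interval_s
    return lower, upper


if __name__ == "__main__":
    # Values taken from the response: 20 s observed, 10 s between frames.
    print(latency_bounds(20, 10))
    # A hypothetical faster acquisition narrows the window.
    print(latency_bounds(20, 2))
```

      With a 10 second frame interval the window around a 20 second observation spans 10 to 30 seconds, whereas the 2 second interval used in the second call would narrow it to 18 to 22 seconds.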

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy and a faster frame rate to observe all the MVB fusion events and get an accurate calculation of this number. The timing of the acquisition was based on the typical timing of filopodia formation, which is slow relative to MVB fusion. Thus, with the current dataset, we could miss secretion events taking place between the 10 second time intervals. Therefore, to address this question, we would need to acquire a new dataset with a much more rapid frame acquisition (multiple frames per second rather than one frame every ten seconds). Regardless, for the secretion events that we visualized with the current dataset, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript. A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging. This is stated in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful.

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence. This would possibly involve more proteomics analysis to identify candidate exosomal cargoes involved in this process.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results as filopodia per cell area. To demonstrate that this quantification gives similar results, we have now plotted the filopodia per cell area data from Fig 2 as filopodia per cell and placed these new plots in Supp Fig 2.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats.

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were unable to detect THSD7A using the same (reducing) conditions for the mouse melanoma B16F1 samples but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns.

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands. If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant. Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A:

      a) Figure 6 - Is THSD7A expected to be present in the nucleus as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western blot of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet. In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images. For the cellular distribution of endoglin, we agree that this is an important future direction to understand how endoglin regulates THSD7A trafficking. We have added the lack of these data to the “Limitations” section at the end of the manuscript.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope. Insets for Figure 7E were digitally zoomed so that readers could see the tiny structures. Zoom 1 in Figure 7E shows areas of extracellular deposition, whereas Zoom 2 shows THSD7A colocalization with CD63 in MVE. In the extracellular areas (Zoom 1), we observe small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more secretion of THSD7A in small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet, and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.

      Quantification of internal THSD7A localization is much more straightforward in this experimental regime. Indeed, in Figure 7F, we quantitated internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

      With regard to whether the extracellular deposits are migrasomes, we have no reason to believe that they would be migrasomes. The preponderance of our evidence points to exosomes as carrying THSD7A and inducing filopodia. Furthermore, CD63 is an exosome marker (Sung et al., Nat Comm, 2020) and does not induce migrasomes, unlike many other tetraspanins (Huang et al., Nat Cell Bio, 2019).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors need to clarify the figure labeling and descriptions, and conclusions should be drawn based on the findings. Some figures need to be clearer, e.g. Figure 1E needs information on what the green and red fluorescent proteins are. Do all figures in 1A have the same scale bar or different ones? Figure 3A lacks a scale bar. In Figure 3, the GFP signal is in yellow; does it represent a merge or is it just the GFP alone? Figure 6D is missing a D. Figure 4D needs to be better explained. Additionally, since Figures 8B and 8C represent a model based on all the findings of the study, they would be better presented as a separate figure from Figure 8A.

      The figure legend for figure 1E notes that green corresponds to GFP-Rab27b and the red corresponds to mCherry filler. In addition, the labels are marked to the right of the figure. For Figure 1A, we have now indicated in the legend that all scale bars = 10 µm. In figure 3, neurons were co-transfected with GFP or GFP-Rab27b. Thus, the yellow signal in these images is the merge of the mCherry filler with either GFP (expression throughout the neuron body and dendrites) or GFP-Rab27b (punctate colocalization). We have added a scale bar to Fig 3A. Figure 6D has been corrected, with a “D” label added. Figure 4D shows representative images of cells with filopodia under the various conditions, including add-back of control or endoglin-KD EVs. We have clarified the conditions in the figure legend for 4D. For Figure 8, we have now split it into 2 figures: one with data (Fig 8) and one with the model (Fig 9).

      Reviewer #2 (Recommendations for the authors):

      For the most part, this story is strong and well-presented. The findings are interesting and will significantly advance our understanding of how EVs affect various processes such as cancer metastasis. However, the Cdc42 work is not great. They only indirectly implicate Cdc42 with a somewhat iffy inhibitor (ML141) and a constitutively active form transfected into cells. Both approaches have drawbacks such as off-target effects in the case of the inhibitor and possible cross-talk to other GTPases in the case of the active mutant. The activation of Cdc42 should be demonstrated by an activity assay. Several commercial kits are available. Inhibition of Cdc42 should be tested by knockdown in addition to the inhibitor.

      We appreciate the reviewer’s recognition of our work. To address the limitations of our study, particularly the Cdc42 mechanistic work, we have now added a “Limitations of the study” section at the end of the text. Here, we address our experimental limitations and future directions.

      Reviewer #3 (Recommendations for the authors):

      (1) Since the purified small EVs contain canonical exosomal markers and originate from MVEs, the authors should consider a more consistent use of the term "exosome" to avoid confusion.

      We acknowledge that the usage of both “exosomes” and “small extracellular vesicles” can seem confusing to many readers. Typically in the EV field, we use the term “exosome” when we can reliably determine that the EVs originate from the endocytic pathway. Thus, we use this term when we have specifically perturbed this pathway by targeting Hrs or Rab27. We use the term “small extracellular vesicles” or SEVs when referring to a purified heterogeneous population of SEVs from unknown or a variety of origins. Thus, when referring to vesicles isolated from the conditioned media, we call them SEVs because we cannot determine their origin. Clarification of this terminology has been added to the introduction of the paper.

      (2) 1st results section - expressing mCherry as a "filler" is confusing, clarify that this is meant to identify cellular background.

      This has now been clarified in the paper.

      (3) Figure 3 - Although Rab27a and Rab27b play a role in exosome secretion, Rab27b does not have redundant functions with Rab27a in every cellular context. The authors should mention the specific roles of Rab27a and Rab27b in promoting MVE fusion with the PM and in regulating the anterograde movement of MVEs to the PM, respectively (Ostrowski et al. 2010, Citation 52 in the ms). Although Rab27a is not highly expressed in neurons, it is not currently clear whether Rab27b has a redundant function with Rab27a or whether there is another unknown factor that plays this role. As neurons also do not express endoglin, the mechanisms that mediate how EVs regulate filopodia formation in these cells are most probably different than in cancer cells. This should be highlighted in the discussion.

      We have now added a couple of clarifying sentences about the roles of Rab27a and Rab27b to the results section, including the Ostrowski reference and another reference suggesting possible redundancy of Rab27a and Rab27b. With regard to endoglin not being expressed by neurons, that is one reason why we carried out the proteomics with control and endoglin-KD EVs to find a universal cargo that would directly induce filopodia formation. Indeed, THSD7A seems to be such a universal cargo, expressed in both cancer cell and neuron EVs and inducing filopodia in both cell types. This point, along with the requirement for regulation of THSD7A by other molecules in neurons, is discussed in the results and discussion sections.

      (4) As the authors note, the mechanistic link between endoglin-sorted, exosomal THSD7A and Cdc42-mediated filopodia formation remains unclear. While the findings on Cdc42 are clear, they are not surprising. What is the role of mDia/ENA/VASP or BAR proteins in this? The authors should also consider an assay to determine whether exosomal THSD7A binds to the PM to cause the signaling or if the cargo is first internalized before performing its function. Since this process is both autocrine and paracrine, the authors could co-culture THSD7A-mScarlet cells with vector control cells and observe how THSD7A-mScarlet is localized in the non-expressing cells.

      As other reviewers also noted, the Cdc42 mechanistic data at the end of the paper have clear limitations that are now addressed within the manuscript in a “Limitations of the Study” section. Here we discuss our experimental troubleshooting and approach to assaying Cdc42 involvement in this process. We acknowledge there are many rigorous experiments that could be pursued in the future to strengthen our mechanism and proposed model.

      We also agree that elucidating how THSD7A specifically interacts with target cells would be very informative and insightful. This would be most effectively assayed using a cell line that is stably expressing THSD7A-mScarlet and could be a future direction of this project. However, it is out of the scope of this current publication.

    1. eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      (3) The literature review can be improved (laid out in the specific recommendations).

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.

    3. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.
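
      For readers unfamiliar with the Rescorla-Wagner-type rule mentioned above, the following is a minimal, generic sketch of the textbook update, in which a value estimate moves toward each observed reward by a fraction of the prediction error. It is not taken from the manuscript: the function name `rescorla_wagner`, the learning rate `alpha`, and the initial value `v0` are illustrative assumptions.

```python
def rescorla_wagner(rewards, alpha=0.5, v0=0.0):
    """Generic Rescorla-Wagner update (illustrative, not the authors' code).

    After each observed reward r, the value estimate v is moved toward r
    by a fraction alpha of the prediction error (r - v).
    """
    v = v0
    history = []
    for r in rewards:
        prediction_error = r - v      # how surprising the reward was
        v += alpha * prediction_error  # move the estimate toward the reward
        history.append(v)
    return history


if __name__ == "__main__":
    # With a constant reward of 1, the estimate approaches 1 geometrically.
    print(rescorla_wagner([1, 1, 1], alpha=0.5))
```

      With a constant reward, the prediction error shrinks on every trial, which is why such a rule can serve as a simple learned estimate of expected reward on top of a state representation.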

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

    4. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

    5. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.
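
      The noise-correcting behavior of an attractor network described above can be illustrated with a minimal, generic Hopfield sketch. This is not the authors' implementation: the pattern size, the two stored "context" patterns, and the names `train_hebbian` and `recall` are assumptions made purely for illustration.

```python
def train_hebbian(patterns):
    """Build a weight matrix from +1/-1 patterns with the Hebbian outer-product rule."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, cue, max_sweeps=10):
    """Update units one at a time until the state stops changing (an attractor)."""
    s = list(cue)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))  # summed input to unit i
            new_state = 1 if h >= 0 else -1
            if new_state != s[i]:
                s[i] = new_state
                changed = True
        if not changed:
            break
    return s


# Two hypothetical, mutually orthogonal context patterns.
context_a = [1, 1, 1, 1, -1, -1, -1, -1]
context_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([context_a, context_b])

# A cue equal to context_a with its first element corrupted by "sensory noise".
noisy_cue = [-1, 1, 1, 1, -1, -1, -1, -1]

if __name__ == "__main__":
    print(recall(weights, noisy_cue) == context_a)
```

      In this toy setting, a cue that differs from a stored context in one element relaxes back to that context, whereas a cue that resembles no stored pattern would not be explained by noise alone; the model described in the response uses this kind of distinction to decide when a context should be reselected.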

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores episodes of variable length depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these variable timescales, without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. eLife Assessment

      Building on previous structural studies, this work provides valuable new insights into the architecture of the autophagy initiation complex, comprising ULK1, ATG13, and FIP200. The authors present their findings with solid supporting evidence, making this study a significant contribution to the autophagy field.

    2. Reviewer #1 (Public review):

      In this study, Hama et al. investigated the molecular regulatory mechanisms underlying the formation of the ULK1 complex in mammalian cells. Their results showed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro experiments, providing deeper insight into the molecular basis of ULK1 complex assembly in mammalian cells.

      The revised manuscript has addressed the majority of my concerns, and I have no further questions. Overall, this is a solid and impactful study that significantly advances our understanding of how the ULK1 complex is formed.

    3. Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High resolution structural details and a mechanistic insight of this complex have been lacking and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13 and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in the analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably, this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Update:

      I feel the authors have addressed my concerns in their revised manuscript.

    4. Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex. The experimental data presented by the authors are of high quality and convincing. The revised manuscript offers enhanced details about the prediction procedure and results, along with additional experimental findings, significantly increasing the scientific value of this paper.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.

      Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.

      Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and a mechanistic insight of this complex have been lacking and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in their analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Thank you for your thoughtful review and for highlighting the importance of our approach.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1: how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1’s phosphorylation of a key substrate ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.

      Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.

      We are grateful for your high evaluation of our work.

      The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:

      (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.

      We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.

      (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.

      We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200 which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and the complex of the FIP200N dimer with full-length ATG13, confirming that there were no issues with our choice of regions (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.

      (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.

      We provided the PAE plot in the revised Figure S1C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.

      We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.

      (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.

      ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so that ULK1 becomes destabilized and its expression level drops. We added this discussion to the revised manuscript.

      (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.

      Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B. Therefore, we repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with this new data.

      We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.

      (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.

      Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.

      Reviewer #2 (Recommendations for the authors):

      I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?

      We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.

      As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?

      Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.

      We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.

      Reviewer #3 (Recommendations for the authors):

      Here are some additional minor suggestions:

      (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.

      We thank the reviewer for pointing out the lack of description of UBL domains, which we added in Results in the revised manuscript.

      (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.

      We have added a proposed diagram as Figure 1A.

      (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.

      We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).

      (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".

      We have revised the labeling in Figure 1D.

      (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.

      We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.

      (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".

      We have changed “wide” to “extensive” in the revised manuscript.

      (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".

      We simply used "tandem" in the revised manuscript.

    1. eLife Assessment

      This important study investigates how hummingbird hawkmoths integrate stimuli from across their visual field to guide flight behavior. Cue conflict experiments provide solid evidence for an integration hierarchy within the visual field: hawkmoths prioritize the avoidance of dorsal visual stimuli, potentially to avoid crashing into foliage, while they use ventrolateral optic flow to guide flight control. The paper will be of broad interest to enthusiasts of visual neuroscience and flight behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

    3. Reviewer #2 (Public review):

      Summary

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight. The authors linked their behavioral results to visual scene statistics in the hawkmoths' natural environment. The partition of ventral and dorsal visuomotor pathways is well in line with differences in visual cue frequencies. The response hierarchy, however, seems to be dominated by dorsal features that are less frequent but presumably highly relevant for the animals' flight safety.

      Strengths

      The data are very interesting and unique. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      I find the majority of the data, which are also the data supporting the main claims of the paper, compelling. However, the measurements of flight height are less solid than the rest and I think these data should be interpreted more carefully.

    4. Reviewer #3 (Public review):

      The authors have significantly improved the paper in revising to make its contributions distinct from their prior paper. They have also responded to my concerns about quantification and parameter dependency of the integration conclusion. While I think there is still more that could be done in this capacity, especially in terms of the temporal statistics and quantification of the conflict responses, they have made a case for the conclusions as stated. The paper still stands as an important paper with solid evidence, a bit limited by these concerns.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively restructured the entire manuscript. Among other changes, we have referenced all figure panels in the text in the order they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal response in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      For testing whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli to be able to compare the responses. So we can confidently interpret their responses in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently and c) on the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points: l. 391-417. We also updated the former Fig. 5, now Fig. 6 to reflect this discussion.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-response gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other, is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we set up stimuli that are comparable, as they are all in the visual domain, and since we can calculate their external optic flow and contrast magnitudes, to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings with dorsal gratings present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451).

      l. 391-417, we also updated the former Fig. 5, now Fig. 6 to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B; this seems provocative but lacks quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This directly relates to our laboratory findings on responses to these stimuli in different parts of their visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we did perform measurements of translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. By combining behavioural experiments and stimulus measurements in the lab, we thus showed that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics into the new Fig. 6, so that the figure structure and the sequence of the text also relate the behavioural findings to the natural context, as discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly which open questions are addressed in this study. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour. Across the literature on parallel sensory and motor pathways guiding behaviour, different solutions have been described, from winner-takes-all to equally mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.

      Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which will be important for understanding its nature and for comparing it to other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response lacked the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We have now changed the viewpoint of the example tracks in Figs. 2-5 to a virtual view from above, rather than the view from below as recorded by the camera, which required some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This makes it easier to relate the axes (including positive and negative values) of the example tracks in those figures to the median positions and the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects and to motivate the use of our model species for this study: “While fruit flies weight ventral translational optic flow stronger than dorsal optic flow, the most extreme partitioning of the visual field in terms of optic-flow-based flight control has been observed in hawkmoths [33-35].” (lines 60-62)

      - I think the statistical differences group mean differences could be described in more detail at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or median, depending on the normality of the test residuals (see Methods, confidence level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel (lines 127-129).

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel; it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could affect the moth’s image in the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but we would be concerned about introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative, measure. If we use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, just as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.
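      As an illustration only, the per-flight averaging described above could be sketched as follows. The function names are hypothetical, and the conversion from area to relative distance assumes a simple pinhole camera (projected area scaling with the inverse square of distance), which is an assumption of this sketch and not something the response states. In keeping with the caveats above, the resulting value is best read as a coarse, qualitative index rather than a calibrated height.

```python
from math import sqrt
from statistics import mean


def mean_flight_area(areas_px):
    """Average the moth's segmented area (in pixels) over all frames of one flight."""
    return mean(areas_px)


def relative_distance(area_px, reference_area_px):
    """Distance from the camera relative to a reference flight.

    ASSUMPTION (not from the source): pinhole camera, so projected area
    scales with 1 / distance**2. Values > 1 mean the moth was, on average,
    farther from the camera than in the reference flight.
    """
    return sqrt(reference_area_px / area_px)


# Hypothetical per-frame areas for two flights (illustrative values only).
reference = mean_flight_area([400, 400, 400])
test_flight = mean_flight_area([100, 100, 100])
index = relative_distance(test_flight, reference)  # farther from the camera
```

      Because animal size, image contrast, motion blur, roll and pitch all change the segmented area as well, differences in this index between conditions cannot be attributed to flight height alone.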

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons.

      We appreciate this point and changed this and further instances in the manuscript to “visuomotor pathway”.

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and Results sections (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated: they were mounted on the inner surfaces of different tunnel walls, while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with 5% contrast random checkerboard patterns. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.

      We discussed this with a number of test readers and tried multiple configurations. They all have advantages and drawbacks, but the majority of testers preferred the current configuration, so we retained it. To help with the mental transformation from the example flight tracks in the figures, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the plots showing median position (so positive and negative values on those axes correspond directly), and changed the view from below the tunnel to a view from above the tunnel, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also our response to Reviewer #2 on that point). This is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658).

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data were not paired in the sense that we could attribute individual data points to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variance of each condition with that of each other condition, to test whether the variance in position differed across conditions. Our phrasing was misleading, and we have corrected it to “The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests” (l. 187-188).
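      To make the distinction between "paired" and "pairwise" concrete, the following is a minimal sketch of pairwise Brown-Forsythe comparisons, not the analysis code used for the manuscript. The function names and the example data are hypothetical. The sketch computes only the test statistic (a one-way ANOVA F on absolute deviations from each group's median); a p-value would be obtained from an F distribution with (k - 1, N - k) degrees of freedom, which is left out here to keep the example free of external libraries.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_f(*groups):
    """Brown-Forsythe statistic: ANOVA F on |x - group median|.

    Note: raises ZeroDivisionError if all absolute deviations within
    every group are identical (zero within-group spread).
    """
    z = [[abs(x - median(g)) for x in g] for g in groups]
    sizes = [len(g) for g in z]
    n_total, k = sum(sizes), len(z)
    z_means = [mean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, z_means))
    within = sum((x - m) ** 2 for g, m in zip(z, z_means) for x in g)
    return (n_total - k) / (k - 1) * between / within


def pairwise_brown_forsythe(conditions):
    """Compare every unordered pair of conditions (independent samples)."""
    return {
        (a, b): brown_forsythe_f(conditions[a], conditions[b])
        for a, b in combinations(sorted(conditions), 2)
    }


# Hypothetical median lateral positions per track (illustrative values only).
conditions = {
    "A": [1, 2, 3, 4, 5],
    "B": [11, 12, 13, 14, 15],  # shifted, same spread as A
    "C": [0, 5, 10, 15, 20],    # wider spread
}
results = pairwise_brown_forsythe(conditions)
```

      Each pair is treated as two independent samples, so no matching of individual moths across conditions is required. Conditions A and B differ only in location and yield a statistic of zero, whereas A and C differ in spread.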

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA, and if the test residuals were not normally distributed, we used a Kruskal-Wallis test instead. For the post-hoc tests of both we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed miss this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. l. 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
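      For readers unfamiliar with the rank-based alternative mentioned above, the Kruskal-Wallis statistic can be sketched as follows. This is a hedged illustration with a hypothetical function name and made-up data, not the code deposited in the data repository. It covers only the H statistic (with the standard tie correction); the normality check on the ANOVA residuals, the p-value from the chi-squared distribution with k - 1 degrees of freedom, and the post-hoc comparisons are outside the scope of the sketch.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with mid-ranks and tie correction.

    Note: raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                            # number of tied observations
        ranks[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    rank_sum_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total**3 - n_total)
    return h / correction


# Hypothetical data: three fully separated groups, and two interleaved groups.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
h_interleaved = kruskal_wallis_h([1, 4], [2, 3])
```

      Fully separated groups give a large H, while groups with equal rank sums give an H of zero.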

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. eLife Assessment

      Using a unique cerebellar disruption approach in non-human primates, this study provides valuable new insight into how cerebellar inputs to the motor cortex contribute to reaching. The findings convincingly demonstrate that reaching movements following cerebellar disruption slow down because of both an acute deficit in producing muscle activity as well as a progressive decline in compensating for limb dynamics. This work will be of interest to neuroscientists and clinicians interested in cerebellar function and pathology.

    2. Reviewer #1 (Public review):

      Summary:

      In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from sub-acute (within session but not immediate) kinematic consequences of cerebellar block.

    4. Reviewer #3 (Public review):

      Summary:

      In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Remaining comments:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      None

      Reviewer #1 (Recommendations for the authors):

      The authors have answered my questions adequately and I have no further comments.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from sub-acute (within session but not immediate) kinematic consequences of cerebellar block.

      Reviewer #2 (Recommendations for the authors):

      I think the manuscript is good as is. That said, it would have been nice to see more of the behavioral outcomes in Figure 5 (e.g. decomposition and trajectory variability) analyzed longitudinally like the velocity measurements in Fig. 4. This would clearly strengthen the insight into acute and compensatory components of cerebellar motor deficits.

      The two behavioral measures of motor noise used in our study are movement decomposition and trajectory variability (Figure 5). Since trajectory variability is measured across trials, we could not analyze this measure longitudinally as a function of trial number. However, following the reviewer’s advice, we examined movement decomposition for successive trials in control vs. cerebellar block for movements to targets 2-4, similar to the analysis of hand velocity in Figure 4. We found no interaction effect between trial sequence x cerebellar block on movement decomposition. This result is consistent with our conclusion that noisy joint activation occurs independently of adaptive slowing of multi-joint movements. We have updated our main text (lines 293-299) and supplementary information (supplementary figure S5 and supplementary table S8) to include this result.

      Reviewer #3 (Public review):

      Summary:

      In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      In this revised version of the manuscript, the authors have provided additional analyses and clarification that address several of the comments from the original submission.

      Remaining comments:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.

      We agree with the reviewer that examining the effect of the cerebellar block on immediate post-block washout trials in future studies will be insightful.    

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study. In the revised manuscript, the authors do provide additional anatomical and evolutionary context and discuss potential limitations in the selectivity of HFS in the Materials and Methods. However, I feel that at least a brief mention of these caveats in the Introduction, where it is stated, "we then reversibly blocked cerebellar output to the motor cortex", would benefit the reader.

      Following the advice of the reviewer, we have now revised the introduction section of our manuscript in the following way (lines 61-67):

      “…We then reversibly disrupted cerebellar communication with other neural structures using high-frequency stimulation (HFS) of the superior cerebellar peduncle, assessing the impact of this perturbation on subsequent movements. Although our approach primarily affects cerebellar output to the motor cortex, it also disrupts fibers carrying input signals (e.g., spinocerebellar) and pathways to various subcortical targets (e.g., cerebellorubrospinal). Thus, our manipulation broadly interferes with cerebellar communication…”

      Reviewer #3 (Recommendations for the authors):

      Typo on line 102; "subs-sessions"

      We have corrected this typographical error in our revised manuscript (line 106).

    1. eLife Assessment

      The work presents a valuable extension of qFit-ligand, a computational method for modeling conformational heterogeneity of ligands in X-ray crystallography and cryo-EM density maps. The authors provide solid evidence of improved capabilities through careful validation against the previous version, particularly in expanding ligand sampling within conformational space. Such improvements suggest practical utility for challenging applications, including macrocyclic compound modeling and crystallographic drug fragment screening.

    2. Reviewer #1 (Public review):

      Summary:

      Flowers et al describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multiconformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      Weaknesses were addressed during review. Overall, the demonstrated performance gains are modest.

      Specific comments:

      (1) The accuracy of initial placement may be critical. At the same time, in my experience ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. There remain some questions regarding sensitivity to initial ligand placement, which individual users should check for.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit and yielding more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits limitations in low-resolution electron density maps (lower than 2.0 Å) and low-occupancy scenarios. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Flowers et al describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall, the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).

      We thank the reviewer for their thoughtful review. To address comments, we have added clarifying statements and discussion points around the extent of performance gains, our choice of benchmarking metrics, and the “standards” in the field for significance. We expanded our analysis to highlight how to use qFit-ligand in “discovery” mode, which is aimed at supporting individual modeling efforts. As we now write in the discussion:

      “It is advisable to employ qFit-ligand selectively, focusing on cases with a moderate correlation between your input model and the experimental data, strong visual density in the binding pocket, high map resolution, or when your single-conformer ligand model is strained.”

      Additionally, we note in the discussion:

      “qFit-ligand primarily serves as a “thought partner” for manual modeling. Modelers still must resolve many ambiguities, including initial ligand placement, to fully take advantage of qFit capabilities. In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit and yielding more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

      We thank Reviewer #2 for their comments on the role of conformational flexibility and how our tool addresses the complexity involved in modeling alternative conformations. We agree that there are limitations at low resolution, limiting the application of our algorithm. That is the case with all structural biology tools. Automatically finding alternative conformations of ligands in high-resolution structures is an enhancement to the toolbox of ligand fitting. Expanding the algorithm to work with fragment screening data is important in this realm, as almost all of this data fits in the high-resolution range where qFit-ligand works best.

      The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.

      We agree that the changes are small, partially because the baseline (manually modeled ligands) is very high. To provide additional evidence, we added evaluations using EDIAm, which is a more sensitive metric. In Figure 2 (page 10), representing the development dataset, we see more improvements above 0.1. With this being said, it is unclear what constitutes a ‘substantial’ improvement for either of these metrics, especially considering alternative conformations may only change the coordinates of a subset of ligands, just slightly improving the fit to density.

      We agree that looking across the PDB on strain would provide valuable insight. To explore this, we looked to see how qFit-ligand could improve the fitting of deposited ligands with high strain (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, Page 15). While only a subset of these structures had alternative conformers placed (24.6%), we observed that in this subset, the ligands often improved the RSCC and strain. This figure also demonstrates that while RSCC may not change much numerically, the alternative conformers explain previously unexplained density with lower energy conformers than what is currently deposited.

      To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.

      See above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A - Specific comments:

      (1) It appears necessary to provide qFit-ligand with an initial model with the ligand already placed. This is not clear from the start of the introduction on page 3. It appears that ligand position is only weakly adjusted fairly late in the process, in step F of Figure 1. It seems, therefore, that the accuracy of initial placement is rather critical (see the example discussed on page 21). At the same time, in my experience, ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. It would be helpful for the authors to comment on the sensitivity to initial ligand placement, either in the discussion or, better yet, in the form of an analysis in which the starting model position is randomly perturbed.

      In our revised version, we have modified the introduction to clarify the necessity of including an initial ligand model (page 4).

      “The qFit-ligand algorithm takes as input a crystal or cryo-EM structure of an initial protein-ligand complex with a single conformer ligand in PDBx/mmCIF format, a density map or structure factors (encoded by a ccp4 formatted map or an MTZ), and a SMILES string for the ligand.”

      We also describe our sampling algorithm more clearly (see: Biasing Conformer Generation, page 6). Steps A-E generate many conformations (using RDKit), which are then selected/fit into experimental density (using quadratic programming). To help with additional shifting issues in the input ligand, after the first selection, we do additional rotation/translation of the generated conformers that are kept. We then do another round of fitting to the density (quadratic programming followed by mixed integer quadratic programming).

      Given this sampling, we have not elected to do an additional computational experiment to test the “radius of convergence” or dependence on initial conditions. However, we outline the fundamental procedure here so that someone can build on the work and test the idea:

      - Create single conformer models as we currently do

      - Randomly perturb the coordinates of the ligand by 0.1-0.3 Å

      - Refine to convergence, creating a series of “perturbed, modified true positives” for each dataset

      - Run qFit-ligand

      - Evaluate the variability in the resulting multi-conformer models
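
      The perturbation step in the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical and not part of qFit-ligand, and each atom is displaced independently in a random direction by 0.1-0.3 Å (a rigid-body shift of the whole ligand is an equally plausible reading of the step). Refinement and the qFit-ligand run itself are not shown.

```python
import math
import random


def random_unit_vector(rng):
    """Return a uniformly distributed 3D direction (normalised Gaussian triple)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # guard against a (vanishingly rare) zero vector
            return [c / norm for c in v]


def perturb_coordinates(coords, min_shift=0.1, max_shift=0.3, seed=None):
    """Displace every atom by a random vector of length min_shift..max_shift (in Å).

    coords is a list of (x, y, z) tuples. Per-atom displacement is an
    assumption made for this sketch, not the authors' stated protocol.
    """
    rng = random.Random(seed)
    perturbed = []
    for x, y, z in coords:
        dx, dy, dz = random_unit_vector(rng)
        shift = rng.uniform(min_shift, max_shift)
        perturbed.append((x + shift * dx, y + shift * dy, z + shift * dz))
    return perturbed


# Hypothetical three-atom ligand fragment.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = perturb_coordinates(ligand, seed=0)
```

      Repeating this with different seeds would give a set of starting models whose resulting multi-conformer outputs can then be compared.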

      (2) Top of page 6 ("Biasing Conformer Generation"): the authors say "as we only want to generate ligands that physically fit within the protein binding pocket, we bias conformation generation towards structures more likely to fit well within the receptor's binding site". Apart from the odd redundancy of this sentence, I am confused: at the stage that seems to be referred to here (A-C in Figure 1) is the fit to the electron density already taken into account, or does this only happen later (after step E)?

      Thank you for pointing this out. We have edited the statement to clarify it:

      “To guide the conformation generation from the Chem.rdDistGeom based on the ligand type and protein pocket, we developed a suite of specialized sampling functions to bias the conformational search towards structures more likely to fit well into the receptor’s binding site.”

      We do not consider the electron density during conformer generation (only selection from the generated conformers). The sampling is additionally biased by the type of ligand and the size of the binding pocket.

      (3) qFit-ligand appears to be quite slow. Are there prospects for speedup? Can the code take advantage of GPUs or multi-CPU environments?

      We agree with this. We have made some algorithmic improvements, most notably removing duplicate conformers based on root mean squared distance. This, along with parallelization, decreased the average runtime from ~19 minutes to ~8 minutes (see additional details: qFit-ligand runtime, page 8). We do not currently take advantage of GPU specific code.
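
      As an illustration of the duplicate-removal idea mentioned above, a greedy filter on root mean squared distance might look like the sketch below. The function names and the 0.2 Å threshold are assumptions chosen for the example, not values taken from qFit-ligand, and the conformers are assumed to share atom order and coordinate frame (no superposition is performed).

```python
import math


def rmsd(conf_a, conf_b):
    """Root mean squared distance between two conformers with matching atom order."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def remove_duplicates(conformers, threshold=0.2):
    """Keep a conformer only if it is at least `threshold` Å from every kept one."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, ref) >= threshold for ref in kept):
            kept.append(conf)
    return kept


# Two near-identical conformers and one distinct conformer (hypothetical data).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.01, 0.0, 0.0), (1.01, 0.0, 0.0)]
conf_c = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
unique = remove_duplicates([conf_a, conf_b, conf_c])
```

      Fewer conformers entering the density-fitting stage means fewer candidates to score, which is one way such pruning could shorten the runtime.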

      (4) Section: Detection of experimental true positive multi-conformer ligands:

      a) Why are carbohydrate ligands excluded? This seems like an important class of ligands that one would like qFit to be able to treat! Which brings me to a related question: can covalently attached groups (e.g., glycosylation sites!) be modeled using qFit-ligand, or is qFit-ligand restricted to non-covalently bound groups?

      Currently, qFit-ligand does not support covalently bound ligands, but this is an area of interest we are hoping to expand into. In the revised version, we added the non-covalently attached carbohydrates back into the true positive dataset. In Figure 4 (page 14), we show that qFit-ligand is able to improve fit to the experimental density in around 80% of structures, while also often reducing torsion strain (see additional details: qFit-ligand applied to unbiased dataset of experimental true positives, page 14).

      b) "as well as 758 cases where the ligand model's deposited alternate conformations (altlocs) were not bound in the same chain and residue number" - I do not understand what this means, or why it leads to the exclusion of so many structures. Likewise, a number of additional exclusions are described in Figure S3. Some more background on why these all happened would be helpful. Are you just left with the "easy" cases?

      Sometimes modelers will list the multiple conformations of a bound ligand as separate residues within the PDB file, rather than as a single multiconformer model. For example, rather than writing a multiconformer LIG bound at A, 201 with altlocs ‘A’ and ‘B’, a modeler might write this instead as LIG, A, 201 and LIG, A, 301. We initially excluded these kinds of structures. However, we agree that this choice resulted in the removal of many potentially valid true positives. We have since updated our data processing pipeline to include these cases, and they are examined in the updated manuscript.

      c) I do not follow the argument made at the end of this section (last two paragraphs on page 9): "when using a single average conformation to describe density from multiple conformations, the true low-energy states may be ignored". I get that, but the conformations in the "modified true positives" dataset derive directly from models in which two conformations were modeled, so this cannot be the explanation for why qFit-ligand models result in somewhat lower average strain. It would seem that the paper could be served by providing examples where single conformations were modeled in deposited structures, but qFit detects multiple conformations.

      We agree with this comment that the strain obtained from the modified true positives is likely higher than the deposited models. However, the modified structure is refined with a single conformation, and therefore changed from the deposited “A” conformation. Thus, the reduced strain observed in our qFit-ligand models relative to the modified true positives is not unexpected.

      To expand our dataset, we also looked at deposited structures with high strain, all of which were modeled as single conformers. Here, we saw a decrease in strain when alternative conformers were placed (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, page 15). Further, we provide an example from the XGen macrocycle dataset where a ligand initially modeled as a single conformer exhibited relatively high strain. After qFit‐ligand modeled a second conformation, the overall strain was reduced (Figure 6C, page 19; Figure 6—figure supplement 1C, page 59).

      (5) Section: qFit-ligand applied to an unbiased dataset of experimental true positives. Bottom of page 14: The paragraph starting with "qFit-ligand shows particular strength in scenarios with strong evidence..." is enigmatic: there's no illustration (unless it directly relates to the findings in Figure 4, in which case this should be more explicit). Since this points out when the reader will and will not benefit from using qFit-ligand, it should be clear what the authors are talking about.

      This claim considers all the evidence presented in the manuscript, not necessarily one particular aspect of it. We advise using qFit-ligand when there is a moderate correlation between the input model and the experimental data, strong visual density in the binding pocket, high map resolution, and/or when your single conformer ligand model is strained. We have made all of these points clearer in the updated manuscript.

      B  - Section: qFit-ligand can automatically detect and model multiple conformations of macrocycles:

      This is an exciting extension of qFit-ligand, but some aspects of the analysis strike me as worrisome. Of the initial dataset of 150 structures, fewer than half make it all the way through analysis. It's hard to believe that this is a fully representative subset. Why, for example, could 29 structures not be refined against the deposited structure factors? Why does strain calculation (in RDKit?) fail on 30 ligands? What about the other 18 cases: why did these fail (in PHENIX?)?

      We agree that this is a striking number of failures; however, we note that they are not specific shortcomings of qFit-ligand (in fact, most arise because standard structural biology and/or cheminformatics software fails on many PDB depositions). These failures therefore reflect broader limitations in standard bioinformatics and refinement restraint files when handling macrocycles. The strain calculator we used was not built for macrocycles, and after consulting with many experts in the field, the consensus was that no method works well with macrocycles. We discuss these issues in additional detail in the discussion (page 27):

      “Additionally, our algorithm’s placement within the larger refinement and ligand modeling ecosystem highlighted other areas that need improvement. We note that macrocycles, due to their complicated and interconnected degrees of freedom, suffer acutely from the refinement issues, as demonstrated by the failure of approximately one-third of datasets in our standard preparation or post-refinement pipelines due to ligand parameterization issues. Many of these stemmed from problematic ligand restraint files, highlighting the difficulty of encoding the geometric constraints of macrocycles using standard restraint libraries. Improved force-field or restraints for macrocycles are desperately needed to improve their modeling.”

      C  - Minor issues:

      (1) "Fragment-soaked event maps" - this is a semantically strange section title!

      We have updated the section title in our revised manuscript. The new title is ‘qFit-ligand recovers heterogeneity in fragment-soaked event maps’.

      (2) Too many digits! All over the manuscript, percentages are displayed with 0.01% precision, while these mostly refer to datasets with ~150 structures. Shifting just one structure from one category to another changes these percentages by nearly 1%.

      We have updated the number of significant figures in our revised manuscript.
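
      The reviewer's arithmetic can be checked directly: with n structures, moving a single structure between categories changes a percentage by 100/n points. The short sketch below uses dataset sizes mentioned in the review (73 and roughly 150); the function name is illustrative only.

```python
def percent_per_structure(n_structures):
    """Percentage-point change caused by moving one structure between categories."""
    return 100.0 / n_structures


for n in (73, 150):
    step = percent_per_structure(n)
    print(f"n={n}: one structure corresponds to {step:.2f} percentage points")
```

      For datasets of this size the step is well above 0.01%, so reporting two decimal places implies more precision than the counts support.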

      (3) The authors are keen to classify decreases in RSCC as significant only when these changes exceed 0.1, but do not apply the same standard for increases. For instance, in Figure 4B if we were to classify improvements as significant if ΔRSCC > 0.1, there would be fewer significant improvements than decreases in performance (although it is visually clear that for most datasets things get better). Similarly, in Figure 5A if we were to classify improvements as significant if ΔRSCC > 0.1, qFit-ligand would only yield significant improvements for two out of 73 cases - not a lot.

      We agree with the reviewer that there needs to be more consistency in our analysis of improvements/deteriorations. However, we note that operationally, when the decreases in model quality are observed, the modeler would simply reject the new model in favor of the input model. We have added to the discussion:

      “In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      There is generally no consensus in the field as to what might indicate a ‘significant’ change in RSCC, and any threshold we choose would be arbitrary. We note that in our manuscript, we had previously characterized a decrease in RSCC to be ‘significant’ if it exceeded 0.1. However, as there is no real scientific justification for this cutoff, or any cutoff, we moved away from this framing in the revised manuscript. Therefore, we just classify if we improve RSCC. For example, on page 9:

      “qFit-ligand modeled an alternative conformation in 72.5% (n=98) of structures. Compared with the modified true positive models, 83.7% (n=113) of qFit-ligand models have a better RSCC and 77.0% (n=104) structures saw an improvement in EDIAm, representing an improved fit to experimental data in the vast majority of structures.”

      In addition, we have conducted additional experiments using more sensitive metrics (EDIAm) to further illustrate qFit-ligand’s performance.

      (4) Small peptides are not discussed as a class of ligands, although these are quite common.

      Canonical peptides can be modeled with standard qFit. Non-canonical peptides present failure modes similar to the macrocycles discussed above, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons we have not included an analysis outside of the macrocycle section. We have noted this caveat in the discussion:

      “We note that even linear non-canonical peptides present similar failure modes to macrocycles, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons, we did not include analysis on small peptide ligands; however, canonical peptides can be modeled with standard qFit [8].”

      (5) Top of page 10: "while refinement improves": what kind of refinement does this refer to?

      This refers to refinement with Phenix. We have updated this language to reflect this (page 8). “We refer to these altered structures as our ‘modified true positives’, which we use as input to qFit-ligand, and subsequent refinement using Phenix.”

      (6) Bottom of page 11: "they often did" -> "it often did"

      We have made this change in the revised version.

      (7) Top of page 14: RMSDs and B factors do have units.

      We have added the units in our revision.

      (8) Top of page 24. In the generation of a composite omit map, why are new Rfree flags being generated? Did I misunderstand that?

      r_free_flags.generate=True only creates R-free flags if they are not present in the input file, as is the case for many (especially older) PDB depositions.

      (9) Bottom of page 27: how large is the mask? Presumably when alt confs of the ligand are possible, it would be helpful for the mask to cover those?

      We agree that this mask should be updated. In our revision, we define the mask around the coordinates of the full qFit-ligand ensemble. The same mask is used to calculate the RSCC of the input (single conformer) model versus the qFit-ligand model.
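
      As a minimal sketch of the comparison described above (not the qFit-ligand code itself), the RSCC can be computed as the Pearson correlation between observed and calculated density over a shared mask. The function names, the flat list representation of the map, and the 1.5 Å mask radius are assumptions made for illustration only.

```python
import math

def ensemble_mask(grid_points, ensemble_coords, radius=1.5):
    """Flag grid points within `radius` (assumed, in angstroms) of any ensemble atom."""
    r2 = radius * radius
    return [
        any((gx - ax) ** 2 + (gy - ay) ** 2 + (gz - az) ** 2 <= r2
            for ax, ay, az in ensemble_coords)
        for gx, gy, gz in grid_points
    ]

def rscc(rho_obs, rho_calc, mask):
    """Pearson correlation of observed vs. calculated density over masked points."""
    obs = [o for o, m in zip(rho_obs, mask) if m]
    calc = [c for c, m in zip(rho_calc, mask) if m]
    n = len(obs)
    mean_o, mean_c = sum(obs) / n, sum(calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(obs, calc))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_c = sum((c - mean_c) ** 2 for c in calc)
    return cov / math.sqrt(var_o * var_c)
```

      Building the mask once from the coordinates of all conformers and passing the same mask to both calls keeps the single-conformer and multiconformer scores directly comparable.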

      (10) Middle of page 29: "These structure factors are then used to compute synthetic electron density maps." - It is not clear whether the following three sentences are an explanation of the details of that statement or rather things that are done afterwards.

      We clarify this in the manuscript (page 36).

      “These structure factors are then used to compute synthetic electron density maps. To each of these maps, we generate and add random Gaussian noise values scaled proportionally to the resolution. This scaling reflects the escalation of experimental noise as resolution deteriorates, a common occurrence in real-life crystallographic data.”
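
      The noise step quoted above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the map is treated as a flat list of density values, and `noise_per_angstrom` is a hypothetical proportionality constant that the text does not specify.

```python
import random

def add_resolution_scaled_noise(density, resolution, noise_per_angstrom=0.05, seed=None):
    """Add zero-mean Gaussian noise whose standard deviation grows with resolution.

    `resolution` is in angstroms, so a larger value (worse resolution) gives more noise.
    `noise_per_angstrom` is an assumed scale factor, chosen here only for illustration.
    """
    rng = random.Random(seed)
    sigma = noise_per_angstrom * resolution
    return [value + rng.gauss(0.0, sigma) for value in density]
```

      With a fixed seed, a map simulated at 3.0 Å receives three times the noise amplitude of the same map at 1.0 Å, which mirrors the stated proportional scaling.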

      (11) Chemical synthesis: I am not qualified to assess this and am surprised to see so much detail here rather than in some other manuscript. Are the corresponding structures deposited anywhere?

      All of the structures we discuss in this manuscript are deposited in the PDB and listed in Supplementary Table 5.

      Reviewer #2 (Recommendations for the authors):

      The data should consistently present the number of structures that exhibit improvements or deterioration in particular metrics, like RSCC and strain, using a cutoff that should be significant. For instance, stating that "85.93% (n=116) of structures having a better RSCC in the qFit-ligand models compared to the modified true positive models" without clarifying the magnitude of improvement (e.g., a marginal increase of 0.01 in RSCC) lacks meaningful context. The figures should clearly indicate the specific cutoff values used for each metric. The accompanying text should provide a detailed explanation for the selection of these cutoff values, justifying their significance in the context of the study.

      Currently, there is no established consensus within the field on what constitutes a 'significant' improvement in RSCC or strain values. As such, we chose not to impose an arbitrary cutoff and just look at which structures improve RSCC. We also removed all language stating significance, as there isn’t a good standard in the field to assess significance. This is especially important as only improvements would be considered in an active modeling project. In cases where qFit-ligand degrades the RSCC (or strain) to a large extent, the modeler would simply revert to the input model.

      In the first section of Results: "First, for all ligands, we perform an unconstrained search function allowing the generated conformers to only be constrained from the bounds matrix (Figure 1A). This is particularly advantageous for small ligands that benefit from less restriction to fully explore their conformational space. We then perform a fixed terminal atoms search function (Figure 1B)." It is unclear whether a fixed terminal atom search was conducted for each conformer generated in the initial step to further explore the conformational space. This aspect should be clarified to provide a more comprehensive understanding of the methodology.

      Each independent conformer generation function (A-E) is initialized with only the input ligand model and runs in parallel with the other functions. These functions do not build on each other, but rather perturb the input molecule independently of one another. In our updated manuscript, we have clarified the methodology (page 6).

      “First, in all cases, we perform an unconstrained search function (Figure 1A), a fixed terminal atoms search function (Figure 1B), and a blob search function (Figure 1C).”
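
      The independence of the search functions can be illustrated with a short sketch. The three functions below are hypothetical stand-ins, not the qFit-ligand API, and the use of threads is an assumption; the only point shown is that every search starts from the same input model and never consumes another search's output.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders: each returns a list of candidate conformers.
def unconstrained_search(ligand):
    return [("unconstrained", ligand)]

def fixed_terminal_atoms_search(ligand):
    return [("fixed_terminal_atoms", ligand)]

def blob_search(ligand):
    return [("blob", ligand)]

def generate_conformers(input_ligand, searches):
    """Run every search on the same input model and pool the results."""
    with ThreadPoolExecutor() as pool:
        # Each search receives only the input ligand, never another search's output.
        batches = pool.map(lambda search: search(input_ligand), searches)
        return [conformer for batch in batches for conformer in batch]
```

      Pooling the batches afterwards gives a single candidate set for downstream scoring, regardless of the order in which the searches finish.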

      Phrase: "We randomly sampled 150 structures and, after manual inspection of the fit of alternative conformations, chose 135 crystal structures as a development set for improving qFit-ligand." The authors should explain why they filtered 10% of the structures.

      To develop qFit-ligand, we wanted to use a very high-quality dataset. We needed to know with some degree of certainty that if qFit-ligand failed to produce an alternate conformation (or generated conformations low in RSCC or high in strain), the failure was due to an algorithmic limitation rather than poor-quality input data. Therefore, after selection based on numerical metrics, we manually examined each ligand in Coot to observe if we believed the alternative conformers fit well into the density.

    1. eLife Assessment

      This important study reports a reanalysis of one experiment of a previously-published report to characterize the dynamics of neural population codes during visual working memory in the presence of distracting information. This paper presents solid evidence that working memory representations are dynamic and distinct from sensory representations of intervening distractions. This research will be of interest to cognitive neuroscientists working on the neural bases of visual perception and memory.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors re-analyzed a public dataset (Rademaker et al, 2019, Nature Neuroscience), which includes fMRI and behavioral data recorded while participants held an oriented grating in visual working memory (WM) and performed a delayed recall task at the end of an extended delay period. In that experiment, participants were pre-cued on each trial as to whether there would be a distracting visual stimulus presented during the delay period (filtered noise or randomly-oriented grating). In this manuscript, the authors focused on identifying whether the neural code in retinotopic cortex for remembered orientation was 'stable' over the delay period, such that the format of the code remained the same, or whether the code was dynamic, such that information was present, but encoded in an alternative format. They identify some timepoints - especially towards the beginning/end of the delay - where the multivariate activation pattern fails to generalize to other timepoints, and interpret this as evidence for a dynamic code. Additionally, the authors compare the representational format of remembered orientation in the presence vs absence of a distracting stimulus, averaged over the delay period. This analysis suggested a 'rotation' of the representational subspace between distracting orientations and remembered orientations, which may help preserve simultaneous representations of both remembered and viewed stimuli. Intriguingly, this rotation was a bit smaller for Expt 2, in which the orientation distractor had a greater impact on the participants' working memory recall performance, suggesting that more separation between subspaces is critical for preserving intact working memory representations.

      Strengths:

      (1) Direct comparison of coding subspaces/manifolds between timepoints, task conditions, and experiments is an innovative and useful approach for understanding how neural representations are transformed to support cognition.

      (2) Re-use of an existing dataset goes substantially beyond the authors' previous findings by comparing geometry of representational spaces between conditions and timepoints, and by looking explicitly for dynamic neural representations.

      (3) Simulations testing whether dynamic codes can be explained purely by changes in data SNR are an important contribution, as this rules out a category of explanations for the dynamic coding results observed.

      Weaknesses:

      (1) Primary evidence for 'dynamic coding', especially in early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with some previous findings. However, given the simulation results, the general result that representations may change in their format appears solid, though the contribution of different trial phases remains important for considering the overall result.

      (2) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.
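
      To make the suggested precision metric concrete, the sketch below computes a circular standard deviation of decoding errors, reported in degrees of orientation. It is one possible definition rather than the analysis used in the manuscript: it assumes errors are signed orientation differences on a 180° cycle and uses the standard circular SD, sqrt(-2 ln R), after doubling the angles.

```python
import math

def orientation_circular_sd(errors_deg):
    """Circular SD (in degrees) of orientation errors defined on a 180-degree cycle."""
    # Orientation repeats every 180 degrees, so double the angles to span a full circle.
    doubled = [math.radians(2.0 * e) for e in errors_deg]
    mean_cos = sum(math.cos(a) for a in doubled) / len(doubled)
    mean_sin = sum(math.sin(a) for a in doubled) / len(doubled)
    # Mean resultant length R; clamp to 1 to guard against floating-point rounding.
    resultant = min(1.0, math.hypot(mean_cos, mean_sin))
    if resultant == 0.0:
        return math.inf  # errors are uniformly spread; precision is undefined
    sd_doubled = math.sqrt(-2.0 * math.log(resultant))
    # Halve the result to return to orientation units.
    return math.degrees(sd_doubled) / 2.0
```

      A smaller value indicates a tighter error distribution, which maps directly onto behavioral recall precision expressed in the same units.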

      Comments on revised version:

      The authors have addressed all my previous concerns.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)

      Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section:

      “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
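
      The sampling scheme quoted above can be sketched in a few lines. The function name is hypothetical, and sampling without replacement within each set is an assumption; the key property is that the two draws are independent, so the sets necessarily overlap.

```python
import random

def sample_train_test(n_total=180, n_sample=108, seed=None):
    """Draw training and test indices independently from the same full set."""
    rng = random.Random(seed)
    trials = range(n_total)
    train = rng.sample(trials, n_sample)  # assumed: no repeats within the training set
    test = rng.sample(trials, n_sample)   # drawn independently from the same full set
    return train, test
```

      Because 108 + 108 exceeds 180, at least 36 trials must appear in both sets on every iteration, which is why this is not a disjoint train/test split.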

      (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.

      Thank you; we have now added a description to both Fig. 2 and Fig. 3:

      “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”

      (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)

      Thank you - fixed.

    1. eLife Assessment

      In this important study, the authors have performed a zebrafish drug screen to identify suppressors of atherogenic lipoproteins. They utilize a well-established LipoGlo assay to find molecules that modulate these lipoproteins, identifying 49 potential hits. They perform some validation experiments, including studies linking enoxolone to its likely inhibitory effect on a specific transcription factor, HNF4alpha. Overall, the results are convincing and robust, and will open up new areas of exploration for those investigators interested in in vivo lipid biology.

    2. Reviewer #1 (Public review):

      Summary:

      A whole-organism drug screen was performed to identify molecules that decrease Apolipoprotein B (ApoB) as a target for agents to reduce atherosclerosis. Kelpsch et al. used a zebrafish reporter line, LipoGlo, which is a fusion of the Nano-luciferase protein to the ApoB protein as a proxy for the presence of ApoB-containing lipoproteins (B-lps) in larval stages. The LipoGlo line was screened against a well-characterized drug library and identified 49 hits from their primary screen. Follow-up studies further refined this list to 19 molecules that reproducibly reduced B-lps significantly. The authors focused their studies on enoxolone, a licorice root extract, and showed that larvae treated with this agent can reduce the production of B-lps. As enoxolone has been reported to suppress Hepatocyte Nuclear factor 4a (HNF4a), the authors investigated whether loss-of-hnf4a or pharmacological inhibition of hnf4a in zebrafish also produced similar phenotypes as enoxolone treatment. Their studies showed that this was the case. Transcriptomic studies after enoxolone treatment resulted in altered expression of genes involved in cholesterol biosynthesis and in glucose/insulin signaling pathways. This study highlights the utility of a zebrafish whole-organism chemical screen for modifiers of B-lps production and/or its clearance. A significant finding is that enoxolone inhibits hnf4a in zebrafish to reduce B-lps production and supports targeting HNF4a as a therapeutic means to reduce the emergence of atherosclerosis.

      Strengths:

      The authors performed a whole-organism chemical screen with over 3000 agents. Such screens are challenging, and the authors used strict criteria for determining hits. The conclusions of this study are well supported by the presented data.

      Weaknesses:

      There are areas within the study and writing that can be improved and extended, specifically within the gene expression studies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to develop a large-scale drug screen to identify B-lp modulators in a vertebrate whole-animal system. Using the zebrafish LipoGlo system that the authors had previously published and validated, the authors screened 2762 drug candidates to generate 49 hits and ultimately validated 19 drugs as genuine ApoB-lowering drugs. Using LipoGlo-Electrophoresis, the authors are able to obtain insights into the ApoB-lipoprotein size/subclass distribution. The authors further validate and study the mechanism of a strong hit, Enoxolone, also known as 18β-Glycyrrhetinic acid, which has previously been reported to modulate lipid metabolism. The authors also show that Enoxolone effects are mediated through HNF4α, which has been previously shown in the mouse system, but this is the first time it has been shown in the zebrafish.

      Strengths:

      The study was methodical and robust, using a published and well-validated zebrafish LipoGlo model. The authors validated the hits from the screen independently and considered the possibility that some drugs may have been detected as false positive results due to effects on the enzymatic activity of NanoLuciferase; only one hit, verteporfin, was shown to be a false positive. Using LipoGlo-Electrophoresis, the authors are able to obtain extra insights into the ApoB-lipoprotein size/subclass distribution. They showed that while enoxolone treatment reduces total B-lps, there are no overt changes in B-lp size distribution compared to vehicle-treated animals, other than a slight increase in the zero mobility (ZM) fraction, which contains very large particles and/or tissue aggregates. In contrast, the positive control, lomitapide, does show a change in B-lp size distribution compared to vehicle-treated animals - an increase in frequency of LDLs (low-density lipoprotein), but a decrease in VLDLs (very low-density lipoprotein). This study also assesses the LipoGlo-Electrophoresis profile of HNF4α inhibitors. Work in the zebrafish larvae means that the effect on overall development and an entire vertebrate organism can also be assessed. Finally, the authors applied a thorough statistical measure to define a hit, using the Strictly Standardized Mean Difference (SSMD) method.
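
      For readers unfamiliar with SSMD, the sketch below shows the common estimator for two independent groups: the difference in means divided by the square root of the summed sample variances. The exact estimator and hit threshold used in the screen are not stated here, so this is an illustration rather than the authors' procedure, and the function name is hypothetical.

```python
import math
import statistics

def ssmd(treated, control):
    """Strictly standardized mean difference for two independent groups."""
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    # Sample variances (n - 1 denominator) of each group are summed, not pooled.
    spread = math.sqrt(statistics.variance(treated) + statistics.variance(control))
    return mean_diff / spread
```

      A strongly negative value indicates that a compound lowers the luminescence signal relative to vehicle by a margin that is large compared with the variability in both groups.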

      Weaknesses:

      While the screen was thorough and well-validated, the authors missed a chance to add significance for a wide readership. While the hits were thoroughly validated and displayed, the authors could have also presented the LipoGlo-Electrophoresis for all validated hits or at least a number of them. This would hugely increase the insights into these compounds. Also, the authors chose to validate and follow up a mechanism for Enoxolone, yet this hit was already known to modulate lipid metabolism through HNF4α, which hugely limits the impact of the paper. What the authors have shown that is novel therefore adds only subtly to this: consistency in vertebrate models, RNA sequencing of pathways, further validation of the HNF4α pathway, and a profile of the resulting B-lp size distribution. It seemed an easy way out to pick such a candidate, and they could have followed up by validating more thoroughly a completely novel drug. Also, the authors' prior paper showing the methodology also depicted complementary EM and LipoGlo-microscopy approaches. The microscopy, especially, would have been an easy complementary add-on to the screen to really give extra insights into B-lp metabolism in a whole organism for all candidates. This felt like a missed opportunity.

    4. Reviewer #3 (Public review):

      Summary:

      In "A whole-animal phenotypic drug screen identifies suppressors of atherogenic lipoproteins", Kelpsch et al seek to identify new, chemically targetable pathways that regulate ApoB function and could ultimately serve as treatments for elevated lipid disorders and/or cardiovascular disease. Given the interconnected nature of lipid regulation in the whole organism with interdependent organs and secreted components (i.e. lipoproteins), they use the vertebrate model zebrafish to screen a large library of ~3000 compounds for their ability to lower the important ApoB-containing lipoproteins. They find 49 hits with 19 compounds passing a higher level of scrutiny, and focus on the role of enoxolone in modulating B-lp levels at least partly through the HNF4alpha transcription factor and, putatively, through downstream cholesterol/lipid biosynthetic pathways.

      Strengths:

      The study uses a well-validated in vivo stain (LipoGlo) for measuring lipoproteins in the context of a developing whole organism with a quantitative read-out on a high-throughput platform, allowing for screening of thousands of compounds altering the complex metabolic/physiologic functions necessary for lipoprotein production.

      The use of genetic mutant HNF4alpha to assign the mechanism of action to the prime candidate compound studied (enoxolone) is a powerful approach for this challenging aspect of chemical genetics studies. See caveats in weaknesses.

      Weaknesses:

      As shown in Figure 5A, the HNF4alpha mutant homozygous -/- already lowers lipoproteins. Is it just that the mutant level is already at a minimum in this homozygous mutant (and thus enoxolone cannot induce even lower lipoprotein levels), or is it true that the enoxolone molecule is primarily acting through this TF (i.e. HNF4alpha homozygous mutant is truly epistatic to enoxolone function), as favored in the text?

      While it is definitely interesting to study enoxolone effects during whole embryo development, the link to HNF4alpha had previously been described in the literature, as pointed out by the authors. The generalizability of the approach to identify truly novel pathways remains to be fully realized, but sharing this available screen data to date will invite further inquiry and be very valuable to the community.

      Figure 5 - The same allele of HNF4alpha loss of function/hypomorph (rdu14) is used in both 5A and 5B, but labeled differently in each subpanel. This is explained in the figure legend, but could be updated to use the same nomenclature in both panels to clarify the Figure presentation.

    5. Author response:

      We would like to thank the editors and reviewers for their time and their helpful feedback. We largely agree with the reviewer recommendations and comments, which we will address for the next Version on Record of this manuscript. We plan to address reviewer comments in the following ways.

      Reviewers requested a more comprehensive analysis of our RNA-seq experiment comparing vehicle treatment to enoxolone treatment over time. We will improve our analysis by providing clear, accessible, and organized tables defining differentially expressed genes at each time point, gene set lists that comprise our gene ontology analysis, and the lists of shared differentially expressed genes from enoxolone treatment and HNF4α knockout. While some of this data was provided in the supplementary files, we recognize that it should be more accessible for the reader. Furthermore, as suggested by the Reviewer, we will enhance our transcriptomic analysis by utilizing bioinformatic tools such as Enrichr.

      The Reviewers noted that we identified a number of lipoprotein-lowering compounds through our drug screen, but limited the impact of our manuscript by focusing on enoxolone, a known inhibitor of HNF4α and modulator of lipid metabolism. While we understand the sentiment that other novel compounds would be interesting to study, we aimed to demonstrate proof of concept in this manuscript. We view the characterization of novel compounds as beyond the direct scope of this manuscript. We did not perform LipoGlo imaging and electrophoresis experiments on each drug because these experiments are low-throughput given the number of drugs and doses we examined. In light of the Reviewer’s comments, we will add some additional characterizations of our validated hits with LipoGlo imaging and electrophoresis studies.

      The reviewers also identified a number of typos in text and figures that will be addressed in the next Version on Record. We believe that the recommended changes will strengthen our manuscript and broaden its appeal. We are grateful for the opportunity to improve our work based on the reviewers’ valuable suggestions.

    1. eLife Assessment

      This study provides useful information on the impact of Lamin A/C knockdown on gene expression using RNA-Seq analysis. In addition, the impact of Lamin A/C knockdown on telomere dynamics is explored using live cell imaging. The conclusions, however, are inadequately supported by the data presented. Weaknesses include excessive reliance on gene ontology analysis without further validation of direct versus indirect effects; use of only one shRNA, which may have off-target effects; validation of knockdown only from gene expression rather than protein levels; and lack of discussion of previous studies showing the presence of Lamin A/C in the nuclear interior, among others.

    2. Reviewer #1 (Public review):

      This manuscript reports a descriptive study of changes in gene expression after knockdown of the nuclear envelope proteins lamin A/C and Nesprin2/SYNE2 in human U2OS cells. The readout is RNA-seq, which is analyzed at the level of gene ontology and focused investigation of isoform variants and non-coding RNAs. In addition, the mobility of telomeres is studied after these knockdowns, although the rationale in relation to the RNA-seq analyses is rather unclear.

      RNA-seq after knockdown of lamin proteins has been reported many times, and the current study does not provide significant new insights that help us to understand how lamins control gene expression. This is particularly because the vast majority of the observed effects on gene expression appear to occur in regions that are not bound by lamin A. It seems likely that these effects are indirect. There is also virtually no overlap between genes affected by lamin A/C and by SYNE2, which remains unexplained; for example, it would be good to know whether lamin A/C and SYNE2 bind to different genomic regions. The claim in the Title and Abstract that LMNA governs gene expression / acts through chromatin organization appears to be based only on an enrichment of gene ontology terms "DNA conformation change" and "covalent chromatin conformation" in the RNA-seq data. This is a gross over-interpretation, as no experimental data on chromatin conformation are shown in this study. The analyses of transcript isoform switching and ncRNA expression are potentially interesting but lack a mechanistic rationale: why and how would these nuclear envelope proteins regulate these aspects of RNA expression? The effects of lamin A on telomere movements have been reported before; the effects of SYNE2 on telomere mobility are novel (to my knowledge), but should be discussed in the light of previously documented effects of SUN1/2 on the dynamics of dysfunctional telomeres (Lottersberger et al, Cell 2015).

      As indicated below, I have substantial concerns about the experimental design of the knockdown experiments.

      Altogether, the results presented here are primarily descriptive and do not offer a significant advance in our understanding of the roles of LaminA and SYNE2 in gene regulation or chromatin biology, because the results remain unexplained mechanistically and functionally. Furthermore, the RNAseq datasets should be interpreted with caution until off-target effects of the shRNAs can be ruled out.

      Specific comments:

      (1) Knockdowns were only monitored by qPCR. Efficiency at the protein level (e.g., Western blots) needs to be determined.

      (2) For each knockdown, only a single shRNA was used. shRNAs are infamous for off-target effects; therefore, multiple shRNAs for each protein, or an alternative method such as CRISPR deletion or degron technology, must be tested to rule out such off-target effects.

      (3) It is not clear whether the replicate experiments are true biological replicates (i.e., done on different days) or simply parallel dishes of cells done in a single experiment (= technical replicates). The extremely small standard deviations in the RT-qPCR data suggest the latter, which would not be adequate.

    3. Reviewer #2 (Public review):

      Summary:

      This study focused on the roles of the nuclear envelope proteins lamin A and C, as well as nesprin-2, encoded by the LMNA and SYNE2 genes, respectively, on gene expression and chromatin mobility. It is motivated by the established role of lamins in tethering heterochromatin to the nuclear periphery in lamina-associated domains (LADs) and modulating chromatin organization. The authors show that depletion of lamin A, lamin A and C, or nesprin-2 results in differential effects on mRNA and lncRNA expression, primarily affecting genes outside established LADs. In addition, the authors used fluorescent dCas9 labeling of telomeric genomic regions combined with live-cell imaging to demonstrate that depletion of either lamin A, lamin A/C, or nesprin-2 increased the mobility of chromatin, suggesting an important role of lamins and nesprin-2 in chromatin dynamics.

      Strengths:

      The major strength of this study is the detailed characterization of changes in transcript levels and isoforms resulting from depletion of either lamin A, lamin A/C, or nesprin-2 in human osteosarcoma (U2OS) cells. The authors use a variety of advanced tools to demonstrate the effect of protein depletion on specific gene isoforms and to compare the effects on mRNA and lncRNA levels.

      The TIRF imaging of dCas9-labeled telomeres allows for high-resolution tracking of multiple telomeres per cell, thus enabling the authors to obtain detailed measurements of the mobility of telomeres within living cells and the effect of lamin A/C or nesprin-2 depletion.

      Weaknesses:

      Although the findings presented by the authors overall confirm existing knowledge about the ability of lamins A/C and nesprin to broadly affect gene expression, chromatin organization, and chromatin dynamics, the specific interpretation and the conclusions drawn from the data presented in this manuscript are limited by several technical and conceptual challenges.

      One major limitation is that the authors only assess the knockdown of their target genes on the mRNA level, where they observe reductions of around 70%. Given that lamins A and C have long half-lives, the effect at the protein level might be even smaller. This incomplete and poorly characterized depletion on the protein level makes interpretation of the results difficult. The description for the shRNA targeting the LMNA gene encoding lamins A and C given by the authors is at times difficult to follow and might confuse some readers, as the authors do not clearly indicate which regions of the gene are targeted by the shRNA, and they do not make it obvious that lamin A and C result from alternative splicing of the same LMNA gene. Based on the shRNA sequences provided in the manuscript, one can conclude that the shLaminA shRNA targets the 3' UTR region of the LMNA gene specific to prelamin A (which undergoes posttranslational processing in the cell to yield lamin A). In contrast, the shRNA described by the authors as 'shLMNA' targets a region within the coding sequence of the LMNA gene that is common to both lamin A and C, i.e., the region corresponding to amino acids 122-129 (KKEGDLIA) of lamin A and C. The authors confirm the isoform-specific effect of the shLaminA construct, although they seem somewhat surprised by it, but do not confirm the effect of the shLMNA construct. Assessing the effect of the knockdown on the protein level would provide more detailed information both on the extent of the actual protein depletion and the effect on specific lamin isoforms. Similarly, given that nesprin-2 has numerous isoforms resulting from alternative splicing and transcription initiation, it remains unclear in the current form of the manuscript which specific nesprin-2 isoforms were depleted, and to what extent (on the protein level).

      Another substantial limitation of the manuscript is that the current analysis, with the exception of the chromatin mobility measurements, is exclusively based on transcriptomic measurements by RNA-seq and qRT-PCR, without any experimental validation of the predicted protein levels or proposed functional consequences. As such, conclusions about the importance of lamin A/C on RNA synthesis and other functions are derived entirely from gene ontology terms and are not sufficiently supported by experimental data. Thus, the true functional consequences of lamin A/C or nesprin depletion remain unclear. Statements included in the manuscript such as "our findings reveal that lamin A is essential for RNA synthesis, ..." (Lines 79-80) are thus either inaccurate or misleading, as the current data do not show that lamin A is ESSENTIAL for RNA synthesis, and lamin A/C- and lamin A-deficient cells and mice are viable, suggesting that they are capable of RNA synthesis.

      Another substantial weakness is that the data and analysis presented in the manuscript raise some concerns about the robustness of the findings. Given that the 'shLMNA' construct is expected to deplete both lamin A and C, i.e., its effect encompasses the depletion of lamin A, which is achieved by the 'shLaminA' construct, one would expect a substantial overlap between the DEGs in the shLMNA and shLaminA conditions, with the shLMNA depletion producing a broader effect as it targets both lamin A and C. However, the Venn Diagram in Figure 4a, the genomic loci distribution in Figure 4b, and the correlation analysis in Supplementary Figure S2 show little overlap between the shLMNA and shLaminA conditions, which is quite surprising. In the mapping of the DEGs shown in Figure 4b, it is also surprising not to see the gene targeted by the shRNA, LMNA, found on chromosome 1, in the results for the shLMNA and shLaminA depletion.
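
      One way to put a number on whether two DEG lists overlap more than expected by chance is a hypergeometric tail probability. The sketch below is an illustration only and is not an analysis from the manuscript or requested in this form by the review; the function name is hypothetical, and the choice of gene universe (for example, all expressed genes) is an assumption that strongly affects the result.

```python
from math import comb

def overlap_p_value(n_universe, n_set_a, n_set_b, n_overlap):
    """P(overlap >= n_overlap) if set B were drawn at random from the universe."""
    total = comb(n_universe, n_set_b)
    upper = min(n_set_a, n_set_b)
    # Sum hypergeometric probabilities for every overlap size at least as large.
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

      A small value would indicate more shared DEGs than chance predicts, whereas a value near 1 would be consistent with the limited overlap described above.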

      The correlation analysis in Supplementary Figure S2 raises further questions. The authors use dox-inducible shRNA constructs to target lamin A (shLaminA), lamin A/C (shLMNA), or nesprin-2 (shSYNE2). Thus, the no-dox control (Ctr) for each of these constructs would be expected to be very similar to the non-target scrambled controls (Ctrl.shScramble and Dox.shScramble). However, in the correlation matrix, each of the no-dox controls clusters more closely with the corresponding dox-induced shRNA condition than with the Ctrl.shScramble or Dox.shScramble conditions, suggesting either a very leaky dox-inducible system, strong effects from clonal selection, or substantial batch effects in the processing. Any of these scenarios could substantially affect the interpretation of the findings. For example, differences between different clonal cell lines used for the studies, independent of the targeted gene, could explain the limited overlap between the different shRNA constructs and result in apparent differences when comparing these clones to the scrambled controls, which were derived from different clones.

      The manuscript also contains several factually inaccurate or incorrect statements or depictions. For example, the depiction of the nuclear envelope in Figure 1 shows a single bilipid layer, instead of the actual double bi-lipid layer of the inner and outer nuclear membranes that span the nuclear lumen. The depiction further lacks SUN domain proteins, which, together with nesprins, form the LINC complex essential to transmit forces across the nuclear envelope. The statement in line 214 that "Linker of nucleoskeleton and cytoskeleton (LINC) complex component nesprin-2 locates in the nuclear envelope to link the actin cytoskeleton and the nuclear lamina" is not quite accurate, as nesprin-2 also links to microtubules via dynein and kinesin.

      The statement that "Our data show that Lamin A knockdown specifically reduced the usage of its primary isoform, suggesting a potential role in chromatin architecture regulation, while other LMNA isoforms remained unaffected, highlighting a selective effect" (lines 407-409) is confusing, as the 'shLaminA' shRNA specifically targets the 3' UTR of lamin A that is not present in the other isoforms. Thus, the observed effect is entirely consistent with the shRNA-mediated depletion, independent of any effects on chromatin architecture.

      The premise of the authors that lamins would only affect peripheral chromatin and genes at LADs neglects the fact that lamins A and C are also found in the nuclear interior, where they form stable structures and influence chromatin organization, and the fact that lamins A and C and nesprins additionally interact with numerous transcriptional regulators such as Rb, c-Fos, and beta-catenin, which could further modulate gene expression when lamins or nesprins are depleted.

      The comparison of the identified DEGs to genes contained in LADs might be confounded by the fact that the authors relied on the identification of LADs from a previous study (ref #28), which used a different human cell type (human skin fibroblasts) instead of the U2OS osteosarcoma cells used in the present study. As LADs are often highly cell-type specific, the use of the fibroblast data set could lead to substantial differences in LADs.

      Another limitation of the current manuscript is that, in the current form, some of the figures and results depicted in the figures are difficult to interpret for a reader not deeply familiar with the techniques, based in part on the insufficient labeling and figure legends. This applies, for example, to the isoform use analysis shown in Figure 3d or the GenometriCorr analysis quantifying spatial distance between LADs and DEGs shown in Figure 4c.

      Overall appraisal and context:

      Despite its limitations, the present study further illustrates the important roles the nuclear envelope proteins lamin A, lamin C, and nesprin-2 have in chromatin organization, dynamics, and gene expression. It thus confirms results from previous studies (not always fully acknowledged in the current manuscript) previously reported for lamin A/C depletion. For example, the effect of lamin A/C depletion on increasing mobility of chromatin had already been demonstrated by several other groups, such as Bronshtein et al. Nature Comm 2015 (PMID: 26299252) and Ranade et al. BMC Mol Cel Biol 2019 (PMID: 31117946). Additionally, the effect of lamin A/C depletion on gene and protein expression has already been extensively studied in a variety of other cell lines and model systems, including detailed proteomic studies (PMIDs 23990565 and 35896617).

      The finding that lamin A/C or nesprin depletion affects genes not only at the nuclear periphery but also in the nuclear interior is not particularly surprising given the previous studies and the fact that lamins A and C are also found within the nuclear interior, where they affect chromatin organization and dynamics, and that lamins A/C and nesprins directly interact with numerous transcriptional regulators that could further affect gene expression independently of their role in chromatin organization.

      The authors provide a detailed analysis of isoform switching in response to lamin A/C or nesprin depletion, but the underlying mechanism remains unclear. Similarly, their analysis of the genomic location of the observed DEGs shows the wide-ranging effects of lamin A/C or nesprin depletion, but leaves the reader wondering how these effects are mediated. A more in-depth analysis of predicted regulatory factors and their potential interaction with lamins A/C or nesprin would be beneficial in gaining more mechanistic insights.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript describes DOX inducible RNAi KD of Lamin A, LMNA coded isoforms as a group, and the LINC component SYNE2. The authors report on differentially expressed genes, on differentially expressed isoforms, on the large numbers of differentially expressed genes that are in iLADs rather than LADs, and on telomere mobility changes induced by 2 of the 3 knockdowns.

      Strengths:

      Overall, the manuscript might be useful as a description for reference data sets that could be of value to the community.

      Weaknesses:

      The results are presented as a type of data description without formulation of models or explanations of the questions being asked and without follow-up. Thus, conceptually, the manuscript doesn't appear to break new ground.

      Not discussed is the previous extensive work by others on the nucleoplasmic forms of LMNA isoforms. Also not discussed are similar experiments, for instance, gene expression changes others have seen after lamin A knockdowns or knockouts, or the effect of lamina on chromatin mobility, including telomere mobility; see, for example, a review by Roland Foisner (doi.org/10.1242/jcs.203430) on nucleoplasmic lamina. The authors need to do a thorough search of the literature and compare their results as much as possible with previous work.

      The authors don't seem to make any attempt to explore the correlation of their findings with any of the previous data or correlate their observed differential gene expression with other epigenetic and chromatin features. There is no attempt to explore the direction of changes in gene expression with changes in nuclear positioning or to ask whether the genes affected are those that interact with nucleoplasmic pools of LMNA isoforms. The authors speculate that the DEG might be related to changing mechanical properties of the cells, but do not develop that further.

      The technical concerns include: 1) Use of only one shRNA per target. Use of additional shRNAs would have reduced concern about possible off-target knockdown of other genes; 2) Use of only one cell clone per inducible shRNA construct. Here, the concern is that some of the observed changes with shRNA KDs might show clonal effects, particularly given that the cell line used is aneuploid; 3) Use of a single, "scrambled" control shRNA rather than a true scrambled shRNA for each target shRNA.

    1. eLife Assessment

      This study reveals that PRMT1 overexpression drives tumorigenesis of acute megakaryocytic leukemia (AMKL) and that targeting PRMT1 is a viable approach for treating AMKL. After revision, both reviewers found that these findings are important and that the data supporting these findings are convincing. Furthermore, these findings likely have significant implications for the treatment of AMKL with PRMT1 overexpression in the future.

    2. Reviewer #1 (Public review):

      Summary:

      PRMT1 overexpression is linked to poor survival in cancers, including acute megakaryocytic leukemia (AMKL). This manuscript describes the important role of PRMT1 in metabolic reprogramming in AMKL. In a PRMT1-driven AMKL model, only cells with high PRMT1 expression induced leukemia, which was effectively treated with the PRMT1 inhibitor MS023. PRMT1 increased glycolysis, leading to elevated glucose consumption, lactic acid accumulation, and lipid buildup while downregulating CPT1A, a key regulator of fatty acid oxidation. Treatment with 2-deoxy-glucose (2-DG) delayed leukemia progression and induced cell differentiation, while CPT1A overexpression rescued cell proliferation under glucose deprivation. Thus, PRMT1 enhances AMKL cell proliferation by promoting glycolysis and suppressing fatty acid oxidation.

      Strengths:

      This study highlights the clinical relevance of PRMT1 overexpression with AMKL, identifying it as a promising therapeutic target. A key novel finding is the discovery that only AMKL cells with high PRMT1 expression drive leukemogenesis, and this PRMT1-driven leukemia can be effectively treated with the PRMT1 inhibitor MS023. The work provides significant metabolic insights, showing that PRMT1 enhances glycolysis, suppresses fatty acid oxidation, downregulates CPT1A, and promotes lipid accumulation, which collectively drive leukemia cell proliferation. The successful use of the glucose analogue 2-deoxy-glucose (2-DG) to delay AMKL progression and induce cell differentiation underscores the therapeutic potential of targeting PRMT1-related metabolic pathways. Furthermore, the rescue experiment with ectopic Cpt1a expression strengthens the mechanistic link between PRMT1 and metabolic reprogramming. The study employs robust methodologies, including Seahorse analysis, metabolomics, FACS analysis, and in vivo transplantation models, providing comprehensive and well-supported findings. Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.

      Comments on revisions:

      The reviewer's questions were adequately addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript explores the role of PRMT1 in AMKL, highlighting its overexpression as a driver of metabolic reprogramming. PRMT1 overexpression enhances the glycolytic phenotype and extracellular acidification by increasing lactate production in AMKL cells. Treatment with the PRMT1 inhibitor MS023 significantly reduces AMKL cell viability and improves survival in tumor-bearing mice. Intriguingly, PRMT1 overexpression also increases mitochondrial number and mtDNA content. High PRMT1-expressing cells demonstrate the ability to utilize alternative energy sources dependent on mitochondrial energetics, in contrast to parental cells with lower PRMT1 levels.

      Strengths:

      This is a conceptually novel and important finding as PRMT1 has never been shown to enhance glycolysis in AMKL, and provides a novel point of therapeutic intervention for AMKL.

      Comments on revisions:

      The author has responded satisfactorily to the review comments and revised the manuscript accordingly.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      We thank the reviewer for highlighting the strengths of our manuscript, as quoted: “Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.”

      Weaknesses:

      (1) The findings rely heavily on a single AMKL cell line, with no validation in patient-derived samples to confirm clinical relevance or even another type of leukemia line. Adding the discussion of PRMT1's role in other leukemia types will increase the impact of this work.

      We mentioned in the introduction that PRMT1 is known to be a driver of leukemia with diverse types of mutations. In a related paper published in Cell Reports (Su et al. 2021), we demonstrated that PRMT1 is upregulated in myelodysplastic syndrome (MDS) patient samples and that inhibition of PRMT1 promotes megakaryocytic differentiation of a few MDS samples. AMKL is very rare. Through the Children’s Oncology Group consortium, we have obtained five samples of AMKL with Down syndrome and AMKL with the RBM15-MKL1 translocation, out of 32 samples banked over the last 20 years. Interestingly, these patient samples also contain trisomy 19. As PRMT1 is located on chromosome 19, we speculate that PRMT1 is a significant driver of AMKL, although we have very limited genetic evidence. However, these frozen human samples, derived from peripheral blood, cannot be grown in a cell culture system. Although we did not perform metabolic analysis for other AMKL cell lines, we did validate in our unpublished studies that PRMT1 drives down CPT1A expression in normal bone marrow cells and platelets in mice and in the human leukemia cell line MEG-01, which can be differentiated into megakaryocytes upon PMA (phorbol 12-myristate 13-acetate) treatment. Therefore, we expect that the PRMT1-mediated metabolic reprogramming we described here should apply to other types of hematological malignancies.

      (2) The observed heterogeneity in Prmt1 expression is noted but not further investigated, leaving gaps in understanding its broader implications.

      The expression level of PRMT1 is heterogeneous within leukemia cell populations, making it intriguing to study. We can sort the cells based on high versus low PRMT1 expression using a fluorescent dye called E84. However, we have not conducted transcriptome analysis on these two populations, mainly due to resource constraints. Theoretically, the E84 high-expression population may transiently utilize glucose more efficiently, as these cells do not ectopically express PRMT1. Therefore, when nutrient levels decline, these cells might switch to the low PRMT1 expression population. It will be interesting to see whether endogenous leukemia cells transiently expressing high levels of PRMT1 take advantage of their efficient usage of glucose and thus adapt to the niche environment successfully, as we observed in Figure 1. We agree that this would be an interesting direction to pursue in the future.

      (3) Some figures and figure legends didn't include important details or had not matching information.

      We would like to thank the reviewer for pointing out these mistakes. We have now corrected them.

      (4) Some wording is not accurate, such as line 80 "the elevated level of PRMT1 maintains the leukemic stem cells", the study is using the cell line, not leukemia stem cells.

      Leukemic stem cells are often referred to as cells that can initiate leukemia when transplanted into recipient mice, a concept first proposed by John Dick. In this study, we found that even the 6133 cell line displays heterogeneity in terms of PRMT1 expression levels. We identified a subgroup of 6133 cells as leukemia stem cells due to their ability to initiate leukemia.

      (5) In the disease model, histopathology of blood, spleen, and BM should be shown.

      We did not conduct histopathology analysis. Histopathology associated with 6133 cells has been published in Mercher et al., JCI 2009, and in a recent preprint by Diane Krause’s group.

      (6) Can MS023 treatment reverse the metabolic changes in PRMT1 overexpression AMKL cells?

      Yes. The Seahorse assays in Figure 4 demonstrate that the PRMT1 inhibitor can increase oxygen consumption.

      It would be helpful to provide a summary graph at the end of the manuscript.

      Yes, we now provide a graphic abstract.

      Reviewer #2 (Public review):

      We would like to thank the reviewer for finding the manuscript novel and important.

      Weaknesses:

      (1) The manuscript lacks detailed molecular mechanisms underlying PRMT1 overexpression, particularly its role in enhancing survival and metabolic reprogramming via upregulated glycolysis and diminished oxidative phosphorylation (OxPhos). The findings primarily report phenomena without exploring the reasons behind these changes.

      In the introduction, we highlighted that numerous studies have demonstrated how PRMT1 directly interacts with several key enzymes involved in glycolysis. These studies provide a mechanism for the observed upregulation of PRMT1 in leukemia. Additionally, our previous research published in eLife (Zhang, 2015) demonstrated that PRMT1 methylates the RNA-binding protein RBM15, which can bind to the 3' UTR of mRNAs encoding various metabolic enzymes. Therefore, we propose that PRMT1 may also regulate metabolism indirectly through the RBM15 protein.

      (2) The article shows that PRMT1 overexpression leads to augmented glycolysis and low reliance on OxPhos. However, the manuscript also shows that PRMT1 overexpression leads to increased mitochondrial number and mitochondrial DNA content and an elevated NADPH/NAD+ ratio. Further, these overexpressing cells are better able to survive on alternative energy sources in the absence of glucose than the low PRMT1-expressing parental cells. Surprisingly, the Seahorse assay in PRMT1-overexpressing cells showed no further enhancement in the ECAR after adding the mitochondrial decoupler FCCP, indicating truncated mitochondrial energetics. These results are contradictory and need a more detailed explanation in the discussion.

      We have now explained the metabolic changes in more detail. An increase in mitochondrial number is not equivalent to an increase in fatty acid oxidation and oxygen consumption, as mitochondria have many other functions. PRMT1 only downregulates CPT1A, which catalyzes a rate-limiting step in long-chain fatty acid oxidation. The data suggest that PRMT1 promotes mitochondrial biogenesis, possibly via PGC1alpha, as published by Stallcup’s group. The Seahorse assays were performed in a high concentration of glucose rather than with alternative carbon sources. FCCP treatment under high-glucose conditions did not increase the ECAR and OCR, which is normal for leukemia cells, as shown in other publications (Sriskanthadevan, 2015; Kreitz, 2019). PRMT1 could dampen the activities of the TCA cycle and the electron transport chain, as suggested by our unpublished proteomic data and by published data (Fong, 2019). The elevated NADPH/NAD+ ratio is another indication that glycolysis and anabolism are enhanced by PRMT1.

      (3) How was disease penetrance established following the 6133/PRMT1 transplant before MS023 treatment?

      The data are shown in Figure 1f, which demonstrates that the penetrance is 100%.

      (4) The 6133/PRMT1 cells show elevated glycolysis compared to parental 6133; why did the author choose the 6133 cells for treatment with the MS023 and ECAR assay (Fig.3 b)? The same is confusing with OCR after inhibitor treatment in 6133 cells; the figure legend and results section description are inconsistent.

      We apologize for the mistakes made while preparing the manuscript. In Figure 4, it was the 6133/PRMT1 cells that were treated with MS023.

      (5) The discussion is too brief and incoherent and does not adequately address key findings. A comprehensive rewrite is necessary to improve coherence and depth.

      We agree with the reviewer. We have now added a comprehensive review of PRMT1-mediated metabolism. The PRMT1 homolog in yeast is called hmt1. In yeast, hmt1 is upregulated by glucose and enhances glycolysis. Thus, PRMT1-enhanced glycolysis is a conserved pathway in eukaryotic cells.

      (6) The materials and methods section lacks a description of statistical analysis, and significance is not indicated in several figures (e.g., Figures 1C, D, F; Figures 2D, E, F, I). Statistical significance must be consistently indicated. The methods section requires more detailed descriptions to enable replication of the study's findings.

      We have added extra details on the methods and statistical analysis for the figures.

      (7) Figures are hazy and unclear. They should be replaced with high-resolution images, ensuring legible text and data.

      We have prepared separate figure files with high resolution.

      (8) Correct the labeling in Figure 2I by removing the redundant "D."

      We would like to thank the reviewer; we have fixed the figure.

    1. eLife Assessment

      This study presents valuable findings on the regulation of survival and maintenance of brain-resident immune cells called microglia. Using compelling and sophisticated genetic tools, the authors demonstrate a gene dosage-dependent mechanism by which microglia are eliminated. This research on cell competition and survival will be of broad interest to the cell biology community.

    2. Reviewer #1 (Public review):

      Summary:

      The article entitled "Pu.1/Spi1 dosage controls the turnover and maintenance of microglia in zebrafish and mammals" by Wu et al., identifies a role for the master myeloid developmental regulator Pu.1 in the maintenance of microglial populations in the adult. Using a non-homologous end joining knock-in strategy, the authors generated a pu.1 conditional allele in zebrafish, which reports wildtype expression of pu.1 with EGFP and truncated expression of pu.1 with DsRed after Cre mediated recombination. When crossed to existing pu.1 and spi-b mutants, this approach allowed the authors to target a single allele for recombination and induce homozygous loss-of-function microglia in adults. This identified that although there is no short-term consequence to loss of pu.1, microglia lacking any functional copy of pu.1 are depleted over the course of months, even when spi-b is fully functional. The authors go on to identify reduced proliferation, increased cell death, and higher expression of tp53 in the pu.1 deficient microglia, as compared to the wildtype EGFP+ microglia. To extend these findings to mammals, the authors generated a conditional Pu.1 allele in mice and performed similar analyses, finding that loss of a single copy of Pu.1 resulted in similar long-term loss of Pu.1-deficient microglia. The conclusions of this paper are overall well supported by the data.

      Strengths:

      The genetic approaches here for visualizing recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or non-existent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigation of the specific relationship between pu.1 and tp53. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      Recommendations for the authors:

      It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus: does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.

      The paper would likely also benefit from more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. Clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.

      Comments on Revised Version (from BRE):

      The authors performed in silico analyses to support a regulatory relationship between Pu.1 and Tp53. They identified three putative Pu.1 binding sites within the zebrafish tp53 promoter region. Furthermore, they cite prior evidence demonstrating a similar interaction between PU.1 and members of the P53 family through direct DNA binding.

    3. Reviewer #2 (Public review):

      Summary:

      In the presented work by Wu et al., the authors investigate the role of the transcription factor Pu.1 in the survival and maintenance of microglia, the tissue-resident macrophage population in the brain. To this end, they generated a sophisticated new conditional pu.1 allele in zebrafish using CRISPR-mediated genome editing, which allows visual detection of expression of the mutant allele through a switch from GFP to dsRed after Cre-mediated recombination. Using EdU pulse-chase labelling, they first estimate the daily turnover rate of microglia in the adult zebrafish brain, which was found to be higher than rates previously estimated for mice and humans. After conditional deletion of pu.1 in coro1a-positive cells, they do not find a difference in microglia number at 2 and 8 days or 1 month post injection of Tamoxifen. However, at 3 months post injection, a strong decrease in mutant microglia could be detected. While no change in microglia number was detected at 1 mpi, an increase in apoptotic cells and decreased proliferation were observed. RNA-seq analysis of WT and mutant microglia revealed an upregulation of tp53, which was shown to play a role in the depletion of pu.1 mutant microglia, as deletion in tp53-/- mutants did not lead to a decrease in microglia number at 3 mpi. Through analysis of microglia number in pu.1 mutants, the authors further show that the depletion of microglia in the conditional mutants is dependent on the presence of WT microglia. To show that the phenomenon is conserved between species, similar experiments were also performed in mice.

      This work expands on previous in vitro studies using primary human microglia. The majority of conclusions are well supported by the data; the addition of controls and experimental details would strengthen the conclusions and rigor of the paper.

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data were analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Fig. S7A).

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Fig.2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in requirement of pu.1 in embryonic and adult stages.

      Comments on Revised Version (from BRE):

      The authors have elaborated on the details of the RNA-Seq procedure and clarified the distinct phenotypes observed with global versus conditional pu.1 knockout. In addition, the authors' proposed collaborative relationship between Pu.1 and Spi-b has been expanded in the revised manuscript. The authors have addressed all the minor concerns raised by the reviewer.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigation of the specific relationship between pu.1 and tp53. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. To investigate the potential interaction between Pu.1 and Tp53 in zebrafish, we analyzed the promoter region of zebrafish tp53. Indeed, we found three PU.1 binding sites (GAGGAA) in the tp53 promoter, located on the antisense strand at positions -1047 to -1042, -1098 to -1093, and -1423 to -1418 relative to the transcriptional start site (Fig. S10). These potential Pu.1 binding sites suggest a direct interaction between Pu.1 and the tp53 locus. Furthermore, a previous study by Tschan et al. (2008) elucidated the mechanism by which PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family through direct binding to the DNA-binding and/or oligomerization domains of p53/p73 proteins. We have cited this study (Lines 399-401) and included all of the above information in the discussion of the revised manuscript (Lines 399-405).
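
      For readers who wish to reproduce this kind of motif search, the following is a minimal sketch of how antisense-strand GAGGAA sites can be located and reported relative to the transcriptional start site. It assumes the input is the sense-strand sequence immediately upstream of the TSS (so its last base is position -1); the function names and the toy sequence are illustrative and are not the actual zebrafish tp53 promoter or the tool used in the manuscript.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_antisense_sites(upstream, motif="GAGGAA"):
    """Find motif occurrences on the antisense strand.

    `upstream` is the sense-strand sequence directly 5' of the TSS,
    so its last base corresponds to position -1. Returns a list of
    (start, end) positions relative to the TSS.
    """
    upstream = upstream.upper()
    # A motif on the antisense strand reads as its reverse complement on the sense strand
    target = reverse_complement(motif)
    n, k = len(upstream), len(target)
    sites = []
    start = upstream.find(target)
    while start != -1:
        sites.append((start - n, start - n + k - 1))
        start = upstream.find(target, start + 1)
    return sites

# Toy 20-bp sequence, for illustration only
toy_upstream = "ACGT" + "TTCCTC" + "ACGTACGTAC"
sites = find_antisense_sites(toy_upstream)
print(sites)
```

      A sequence match of this kind only identifies candidate sites; it does not by itself demonstrate binding.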

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      We apologize for the lack of clarity regarding the RNA-seq procedures and have accordingly added details about the RNA-seq data analysis to the “Material and methods” section (Lines 491-501). Briefly, reads were aligned to the zebrafish genome using the STAR package. Raw counts were calculated with the featureCounts package. Differentially expressed genes (DEGs) were identified with the DESeq2 package. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We have discussed this technical constraint in the revised manuscript to ensure methodological transparency (Lines 498-501).
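
      The inclusion criterion described above can be expressed as a simple ratio filter. The sketch below is illustrative only: the pool names and expression values are hypothetical, `passes_microglia_criterion` is a helper introduced here, and the handling of pools with no detectable DC marker is an assumption rather than something stated in the manuscript.

```python
def passes_microglia_criterion(microglia_expr, dc_expr, threshold=5.0):
    """Return True if the microglia/DC marker expression ratio exceeds the threshold."""
    if dc_expr <= 0:
        # Assumption: with no detectable DC marker, keep the pool only if
        # the microglia marker itself is expressed
        return microglia_expr > 0
    return microglia_expr / dc_expr > threshold

# Hypothetical relative expression per pool: (ccl34b.1, ccl19a.1)
pools = {
    "pool_1": (1600.0, 1.0),
    "pool_2": (40.0, 10.0),
    "pool_3": (12.0, 2.0),
}

kept = [name for name, (mg, dc) in pools.items() if passes_microglia_criterion(mg, dc)]
print(kept)
```

      A fixed threshold of this kind admits pools whose ratios span a wide range, which is consistent with the variability the reviewer noted.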

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Fig. S2). Microglial death occurs in both embryonic and adult brains only when Pu.1 is disrupted in the spi-b mutant background. The blebbing morphology of some microglia after conditional pu.1 knockout in adult spi-b mutants indicates that microglia undergo apoptosis at both embryonic and adult stages (Fig. S4 and Fig. S5). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Fig. 2) versus conditional pu.1 ablation (Fig. S2). Global knockout eliminates microglia during early development because of Pu.1’s essential role in myeloid lineage specification. We have included this clarification in the revised manuscript (Line 208-211).

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of Spi-b expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We have included the clarification in the revised manuscript (Line 302-305).

      (4) Data is represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

      We have represented our data as mean ± SD in the revised manuscript.
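      For readers comparing the two conventions, the sketch below shows how SD and SEM relate for a single group of measurements. It uses the standard definitions (sample SD with n − 1 in the denominator, SEM = SD / √n). The example values are hypothetical and are not data from the manuscript.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for one group of measurements."""
    sd = statistics.stdev(values)        # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(len(values))    # standard error of the mean
    return sd, sem


# Hypothetical microglia counts from four animals.
counts = [10.0, 12.0, 14.0, 16.0]
sd, sem = sd_and_sem(counts)
print(f"mean = {statistics.mean(counts):.1f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      Because SEM shrinks with sample size while SD does not, SD error bars describe the spread of the data, which is the property the reviewer asked to see.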

      Recommendations for the authors:

      Reviewing Editor:

      To further strengthen the manuscript, we ask the authors to address the reviewers' comments through additional experiments where necessary. In cases where certain experiments may be challenging, we encourage the authors to address these concerns within the text, such as by referencing any prior evidence of pu.1 and tp53 interactions or incorporating in silico analyses that support such interaction.

      As suggested, we have performed an in silico analysis of Pu.1 binding sites in the zebrafish tp53 promoter and have also cited a previous paper showing that PU.1 attenuates the transcriptional activity of the p53 tumor suppressor family (Line 399-405).

      Reviewer #1 (Recommendations for the authors):

      It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus -- does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.

      The interaction between Pu.1 and Tp53 has been discussed in the public review section.

      The paper would likely also benefit from a more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. A clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.

      We conducted a detailed sequence alignment in our previous work (Yu et al., 2017, Blood) and found that zebrafish Spi-b shares the highest similarity with mammalian SPI-B among the Ets family transcription factors in zebrafish. A unique P/S/T-rich region known to be essential for mammalian SPI-B transactivation activity is present in zebrafish Spi-b. Our data do not support the interpretation that Spi-b is more closely related to mammalian Pu.1 than to Spi-b. Instead, functional compensation between pu.1 and spi-b in microglia maintenance likely reflects their shared role as Ets-family transcriptional regulators, rather than ortholog-driven redundancy.

      Reviewer #2 (Recommendations for the authors):

      (1) The nomenclature of the genes in the SPI family in zebrafish is somewhat confusing as genes were renamed several times. It would make it easier for the reader to understand if in the abstract and the main text, spi-b would be referred to as the zebrafish orthologue of mouse SPI-B (as determined by the authors in previous work) rather than the paralogue of zebrafish pu.1. To clarify which genes were analyzed in both zebrafish and mouse, Gene accession numbers should be added.

      Thank you for the recommendations. We have changed “the paralogue of zebrafish pu.1” to “the orthologue of mouse Spi-b” in the abstract (Line 22) and added gene accession numbers for both the zebrafish and mouse genes (Line 105-106 and 301-302).

      (2) Methods RNA-seq: Details on how the aligned reads were analyzed to detect differentially expressed genes are missing and should be added. In addition, a table with read counts, fold changes and adjusted p values should be added.

      We have added details of the RNA-seq analysis to the Material and Methods section (Line 491-501). A table generated by DESeq2 has been included as a supplemental file to show read counts, fold changes and adjusted p values (Supplemental file 2).

      (3) Figure 2H: It would be helpful to the reader if the KO splicing would be shown in comparison to WT splicing.

      Thank you for your suggestion. We have added the sequencing result spanning exon 3 and exon 4 of pu.1 from wild-type cDNA to show WT splicing in Figure 2H.

      (4) Legend Figure 5C. Relative expression should be replaced with transcripts per million (TPM).

      We have corrected it in the figure legend of Figure 5C (Line 786-787).

      (5) In Figure S3. the label on the y-axis in panel B is not visible.

      We apologize for the mistake made during figure assembly. We have corrected it, and the y-axis label is now visible.

      (6) In Figure S7B an explanation for the colors in the heat map is missing and should be added.

      Colors represent scaled TPM values. The red color represents high expression while the blue color represents low expression. We have added the information in the figure legend.

      (7) A justification for the use of male mice only should be added or additional experiments in female mice should be performed.

      Female mice were excluded to avoid variability associated with estrous cycle-dependent hormonal changes, which are known to influence microglial behavior (Habib and Beyer, 2015). We have added a justification in the revised manuscript (Line 547-548).

      (8) The manuscript would benefit from some language editing. A few examples are listed below:

      a) line 97: the rostral blood (RBI) should read the rostral blood island.

      b) line 373 typo: nucleus translocation should read nuclear translocation.

      c) line 393 typo: pu.1-dificent should read pu.1-deficient.

      We apologize for the typos and grammatical mistakes in the manuscript. We have checked the manuscript thoroughly and corrected them.

      Reference:

      Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE (2008) PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene 27: 3489-93

      Yu T, Guo W, Tian Y, Xu J, Chen J, Li L, Wen Z (2017) Distinct regulatory networks control the development of macrophages of different origins in zebrafish. Blood 129: 509-519

      Habib P, Beyer C (2015) Regulation of brain microglia by female gonadal steroids. J Steroid Biochem Mol Biol 146: 3-14

    1. eLife Assessment

      This fundamental work has the potential to advance our understanding of brain activity using electrophysiological data, by proposing a completely new approach to reconstructing EEG data that challenges the assumptions typically made in the solutions to Maxwell’s equations. Convincing evidence for the superior spatio-temporal resolution of this method is provided through a number of experiments, including simultaneous fMRI/EEG acquisitions. This work will be of broad interest to neuroscientists and neuroimaging researchers.

    1. eLife Assessment

      The authors investigated the mechanisms underlying the pause in striatal cholinergic interneurons (SCINs) induced by thalamic input, identifying that Kv1 channels play a key role in this burst-dependent pause. The experimental evidence is convincing. The study provides important mechanistic insights into how burst activity in SCINs leads to a subsequent pause, highlighting the involvement of D1/D5 receptors.

    2. Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that 1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and 2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      The authors addressed all concerns I raised. Presented data are convincing and support their claims.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of D5 receptors (D5R) in regulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their findings provide a compelling model explaining the "on/off" switch of the CIN pause, driven by the distinct dopamine affinities and the balance of D2R and D5R. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of the pause response in LID mice and demonstrates the restoration of the pause through the D1/D5 inverse agonist clozapine.

      Strengths:

      The study presents solid findings, and the writing is logically structured and easy to follow. The experiments are well-designed, properly combining ex vivo electrophysiology recording, optogenetics, and pharmacological treatment to dissect / rule out most, if not all, alternative mechanisms in their model.

      Weaknesses (fixed in this revision):

      In this round of revision, the authors have included additional experiments examining the role of D2R and the possible effects of clozapine on serotonin receptors in ex vivo slices from LID mice in the off-L-DOPA condition. Although, to our surprise, D2R agonism using quinpirole and sumanirole failed to restore the CIN pause, this study still provides new insights into the balance between D2R and D5R in modulating the CIN pause.

      Overall, the authors' response adequately addressed concerns raised in the previous revision.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that 1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and 2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      The authors answered most of the concerns I raised. However, the critical issue remains unresolved.

      I am still not convinced by the results presented in Fig. 6 and their interpretation. Since clozapine acts as an agonist in the absence of an endogenous agonist, it may stimulate the D5R-cAMP-Kv1 pathway. Stimulation of this pathway should abolish the pause response mediated by thalamic stimulation in SCINs, rather than restoring it. Clarification is needed regarding how clozapine reduces D5R ligand-independent activity in the absence of dopamine (the endogenous agonist). In addition, the authors argued that a D5R antagonist does not work in the absence of dopamine, and therefore the D5R antagonist alone did not restore the pause response. However, if the D5R-cAMP-Kv1 pathway is already active in the L-DOPA off state, why did the D5R antagonist not contribute to inhibition of the D5R pathway? Since clozapine is not D5 specific and the clozapine experiments were not conclusive, I recommend testing whether other receptors, such as the D2 receptor, contribute to the clozapine-induced pause response in the L-DOPA off state.

      Thank you for the opportunity to clarify this point. It seems there may have been a misunderstanding regarding our proposal about clozapine's mechanism of action. We are not suggesting that clozapine acts as an agonist, but rather as an “inverse agonist”. Unlike classical agonists, inverse agonists produce a pharmacological effect opposite to that of an agonist. Although clozapine is best known for its antagonistic effects on dopamine and serotonin receptors, under conditions where no endogenous agonist is present, it has been shown to reduce the constitutive activity of D1 and D5 receptors (PMID: 24931197). This is explained in lines 240-254 in the Results section.

      In contrast, the prototypical and selective D1/D5 receptor antagonist SCH23390 does not exhibit inverse agonist properties and would not be expected to produce effects in the absence of an agonist (PMID: 7525564). The observation that SCH23390 blocks the effects of clozapine in dopamine-depleted animals strongly supports the idea that clozapine acts through D1/D5 receptors. This is now clarified in lines 257-264.

      To further address your comments, we now include a new figure (Figure 6) presenting experiments that show D2-type receptor agonists do not restore the pause response in dyskinetic mice in the off-L-DOPA condition. These results are described in a new subsection of the Results section and discussed in a newly added paragraph in the Discussion (lines 369-380).

      Finally, to exclude a potential contribution of serotonin receptors to clozapine’s effects, we have expanded what is now Figure 7 (formerly Figure 6) to show that clozapine continues to restore the pause response even in the presence of a serotonin receptor antagonist in the bath.

      All these results are further discussed in lines 342-360.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of D5 receptors (D5R) in regulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their findings provide a compelling model explaining the "on/off" switch of the CIN pause, driven by the distinct dopamine affinities of D2R and D5R. This mechanism, coupled with varying dopamine states, is likely critical for modulating synaptic plasticity in cortico-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of the pause response in LID mice and demonstrates the restoration of the pause through D1/D5 inverse agonism.

      Strengths:

      The study presents solid findings, and the writing is logically structured and easy to follow. The experiments are well-designed, properly combining ex vivo electrophysiology recording, optogenetics, and pharmacological treatment to dissect / rule out most, if not all, alternative mechanisms in their model.

      Weaknesses:

      While the manuscript is overall satisfying, one conceptual gap needs to be further addressed or discussed: the potential "imbalance" between D2R and D5R signaling due to the ligand-independent activity of D5R in LID. Given that D2R and D5R oppositely regulate CIN pause responses through cAMP signaling, investigating the role of D2R under LID off L-DOPA (e.g., by applying D2 agonists or antagonists, even together with intracellular cAMP analogs or inhibitors) could provide critical insights. Addressing this aspect would strengthen the manuscript in understanding CIN pause loss under pathological conditions.

      Thank you for your comments. Although our primary focus is on the role of D5 receptors, we have also investigated the effects of two D2-type receptor agonists in dyskinetic mice in the off-L-DOPA condition. We found that neither quinpirole nor sumanirole was able to restore the pause response. These results are presented in Figure 6 and related text in the Results and Discussion sections.

      Understanding why D2 agonists fail to restore the pause response—despite their expected effect of reducing cAMP levels—is an important question that warrants further investigation. Interestingly, previous studies have reported paradoxical effects of D2 receptor stimulation in SCINs in animal models of dystonia (PMID: 16934985, PMID: 21912682), as well as under conditions where the SCIN’s constitutively active integrated stress response is diminished (PMID: 33888613). This is now discussed in lines 369-380.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrate that this burst-dependent pause relies on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause, and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism: It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for the opportunity to clarify these points. We acknowledge that the response of SCINs to optogenetic stimulation of thalamic afferents in brain slices represents a model system that may not capture all aspects of TAN responses to behaviorally salient events. Nevertheless, this approach allows us to test mechanistic hypotheses that are difficult to address in behaving animals with current technologies. This is now stated in lines 311-314.

      Importantly, our ex vivo preparation reproduces, for the first time, the loss of TAN responses observed in non-human primates with parkinsonism, enabling investigation of the underlying mechanisms. In line with your suggestion, we have expanded the Discussion (third and fourth paragraphs) to address the sources of variability in pause responses.

      (2) Terminology: The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      Thank you for raising this important point. We agree that it is essential to be precise in describing the nature of the pause observed in our ex vivo model. While we believe that readers would recognize from the abstract and methods that our study focuses on a model of the pause response, we understand your concern about potential confusion. In response, we have revised the terminology in the abstract, bullet points, and throughout the manuscript to more clearly reflect that we are describing an ex vivo model of the pause observed in behaving animals.

      (3) Kv1 Blocker Specificity: It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause?

      Clarification on this point would strengthen the interpretation of the results.

      This issue is addressed in lines 147-150.

      (4) Role of D1 Receptors: While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Figure 3C shows that the D1/D5 receptor antagonist SCH23390 does not modify the pause, while the full D1/D5 agonist SKF81297 abolishes it, indicating that in our slice preparation, baseline dopamine levels are not contributing to the pause through D1/D5 receptor stimulation.

      (5) Clozapine's Mechanism of Action: The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      As explained in our response to Reviewer #1, the effect of clozapine is blocked by the D1/D5-selective antagonist SCH23390. Additionally, new data presented in Figure 7C show that clozapine's ability to restore the pause response is maintained even in the presence of a broad-spectrum serotonin receptor antagonist. Since SCINs do not significantly express D1 receptors, we believe that these findings strongly support a role for D5 receptors in SCINs.

      Comments on revisions:

      The authors have addressed many of my concerns. However, I remain unconvinced that adding an 'ex vivo' experiment fully resolves the fundamental differences between the burst-dependent pause observed in slices - defined by the duration of a single AHP - and the pause response in CHINs observed in vivo, which may involve contributions from more than one prolonged AHP. In vivo, neurons can still fire action potentials during the pause, albeit at a lower frequency. Moreover, in behaving animals, pause duration does not vary with or without initial excitation. The mechanism proposed demonstrates that the pause duration, defined by the length of a single AHP, is positively correlated with preceding burst activity.

      As discussed in paragraphs 3 and 4 of the Discussion (starting at line 285), and illustrated in Figure 1J–K, our data show that the duration of the pause can be modulated by rebound excitation from thalamic input. The absence of this rebound allows us to observe a longer pause when more spikes are elicited during the initial excitatory phase, providing a clearer readout of the contribution of intrinsic membrane mechanisms. We do not claim that intrinsic mechanisms alone account for the entire phasic response of SCINs in behaving animals (lines 295-303 in Discussion).

      To improve clarity, I recommend using the term 'SCIN pause' to describe the ex vivo findings, distinguishing them more explicitly from the 'pause response' observed in behaving animals. This distinction would help contextualize the ex vivo findings as potentially contributing to, but not fully representing, the pause response in vivo.

      We have made changes in the abstract, bullet points, and main text to clarify that we are not studying the in vivo response.

      Again, it would be helpful to present raw data for pause durations rather than relying solely on ratios. This approach would provide the audience with a clearer understanding of the absolute duration of the burst-dependent pause and allow for better comparison to the ~200 ms pause observed in behaving animals.

      Thank you for your comment. Following your suggestion, we provide the average pause durations for the data shown in Figure 1H (lines 127–130). We opted not to include raw pause durations in the main text for all figures, as this would make the manuscript more difficult to read and, in our view, is unnecessary. The figures already allow readers to estimate absolute durations: in each case, pause durations are shown relative to baseline ISIs in one panel, while the corresponding absolute ISIs are shown side-by-side. This provides a clearer understanding of pause magnitude relative to the cell’s spontaneous firing, which is more informative than absolute values alone, since one would expect a pause to be longer than the average ISI. Please note that baseline ISIs are significantly shorter in dyskinetic mice (Figure 5l). Showing the pause duration relative to baseline ISI allows readers to readily compare results across figures regardless of changes in SCIN baseline firing rate.

      Additionally, it is important to note that, in vivo, pause durations are typically inferred from perievent time histograms (PETHs), which represent population averages across many trials. In contrast, in our ex vivo studies, we measured pause duration on a trial-by-trial basis. This approach enables us to analyze how the pause duration varies as a function of the initial burst size in the same neuron—something not typically reported in in vivo studies. As described in the first two paragraphs of the Results, the same SCIN may respond with a different number of spikes in successive trials, and this variability is influenced by factors such as the timing of the last spontaneous spike relative to stimulation onset (Figure 1D–F). We are not aware of studies reporting trial-by-trial analyses of pause duration in behaving animals, particularly in relation to the strength of initial excitation. Therefore, while our slice preparation may yield pause durations that are longer than those observed in vivo, direct comparison to PETH-derived pause durations from behaving animals is not straightforward.
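      The normalization discussed above (pause duration expressed relative to the baseline ISI, computed trial by trial) can be sketched as follows. This is an illustrative reading of the description, not the authors' analysis code. The function name and all numerical values are hypothetical.

```python
def relative_pause(pause_s, baseline_isis_s):
    """Pause duration divided by the mean baseline inter-spike interval (ISI)."""
    mean_isi = sum(baseline_isis_s) / len(baseline_isis_s)
    return pause_s / mean_isi


# Two hypothetical cells with the same absolute pause (0.5 s) but different
# spontaneous firing: the faster cell (shorter ISIs) yields a larger ratio.
slow_cell = relative_pause(0.5, [0.25, 0.25])    # mean ISI 0.25 s
fast_cell = relative_pause(0.5, [0.125, 0.125])  # mean ISI 0.125 s
print(slow_cell, fast_cell)
```

      A ratio above 1 indicates a pause longer than the cell's typical ISI, which is what makes the measure comparable across groups with different baseline firing rates.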

    1. eLife Assessment

      This timely and important study used functional near-infrared spectroscopy hyperscanning to examine the neural correlates of how group identification influences collective behavior. The work provides solid evidence to indicate that the synchronization of brain activity between different people underlies collective performance and that changes in brain activity patterns within individuals may, in turn, underlie this between-person synchrony, although the order in which different task stages were completed could not be counter-balanced. This study will be of interest to researchers investigating the neuroscience of social behaviour.

    2. Reviewer #1 (Public review):

      The article provides a timely and well-written examination of how group identification influences collective behaviors and performance using fNIRS and behavioral data.

      Strengths:

      (1) Timeliness and Relevance:

      The topic is highly relevant, particularly in today's interconnected and team-oriented work environments. Triadic hyperscanning is important to understand group dynamics, but most previous work has been limited to dyadic work.

      (2) Comprehensive Analysis:

      The authors have conducted extensive analyses, offering valuable insights into how group identification affects collective behaviors.

      (3) Clear Writing:

      The manuscript is well-written and easy to follow, making complex concepts accessible.

      Comments on previous revisions:

      Most reviewer concerns have been addressed in the revised manuscript, but some limitations persist with respect to core aspects of study design, such as the long block durations and lack of counter-balancing.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the reviewers’ and editors’ constructive suggestions, which have helped us to improve our manuscript. We have substantially revised the details of our data analyses.

      The reason that the reviews did not change is that there were really three central points that led to the "incomplete". These were: (1) there was potentially a selection bias due to double dipping; (2) there was potentially a time confound due to the lack of counterbalancing; and (3) there is confusion about how the modeling was done, but it seems like the modeling was of the complete block (rather than tied to specific events in that block).

      (1) Double dipping

      We appreciate the opportunity to explain our robust safeguards against double-dipping and have provided detailed clarifications regarding the data analyses (pp. 11-14). Our study ensures statistical independence between task-related region selection and hypothesis testing through three orthogonal mechanisms:

      (1) Regressor Orthogonality: Statistical Independence Between Selection and Testing

      The selection regressor (group mean activation) was mathematically independent from test regressors (group differences, behavioral scores). This was confirmed through our GLM implementation: First-level: Task vs. rest contrast (β values) for each participant; Second-level: One-sample t-tests (selection) vs. independent group/behavioral tests.

      (2) Multimodal Validation: Complementary Neural and Behavioral Measures

      We employed multiple distinct metrics to provide convergent yet independent validation of effects.

      Neural Measures: Three orthogonal indices assessed different neural dimensions.

      A. Single-brain activation examines neural activity patterns within individual decision-makers,

      B. while within-group neural synchronization (GNS) quantifies the temporal alignment of neural activity across interacting group members during shared decision processes.

      C. Functional connectivity (FC) analyses, by contrast, measure correlated activity between different brain regions within individual participants.

      Behavioral Safeguards: Behavioral metrics were analyzed in independent regressions, avoiding circularity.

      A. Individual performance was based on personal accuracy,

      B. collective performance represented the group-level average accuracy across raters, and

      C. their similarity was quantified as the Euclidean distance between individual and collective scores.
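
      The similarity metric in item C can be illustrated with a minimal sketch. The response does not state how many score dimensions enter the distance, so the three-element vectors and the function name `euclidean_distance` below are assumptions made purely for illustration.

```python
import math


def euclidean_distance(individual_scores, collective_scores):
    """Euclidean distance between two equal-length score vectors."""
    if len(individual_scores) != len(collective_scores):
        raise ValueError("score vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(individual_scores, collective_scores))
    return math.sqrt(squared)


# Hypothetical scores on three rating dimensions (illustrative values only).
individual = [3.0, 4.0, 5.0]
collective = [3.0, 1.0, 1.0]
distance = euclidean_distance(individual, collective)
```

      A smaller distance means that an individual's scores lie closer to the collective scores.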

      (3) Statistical Safeguards

      We further ensured independence by applying strict FDR correction at both the selection (p < 0.05) and testing (p < 0.05) stages. In addition, we conducted a permutation test using 1,000 pseudo-group iterations to build null distributions for GNS.
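
      The response reports FDR correction without naming the procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the channel-wise p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cut-off.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical channel-wise p-values (illustrative only).
channel_p = [0.001, 0.008, 0.039, 0.041, 0.60]
significant = benjamini_hochberg(channel_p)
```

      In this toy example only the two smallest p-values survive correction, although four of the five are below the uncorrected 0.05 threshold.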

      Drawing on both classic and recent NIRS studies (e.g., Jiang et al., 2015; Liu et al., 2023; Stolk et al., 2016; Xie et al., 2023) and NIRS hyperscanning studies (e.g., Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), we performed the data analyses. Below, we provide the details of our data analysis:

      Single-brain activation. To identify task-related brain regions (channels), we used a one-sample t-test based on brain activation data from all participants during the task compared to the baseline (resting state).

      (1)  Data Collection: Each participant had brain activation data (HbO signals measured by fNIRS) during the task (the entire process of reading, sharing, discussing, and decision-making) and the resting state (baseline).

      (2)  Pre-processing: We sought to explore the neural mechanisms that manipulated group identification and its effect on collective performance. Data were preprocessed using the Homer2 package in MATLAB 2020b (Mathworks Inc., Natick, MA, USA). First, motion artifacts were detected and corrected using a discrete wavelet transformation filter procedure. After that, the raw intensity data were converted to optical density (OD) changes. Then, kurtosis-based wavelet filtering (Wav Kurt) was applied to remove motion artifacts with a kurtosis threshold of 3.3 (Chiarelli, Maclin, Fabiani, & Gratton, 2015). Based on a prior multi-brain study of social interactions (Cheng et al., 2022), the output was bandpass filtered using a Butterworth filter with order 5 and cut-offs at 0.01 and 0.5 Hz to remove longitudinal signal drift and instrument noise. Finally, OD data were converted to HbO concentrations.

      (3) Individual-Level Analysis: First, a GLM was used to compute the "task vs. rest" brain activation contrast for each participant [0,1], obtaining each individual's "task effect" value (β value, representing task activation strength).

      (4) Group-Level Analysis: These "task effect" values from all participants were then aggregated, and a one-sample t-test was performed for each brain region (or channel) to determine whether the average activation in that region was significantly greater than 0 (i.e., significantly more active during the task compared to the resting state).

      (5) Task-Related Regions: If the t-test result for a brain region was significant (p < 0.05, FDR-corrected), we considered that region "task-related" and suitable for further analysis.

      (6) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: We analyzed relationships between task-related activation (β values) and behavioral scores (e.g., individual performance) using Pearson analyses.

      - Mediation model: We examined the relationship between an individual's perceived group identification and individual performance, which was mediated by task-related activation (β values).
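
      The mediation model in the last item can be written in the standard single-mediator form. The response does not give the estimation details, so the notation below is an assumption: X is perceived group identification, M is task-related activation (β value), and Y is individual performance.

```latex
\begin{align}
M &= i_M + a\,X + \varepsilon_M \\
Y &= i_Y + c'\,X + b\,M + \varepsilon_Y \\
\text{indirect effect} &= a\,b, \qquad \text{total effect } c = c' + a\,b
\end{align}
```

      Under this formulation, mediation is supported when the indirect effect ab differs reliably from zero, which is commonly assessed with a bootstrap confidence interval.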

      Within-Group Neural Synchronization (GNS).

      (1) Data Collection and Pre-processing as above

      (2) Calculation: Wavelet transform coherence (WTC) was applied to generate the brain-to-brain coupling of each pair in each triad (Coherence 1&2, Coherence 1&3, and Coherence 2&3). Then, the three coherence values from the three pairs were averaged as the GNS for each triad, that is, GNS = (Coherence 1&2 + Coherence 1&3 + Coherence 2&3) / 3.

      (3) Task-Related Regions: Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.

      (4) Permutation test: The nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples.

      (5) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: The Pearson’s correlation between GNS and collective performance (i.e., calculated by averaging the individual scores assigned by the three raters for each group) was performed.

      -  Mediation model: We examined how GNS mediated the relationship between group identification and collective performance.
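
      Steps (2) and (4) above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it compares mean GNS rather than the interaction effect mentioned in step (4), and `toy_coherence` is a hypothetical stand-in for time-averaged WTC between two participants.

```python
import random
from itertools import combinations
from statistics import mean


def triad_gns(members, coherence):
    """GNS of one triad: the mean of its three pairwise coherence values."""
    return mean(coherence(a, b) for a, b in combinations(members, 2))


def pseudo_group_null(real_triads, coherence, n_perm=1000, seed=0):
    """Null distribution of mean GNS obtained by re-forming random pseudo-triads."""
    rng = random.Random(seed)
    participants = [p for triad in real_triads for p in triad]
    null = []
    for _ in range(n_perm):
        shuffled = participants[:]
        rng.shuffle(shuffled)
        pseudo = [shuffled[i:i + 3] for i in range(0, len(shuffled), 3)]
        null.append(mean(triad_gns(t, coherence) for t in pseudo))
    return null


def permutation_p(observed, null):
    """One-sided p-value with the usual +1 correction."""
    exceed = sum(1 for value in null if value >= observed)
    return (exceed + 1) / (len(null) + 1)


def toy_coherence(a, b):
    # Hypothetical coupling: stronger between members of the same real triad.
    return 0.6 if a[0] == b[0] else 0.3


# Six real triads; each participant is labelled (group_id, member_id).
real_triads = [[(g, m) for m in range(3)] for g in range(6)]
observed = mean(triad_gns(t, toy_coherence) for t in real_triads)
null = pseudo_group_null(real_triads, toy_coherence)
p_value = permutation_p(observed, null)
```

      Because pseudo-triads mix members of different real groups, their GNS falls below the observed value in this toy setup, so the permutation p-value is small.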

      The brain activation connectivity.

      (1) Data Collection and Pre-processing as above

      (2) Calculation: We computed exploratory Pearson’s correlations between individual-performance-related HbO and collective-performance-related HbO.

      (3) Moderation analysis: Single-brain activation × connectivity → GNS.

      (2) Counterbalancing.

      We sincerely appreciate this valuable methodological insight. Building on prior group decision-making research (De Wilde et al., 2017; Stasser et al., 1992), we refined all stages to enhance experimental control and procedural clarity throughout the process (i.e., a. Reading information, b. Sharing private information, c. Discussing information, d. Decision) (Xie et al., 2023). Importantly, we maintained a fixed task sequence to preserve ecological validity, as this progression mirrors natural group decision-making dynamics.

      While this design choice precludes sequential counterbalancing, several factors mitigate potential temporal confounds: (1) random assignment and uniform task timing across conditions minimize systematic between-group differences; (2) our whole-block GLM approach captures sustained decision-related neural activity rather than phase-specific effects; and (3) we fully acknowledge this limitation and will incorporate a detailed discussion of temporal considerations in the revised manuscript, while noting that our design provides unique advantages for studying naturalistic decision-making processes.

      (3) The modelling was of the complete block

      In our revised manuscript, we have explicitly stated that the analysis was performed at the block level rather than the event level, for the following reasons:

      (1) The hidden profile task is inherently a “group decision-making process” that unfolds dynamically across multiple stages (reading, sharing, discussing, and deciding). Prior research in this paradigm (De Wilde et al., 2017; Stasser & Titus, 1985; Xie et al., 2023) has consistently treated these phases as integrated blocks because the key cognitive and social processes (e.g., information integration, deliberation, and consensus formation) occur over extended interactions rather than discrete events.

      (2) Methodologically, our fNIRS hyperscanning approach requires longer blocks to reliably capture the slow hemodynamic response and the gradual emergence of inter-brain neural synchronization during naturalistic social exchanges (Cui et al., 2012; Liu et al., 2019). Event-related designs, while useful for transient stimuli, are less suited for studying prolonged, interactive decision-making where neural coupling develops over time. Thus, our block-based analysis aligns with both the cognitive demands of the task and the neuroimaging constraints, ensuring robust detection of group-level neural dynamics.
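
      A block-level model of this kind can be sketched with a single boxcar regressor spanning the whole task block. The example below is deliberately simplified and rests on assumptions: there is no hemodynamic response convolution or noise model, the signal values are invented, and the [0, 1] contrast is read as selecting the task coefficient from an [intercept, task] design.

```python
from statistics import mean


def boxcar(n_samples, onset, duration):
    """Regressor that is 1 during the block and 0 elsewhere."""
    return [1.0 if onset <= t < onset + duration else 0.0 for t in range(n_samples)]


def ols_slope_intercept(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical HbO trace: 10 rest samples followed by a 10-sample task block.
regressor = boxcar(n_samples=20, onset=10, duration=10)
signal = [0.1] * 10 + [0.5] * 10
beta_task, beta_rest = ols_slope_intercept(regressor, signal)
```

      With a binary regressor, the task coefficient equals the mean signal during the block minus the mean signal at rest, which is the sustained "task vs. rest" effect that a whole-block analysis targets.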

    1. eLife Assessment

      This manuscript presents a clever and powerful approach to examining differential roles of Nav1.2 and Nav1.6 channels in excitability of neocortical pyramidal neurons, by engineering mice in which a sulfonamide inhibitor of both channels has reduced affinity for one or the other channels. Overall, the results in the manuscript are compelling and give important information about differential roles of Nav1.6 and Nav1.2 channels. Activity-dependent inactivation of NaV1.6 was also found to attenuate seizure-like activity in cells, demonstrating the promise of activity-dependent NaV1.6-specific pharmacotherapy for epilepsy.

    2. Reviewer #1 (Public review):

      Summary:

      Prior research indicates that NaV1.2 and NaV1.6 have different compartmental distributions, expression timelines in development, and roles in neuron function. The lack of subtype-specific tools to control Nav1.2 and Nav1.6 activity, however, has hampered efforts to define the role of each channel in neuronal behavior. The authors attempt to address the problem of subtype specificity here by using aryl sulfonamides (ASCs) to stabilize channels in the inactivated state in combination with mice carrying a mutation that renders NaV1.2 and/or NaV1.6 genetically resistant to the drug. Using this innovative approach, the authors find that action potential initiation is controlled by NaV1.6 while both NaV1.2 and NaV1.6 are involved in back-propagation of the action potential to the soma, corroborating previous findings. Additionally, NaV1.2 inhibition paradoxically increases firing rate, as has also been observed in genetic knockout models. Finally, the potential anticonvulsant properties of ASCs were tested. NaV1.6 inhibition but not NaV1.2 inhibition was found to decrease action potential firing in prefrontal cortex layer 5b pyramidal neurons in response to current injections designed to mimic inputs during seizure. This result is consistent with studies of loss-of-function Nav1.6 models and knockdown studies showing that these animals are resistant to certain seizure types. These results lend further support for the therapeutic promise of activity-dependent, NaV1.6-selective inhibitors for epilepsy.

      Strengths:

      (1) The chemogenetic approaches used to achieve selective inhibition of NaV1.2 and NaV1.6 are innovative and help to resolve long-standing questions regarding the role of Nav1.2 and Nav1.6 in neuronal electrogenesis.

      (2) The experimental design is overall rigorous, with appropriate controls included.

      (3) The assays to elucidate the effects of channel inactivation on typical and seizure-like activity were well selected.

      Weaknesses:

      (1) As discussed in the revised manuscript, the fact that channels are only partially blocked by the ASC and that ASCs act in a use-dependent manner complicates the interpretation of the effects of NaV1.2 versus NaV1.6 on neuronal activity.

      (2) The idea that use-dependent VGSC-acting drugs may be effective antiseizure medications is well established. Additional discussion of the existing, widely used, use-dependent VGSC drugs (e.g. Carbamazepine, Lamotrigine, Phenytoin) would improve the manuscript. Also, the idea that targeting NaV1.6 may be effective for seizures is established by studies using genetic models, knockdown, and partially selective pharmacology (e.g. NBI-921352). Additional discussion of how the results reported here are consistent with or differ from studies using these alternative approaches would improve the discussion.

    3. Reviewer #2 (Public review):

      The authors used a clever and powerful approach to explore how Nav1.2 and Nav1.6 channels, which are both present in neocortical pyramidal neurons, differentially control firing properties of the neurons. Overall, the approach worked very well, and the results show very interesting differences when one or the other channel is partially inhibited. The experimental data are solid and are very nicely complemented by a computational model incorporating the different localization of the two types of sodium channels.

      The revised manuscript has re-organized figures that make the results and interpretation easier to follow.

    1. eLife Assessment

      This study presents a valuable investigation into how heavy metal stress may have influenced the domestication of maize from its wild ancestor, teosinte parviglumis, focusing on specific ATPase genes with proposed roles in heavy metal homeostasis. The evidence supporting the main claims is incomplete, with suggestive but not definitive data linking gene function to domestication traits, and limited environmental context for the hypothesized selection pressures. While the work introduces an interesting model connecting environmental stress responses to evolutionary transitions and highlights underexplored aspects of teosinte plasticity, the conclusions would benefit from more comprehensive analyses such as transcriptomics, a broader survey of loci, and stronger paleoenvironmental validation. The study will be of interest to researchers in plant evolution and domestication, but currently lacks the analytical depth to fully support its central hypothesis.

    2. Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. aim to better understand how environmental conditions could have influenced specific gene functions that may have been selected for during the domestication of teosinte parviglumis into domesticated maize. The authors are particularly interested in identifying the initial phenotypic changes that led to the original divergence of these two subspecies. They selected heavy metal (HM) stress as the condition to investigate. While the justification for this choice remains speculative, paleoenvironmental data would add value; the authors hypothesize that volcanic activity near the region of origin could have played a role.

      The authors exposed both maize and teosinte parviglumis to a fixed dose of copper and cadmium, representing an essential and a non-essential element, respectively. They assessed shoot and root phenotypic traits at a defined developmental stage in plants exposed to HM stress versus controls. They then focused on three genes already known to help plants manage HM stress: ZmHMA1, ZmHMA7, and ZmSKUs5. Two of these genes are located in a genomic region linked to traits selected during domestication. A closer examination of nucleotide variability in the coding and flanking regions of these genes provided evidence of selective pressure among teosinte parviglumis, maize, and the outgroup Tripsacum dactyloides.

      They further generated a null mutant for ZmHMA1 and showed, for the first time in maize, a pleiotropic phenotype reminiscent of traits associated with the domestication syndrome. Finally, using qPCR, they reported increased expression of the domestication gene Teosinte branched1 (tb1) in teosinte parviglumis under HM stress. Comparative studies focusing on teosinte parviglumis and the genes ZmHMA1, ZmHMA7, and ZmSKUs5 under HM stress are limited; thus, this phenotypic characterization provides a promising starting point for further understanding the genetic basis of the response.

      The dataset is of good quality, but the conclusions are not sufficiently supported by the data. Analyses should be expanded, and additional experiments included to strengthen the findings.

      (1) Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      (2) The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      (3) Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

    3. Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication identified in previous studies. This includes heavy metal transporters, which are unregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on its association with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize also gives some novelty in this study. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

    4. Author response:

      Reviewer 1:

      The selection of heavy metal stress as the condition to investigate is not speculative. The elucidation of the genome from the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., 2009). Differences concordant with its ancient origin identified chromosomal regions of low nucleotide variability that contained the three domestication loci included in this study; all three are involved in heavy-metal detoxification. Results presented in Vielle-Calzada et al. (2009) indicated that environmental changes related to heavy metal stress were important selective forces acting on maize domestication. Our study expands those results by starting to elucidate the function of these heavy metal response genes and their role in the evolutionary transition from teosinte parviglumis to maize.

      Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the value of this comment. We will modify the manuscript to clearly show which phenotypic observations are novel and which were previously reported for maize grown under HM stress. The rationale for focusing on three specific loci is related to results from Vielle-Calzada et al., 2009 (see comment above). Although we demonstrated that these three loci show an unusual reduction in genetic variability when compared to the rest of chromosome 5 (including a separate class of genes previously identified as being affected by domestication; Hufford et al., 2012), we will expand the genetic and expression analysis to all genes included in a region precisely defined via LOD scores of five QTL 1.5-LOD support intervals that overlap with ZmHMA1. Within this region of 1.5 to 2 Mb, we will compare nucleotide variability and gene expression in response to HMs. Contrary to major domestication loci showing a single highly pleiotropic gene responsible for important domestication traits, in this chr.5 genomic region phenotypic effects are due to multiple linked QTLs (Lemmon and Doebley, 2014). The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication; we can anticipate that the results reinforce the conclusions of this study.

      The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We agree that lower nucleotide diversity values in maize are not sufficient to infer human selection and could be due to other evolutionary processes. As a matter of fact, since these same HM response loci also show unusually low nucleotide variability in teosinte parviglumis (Fig 2), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native teosinte parviglumis populations in the early Holocene, before maize emergence. This possibility supports a speculative model suggesting that phenotypic changes caused by HM stress could have preceded human selection and its consequences, contributing to initial subspeciation; the model is proposed in the “Ideas and Speculation” section of the manuscript. Fortunately, as suggested by the reviewer, a large body of paleoclimatic records and paleoenvironmental data is available for the Trans-Mexican Volcanic Belt in the Holocene, including geographic regions where the emergence of maize presumably occurred. We will include an extensive analysis of available paleoenvironmental data and discuss it in light of our current results regarding the effects of HM stress. We are also expanding the physical range of our statistical analysis to cover at least 60 Kb per locus, including neighboring genes for all three loci, to determine if our results could be due to narrow locus selection.
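
      For readers unfamiliar with the statistic, nucleotide diversity (π) is the average proportion of sites that differ between pairs of aligned sequences. The sketch below is a minimal illustration with an invented alignment; it ignores gaps and ambiguous bases, and it is not a description of the software or parameters used in the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average per-site proportion of differences over all sequence pairs (pi)."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(sequences, 2))
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in pairs
    )
    return total / len(pairs)


# Hypothetical 10-bp alignment of four haplotypes (illustrative only).
alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
pi = nucleotide_diversity(alignment)
```

      Lower values of π across a locus than across its surrounding region are the kind of signal discussed above, and, as noted, such a reduction alone does not distinguish human selection from other evolutionary processes.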

      Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Although real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, we will explore the possibility of complementing our analysis with available RNA-seq results that are pertinent to this study (see for example Lin et al., 2022 and Zhang et al., 2024) and further explore causative effects between HM stress, Tb1 and ZmHMA1 expression. As also pointed out by Reviewer #1, transposable elements (TEs) are known to influence gene expression under abiotic stress, and RNA-Seq analysis would allow us to determine if TE activity could lead to similar outcomes.

      Reviewer #2:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We will clearly emphasize our hypothesis that HM stress was an important factor driving the emergence of maize from teosinte parviglumis through the action of HM response genes. A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of volcanic soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation.

      Although real-time qPCR is an accurate and reliable approach to assess gene expression, we agree that RNA-Seq results would improve the scope of the analysis and better assess the role of Tb1 in relation to HM response (see comments for Reviewer #1). There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). We will emphasize these phenotypic changes in a modified version of the manuscript. There is a possibility for HM stress to represent a confounding factor in the selection of maize and not a driving factor; however, if such is the case, we think it is rather unlikely that the real driving factor could have acted through mechanisms not related to abiotic stress or HM response. To address the possibility that HM stress was a confounding factor, we will extensively analyze genetic diversity and gene expression in all loci containing genes mapping in close proximity to peak LOD scores of all 1.5-LOD support intervals located in chromosome 5 and showing pleiotropic effects on domestication traits (Lemmon and Doebley, 2014). These will also include those mapping in close proximity to ZmHMA1. The potential influence of heavy metals in the field is being investigated through the analysis of paleoenvironmental data (see response to Reviewer #1); we will include our results in a modified version of the manuscript.

      We thank both reviewers for their detailed review of the manuscript and their pertinent recommendations to improve its presentation and readability.

      References:

      Hufford, Matthew B., Xun Xu, Joost Van Heerwaarden, Tanja Pyhäjärvi, Jer-Ming Chia, Reed A. Cartwright, Robert J. Elshire, et al. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44(7): 808-11.

      Lemmon Zachary H., Doebley John F. 2014. Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL. Genetics 198(1): 345-353.

      Lin Kaina, Zeng Meng, Williams Darron V., Hu Weimin, Shabala Sergey, Zhou Meixue, Cao Fangbin, et al. 2022. Integration of transcriptome and metabolome analyses reveals the mechanistic basis for cadmium accumulation in maize. iScience 25(12): 105484.

      Vielle-Calzada JP, De La Vega OM, Hernández-Guzmán G, Ibarra-LacLette E, Alvarez-Mejía C, Vega-Arreguín JC, Jiménez-Moraila B, Fernández-Cortés A, Corona-Armenta G, Herrera-Estrella L, Herrera-Estrella A. 2009. The Palomero genome suggests metal effects on domestication. Science 326: 1078.

      Zhang Mengyan, Zhao Lin, Yun Zhenyu, Wu Xi, Wu Qi, et al. 2024. Comparative transcriptome analysis of maize (Zea mays L.) seedlings in response to copper stress. Open Life Sciences 19(1): 20220953.

    1. eLife Assessment

      This study presents valuable findings about daily rhythm changes of the Drosophila melanogaster adult gut metabolome, which is shown to be dependent on the circadian clock genotype, dietary regime and composition, and gut microbiota. The phenomena observed are supported by convincing experimental evidence. The general descriptive approach limits the power of the proposed conclusions. The work will be of interest to a broad range of physiology specialists.

    2. Reviewer #1 (Public review):

      The authors build on their previous study that showed the midgut microbiome does not oscillate in Drosophila. Here, they focus on metabolites and find that these rhythms are in fact microbiome-dependent. Tests of time-restricted feeding, a clock gene mutant, and diet reveal additional regulatory roles for factors that dictate the timing and rhythmicity of metabolites. The study is well-written and straightforward, adding to a growing body of literature that shows the time of food consumption affects microbial metabolism which in turn could affect the host.

      Some additional questions and considerations remain:

      (1) The main finding that the microbiome promotes metabolite rhythms is very interesting. Which microbiota are likely to be responsible for these effects? Future work could link specific microbiota to some of the metabolic pathways investigated.

      (2) TF increases the number of rhythmic metabolites in both microbiome-containing and abiotic flies. This is somewhat surprising given that flies typically eat during the daytime rather than at night, very similar to TF conditions. Future work could be done to restrict feeding to other times of day to see if there is a subsequent shift in the timing of metabolites.

      (3) Along these lines, the authors show that Per loss of function reveals a change in the phase of rhythmic metabolites. The authors note that these changes are not due to altered daily feeding rhythms in per mutants. These data suggest Per itself is responsible for these changes. Future work could be done to characterize the mechanisms responsible for these effects.

      (4) The calorie content of the diets (normal vs. high-protein vs. high-sugar) differs. Future work in this area could consider the possibility of a calorie effect rather than a difference in nutrition (protein/carbohydrate) or an effect of high protein/sugar on the microbiome itself.

      (5) The supplementary table provided outlining the specific metabolites will be useful for future research in this area.

    3. Reviewer #2 (Public review):

      The revised version of the paper clarifies the authors' discoveries regarding daily changes in metabolite concentrations in the gut of adult female Drosophila melanogaster. The authors have addressed all the questions and made the necessary changes, thereby strengthening the value of the article. They demonstrate that various factors influence metabolite oscillations: circadian clock genotype, dietary regime and composition, and gut microbiota.

      The notable strengths of this research article remain unchanged: the originality of the experimental design with multiple conditions tested, the variety of detected metabolites, and the clarity in data presentation.

      Among the weaknesses, one may consider the following:

      Limitations of potential reproducibility: It is unclear whether another research team would identify the same set of cycling metabolites, although similar conclusions appear robust.

      Limitations of generalisation: While the conclusions regarding the influence of microbiota, circadian genotype, and dietary regime may be valid, the specific metabolic pathways affected might differ, and specific mechanistic explanations remain elusive.

      Accuracy of data interpretation: Addressed in comments to the authors. This point concerns interpretations discussed by the authors in the manuscript, including the beneficial effects of cycling metabolites and of oscillation as a whole, its physiological relevance, the lack of proof for any compensatory effects, and their relevance to metabolism in the gut.

      Nevertheless, the authors have clearly and thoroughly addressed all the reviewers' concerns, enabling a better interpretation of the entire study.

    4. Reviewer #3 (Public review):

      Summary:

      Zhang et al sought to quantify the influence of the gut microbiome on metabolite cycling in a Drosophila model with extensive metabolomic profiling at 4 time points over a 24-hour period. The authors report that the microbiome enhances metabolite cycling in a context-dependent manner. The metabolomics data presented are comprehensive and complex, and they open up many new questions. The major strength of the work is the production of a large dataset of metabolites that can be the basis for hypothesis generation for more specific experiments. There are several weaknesses that make some of the conclusions speculative.

      Strengths:

      The revised manuscript is significantly improved due to the inclusion of new data and expanded analyses, particularly of time-resolved food intake. The dataset is comprehensive and of high value to the community. The experimental design includes multiple metabolomic comparisons across genetic and dietary conditions, specifically, germ-free versus microbially-colonized flies, time-restricted versus ad libitum feeding, high-sugar versus high protein diets, and wildtype genotype versus the per01 clock mutant. Additionally, the cycling of individual metabolites is presented, allowing readers to examine metabolites of interest. The datasets are made publicly available, allowing this resource to benefit the community.

      Weaknesses

      Many of the statistically significant differences, e.g. the effects of the microbiome on lipids and biogenic amines in Fig S5A, are quite small in magnitude, and, thus, it is difficult to believe that they are of biological significance without more mechanistic studies. Key conclusions, such as those pertaining to regulation or compensation by the microbiome, are not fully supported by mechanistic experiments. The manuscript uses terms like "regulate" or "compensate," which imply causality or a purpose of the microbiome that is not yet demonstrated, but this type of study opens up many important questions for which new hypotheses can be formed.

      A minor limitation is the modest temporal resolution (only four time points in 24 hours), which constrains interpretation of rhythmicity and phase. Additional experimental controls and targeted perturbation experiments are needed to support conclusions about functional impacts of metabolite oscillations. However, these types of limitations are expected from an early study in the field such as this one. Overall, the data are valuable, and the findings demonstrate the promise of the model for studying the interplay between the microbiome, metabolome, and circadian rhythm.

      Assessment of Aims

      The authors explore how the microbiome interacts with host circadian rhythms and diet to shape metabolite cycling. They largely succeed in characterizing broad trends and generating a valuable resource dataset. However, the conclusion that the microbiome actively regulates or compensates for cycling under specific conditions is not convincingly demonstrated with the current data.

      Impact and Utility

      The dataset will be a useful reference for researchers interested in microbiome-host interactions, metabolomics, and circadian biology. Its primary value lies in descriptive insight rather than mechanistic resolution. An alternative perspective is that per01 mutants serve as a useful negative control for rhythmicity detection, providing a baseline for distinguishing signal from experimental noise, an idea that could be emphasized more in the interpretation.

      Contextual Considerations

      Metabolomics datasets are valuable for understanding the influence of the microbiome. Future follow-up work using higher resolution sampling and functional perturbations (e.g., more extensive genetic or microbial manipulations) will be essential to test hypotheses about the roles of specific metabolites, regulatory pathways, and microbiota members in circadian modulation. This paper lays a strong foundation for such studies.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors build on their previous study that showed the midgut microbiome does not oscillate in Drosophila. Here, they focus on metabolites and find that these rhythms are in fact microbiome-dependent. Tests of time-restricted feeding, a clock gene mutant, and diet reveal additional regulatory roles for factors that dictate the timing and rhythmicity of metabolites. The study is well-written and straightforward, adding to a growing body of literature that shows the time of food consumption affects microbial metabolism which in turn could affect the host.

      We thank the reviewer for the positive comments.

      Some additional questions and considerations remain:

      (1) The main finding that the microbiome promotes metabolite rhythms is very interesting. Which microbiota are likely to be responsible for these effects? The authors' previous work in this area may shed light on this question. Are specific microbiota linked to some of the metabolic pathways investigated in Figure 5?

      This is a good question. Although the Drosophila microbiome shows limited diversity, comprised largely of two major families (Acetobacteraceae and Lactobacillaceae), effects on the host could arise from just a subset of species within these families. However, identifying these would require inoculating microbiome-free flies with single and mixed combinations of species and conducting metabolomics to examine cycling of each of the three categories of metabolites we studied: primary metabolites, lipids, and biogenic amines (each of these may respond differently to different species). We believe this is beyond the scope of this manuscript, which is focused on how cycles in these different types of metabolites change in the context of the microbiome, the circadian clock and different diets.

      (2) TF increases the number of rhythmic metabolites in both microbiome-containing and abiotic flies in Figure 1. This is somewhat surprising given that flies typically eat during the daytime rather than at night, very similar to TF conditions. I would have assumed that in a clock-functioning animal, the effect of restricting food availability should not make a huge difference in the time of food consumption, and thus downstream impacts on metabolism and microbiome. Can the authors measure food intake directly to compare the ad-lib vs TF flies to see if there are changes in food intake? Would restricting feeding to other times of day shift the timing of metabolites accordingly?

      Previous studies have indicated that there is no significant difference in food consumption between ad-lib and TF flies (Gill et al., 2015; Villanueva et al 2019). We also found that the presence of a microbiome does not alter total food consumption when compared with germ-free flies (Zhang et al, 2023, and current manuscript). Though flies primarily feed during the day, some food consumption occurs at night (i.e., the feeding rhythm is not tight) and so restricting food to the daytime can increase metabolite cycling. Restricting feeding to other times of day is expected to shift metabolite cycling. We previously showed that this shifts transcript cycling (Xu et al, Cell Metabolism 2011).

      (3) In Figure 2, Per loss of function reveals a change in the phase of rhythmic metabolites. In addition, the effect of the microbiome on these is very different: the per mutants show increased numbers of rhythmic metabolites when the microbiome is absent, unlike the controls. Is it possible that these changes are due to altered daily feeding rhythms in per mutants? Testing the time and amount of food consumed by the per mutant flies would address this question. Would TF in the per mutants rescue their metabolite rhythms and make them resemble clock-functioning controls?

      We previously showed that per<sup>01</sup> flies lose feeding rhythms in DD and LD conditions, but consume a lot more food (Barber et al, 2021). Given that locomotor rhythms are maintained in per<sup>01</sup> in LD (Konopka and Benzer 1971), these rhythms or other rhythms driven by LD cues likely account for the maintenance of metabolite rhythms. And the increased food consumption may contribute to the changes observed. To address the reviewer’s question about the microbiome, we assayed feeding rhythms in per<sup>01</sup> in the absence/presence of a microbiome on the diets that haven’t been tested before (high sugar and high protein diet). Surprisingly, feeding was rhythmic on a high protein diet, regardless of whether a microbiome was present (new Figure S10). On a high sugar diet, feeding appears to be somewhat rhythmic in the presence of a microbiome (although not significant) and not when the microbiome is absent. The same is true in iso31 controls, and in all cases, the phase is the same. Despite the similar effect of the microbiome on feeding rhythms in wild type and per<sup>01</sup>, the effect on cycling is very different. Thus, feeding rhythms do not appear to explain the effects of the microbiome on metabolite cycling in per<sup>01</sup>.

      (4) The calorie content of the diets (normal vs. high-protein vs. high-sugar) differs. The possibility of a calorie effect rather than a difference in nutrition (protein/carbohydrate) should be discussed. Another issue worth considering is the effect of high protein/sugar on the microbiome itself. While the microbiome doesn't seem to affect rhythms in the high-protein diet, the high-sugar diet seems highly microbiome-dependent in Supplementary Fig 8C vs D. Does the diet impact the microbiome and thus metabolite rhythmicity downstream?

      Thank you for these good suggestions. We have added to the discussion the possibility that caloric content, rather than nutrition (protein/carbohydrate), affects metabolite cycling in flies fed normal vs. high-protein vs. high-sugar diets. We have also discussed the possibility that effects of different diets on metabolite cycling are mediated by changes in the microbiome. We cite papers that show effects of diet on microbiomes.

      (5) It would be good if a supplementary table was provided outlining the specific metabolites that are shown in the radial plots. It is not clear if the rhythms shown in the figures refer to the same metabolites peaking at the same time, or rather the overall abundance of completely different metabolites. This information would be useful for future research in this area.

      We have added Supplementary Tables 1-21, which include the raw data for all metabolites.

      Reviewer #2 (Public Review):

      Summary:

      The paper addresses several factors that influence daily changes in concentration of metabolites in the Drosophila melanogaster gut. The authors describe metabolomes extracted from fly guts at four time-points during a 24-hour period, comparing profiles of primary metabolites, lipids, and biogenic amines. The study elucidates that the percentage of metabolites that exhibit a circadian cycle, peak phases of their increased appearance, and the cycling amplitude depend on the combination of factors (microbiome status, composition or timing of the diet, circadian clock genotype). Multiple general conclusions based on the data obtained with modern metabolomics techniques are provided in each part of the article. Descriptive analysis of the data supports the finding that the microbiome increases the number of metabolites for which concentration oscillates during the day period. Results of the experiments show that timed feeding significantly enhanced metabolite cycling and changed its phase regardless of the presence of a microbiome. The authors suggest that the host circadian rhythm modifies both metabolite cycling period and the number of such metabolites.

      Strengths:

      The obvious strength of the study is the data on circadian cycling of the detected 843, 4510, and 4330 total primary metabolites, lipids, and biogenic amines respectively in iso31 flies and 623, 2245, and 2791 respective metabolites in per<sup>01</sup> mutants. The comparison of the abundance of these metabolites, their cycling phase, and the ratio of cycling/non-cycling metabolites is well described and illustrated. The conditions tested represent significant experimental interest for contemporary chronobiology: with/without microbiota, wild-type/mutant period gene, ad libitum/time-restricted feeding, and high-sugar/high-protein diet. The authors conclude that a complex interaction between these factors exists, and some metabolic implications of combinations of these factors can be perceived as reminiscent of metabolic implications of another combination ("...the microbiome and time-restricted feeding paradigms can compensate for each other, suggesting that different strategies can be leveraged to serve organismal health"). Enrichment analysis of cycling metabolites leads to an interesting suggestion that oscillation of metabolites related to amino acids is promoted by the absence of microbiota, alteration of circadian clock, and time-restricted feeding. In contrast, association with microbiota induces oscillation of alpha-linolenic acid-related metabolites. These results provide the initial step for hypothesising about functional explanations of the uncovered observations.

      We thank the reviewer for summarizing the contributions made by this manuscript.

      Weaknesses:

      Among the weaknesses of the study, one might point out overly general interpretations of the results, which propose hypothetical conclusions without mechanistic proof. The quantitative indices analysed are obviously of particular interest; however, they are neither self-explanatory nor exhaustive. More specific biological examples would add valuable insights into the results of this study, making conclusions clearer. More specific comments on the weaknesses are listed below:

      (1) The criterion of the percentage of cycling metabolites used for comparisons has its own limitations. It is not clear whether the cycling metabolites are the same in the guts with/without microbiota, or whether there are totally different groups of metabolites that cycle in each condition. GO enrichment analysis gives only a partial assessment, but is still not quantitative enough.

      Microbiome-containing flies and germ-free flies do share some cycling metabolites. Figure 6 provides GO analysis for the pathways enriched in each condition, and Figure S6 shows quantitative data on the number that overlap between different conditions. We have also expanded discussion of specific cycling groups (e.g. amino acid metabolism) to indicate that different metabolites of the same pathway may cycle under different conditions. In addition, we have added detailed information for all cycling metabolites in Supplemental Tables 1-21.

      (2) The period of cycling data is based on only 4 time points during 24 hours in 4 replicates (>200 guts per replicate) on the fifth day of the experiment (10-12-day-old adults). It does not convincingly prove that these metabolites cycle the following days or more finely within the day. Moreover, it is not clear how peaks in polar histogram plots were detected in between the timepoints of ZT0, ZT6, ZT12, and ZT18.

      We acknowledge these limitations, but note that these experiments are very challenging because of the amount of tissue/guts needed for each data point and the time it takes to dissect each gut. Thus, getting more closely spaced time points is difficult. Nevertheless, we believe the detection of daily peaks with four biological replicates provides good evidence for cycling. The peaks in polar histogram plots are based on the JTK_adjphase parameter from the JTK_CYCLE analysis; as the data are averaged across replicates, the average can sometimes fall in between two assayed time points. Details can be found in the attached Supplementary Tables.
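      As an illustration of how a phase estimate can land between sampled time points, the following is a minimal sketch that fits a single 24-hour cosine to four measurements. It is not JTK_CYCLE (which is a non-parametric, rank-based method) and is not the analysis used in the manuscript; the function name and the example values are hypothetical, and the closed-form coefficients assume equally spaced samples covering exactly one period.

```python
import math


def cosinor_peak_phase(values, times=(0, 6, 12, 18), period=24.0):
    """Return the peak time (hours) of a single-harmonic cosine fit.

    Hypothetical helper for illustration only. The closed-form
    coefficients below are the least-squares solution only when the
    samples are equally spaced and span exactly one period.
    """
    omega = 2 * math.pi / period
    n = len(values)
    # Cosine and sine coefficients of y(t) ~ M + a*cos(wt) + b*sin(wt).
    a = 2.0 / n * sum(v * math.cos(omega * t) for v, t in zip(values, times))
    b = 2.0 / n * sum(v * math.sin(omega * t) for v, t in zip(values, times))
    # The fitted curve peaks where the angle wt equals atan2(b, a).
    return (math.atan2(b, a) / omega) % period


# Made-up abundances at ZT0, ZT6, ZT12, ZT18: equally high at ZT0 and ZT6.
peak = cosinor_peak_phase([2.0, 2.0, 0.0, 0.0])
print(round(peak, 3))  # peak falls midway between ZT0 and ZT6
```

      With only four samples per cycle, any such estimate is coarse; the sketch only shows why a reported peak need not coincide with ZT0, ZT6, ZT12, or ZT18.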

      (3) Average expression and amplitude are analysed for groups of many metabolites, whereas the data on distinct metabolites is hidden behind these general comparisons. This kind of loss of information can be misleading, making interpretation of the mentioned parameters quite complicated for authors and their readers. Probably more particular datasets can be extracted to be discussed more thoroughly, rather than those general descriptions.

      We analyzed groups of metabolites, dividing them into primary metabolites, lipids and biogenic amines, to extract general take-home messages and also to simplify the presentation. Figure 6 demonstrates specific pathways whose cycling is affected in each condition assayed. And Figure S11 shows examples of cycling metabolites under different conditions. To highlight a dataset that is altered under different conditions, we expanded our discussion of amino acid metabolism, and show how the specific metabolites that cycle in this pathway may vary from one condition to another (Figure S11). For more quantitative data on individual metabolites, we now provide supplementary tables that display all the cycling metabolites. These include those uniquely cycling in one group, those shared between the two groups, and those uniquely cycling in the other group.
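      The three-way split described above (unique to one group, shared, unique to the other) can be sketched with plain set operations. This is an illustrative sketch only: the function name, the condition labels, and the metabolite identifiers are hypothetical placeholders and are not taken from the supplementary tables.

```python
def partition_cycling(cycling_a, cycling_b):
    """Split two collections of cycling metabolites into three groups.

    Hypothetical helper: returns the metabolites cycling only in the
    first condition, in both conditions, and only in the second.
    """
    a, b = set(cycling_a), set(cycling_b)
    return {
        "only_a": sorted(a - b),
        "shared": sorted(a & b),
        "only_b": sorted(b - a),
    }


# Placeholder identifiers standing in for metabolite names.
with_microbiome = ["met_1", "met_2", "met_3"]
germ_free = ["met_2", "met_3", "met_4"]

groups = partition_cycling(with_microbiome, germ_free)
print(groups)
```

      Counting the entries in each group gives the kind of overlap numbers that a comparison between two conditions would report.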

      (4) The metabolites' preservation is crucial for this type of analysis, and both proper sampling plus normalisation require more attention. More details about measures taken to avoid different degradation rates, different sizes of intestines, and different amounts of microbes inside them will be beneficial for data interpretation.

      We were careful to control for gut size and to preserve the samples so as to minimize degradation. We placed all the fly samples on ice during collection, and the entire dissection process was also conducted on ice. Once the gut sample collection was completed, we immediately transferred the samples to dry ice for storage. After we finished collecting all the samples, we stored them at -80°C. In general, gut sizes varied in the following order: females fed high-protein diets > females fed normal chow diets > females fed high-sugar diets. As the metabolomic facility suggested 10 mg samples for each biological repeat, we dissected at least 180 female guts from flies fed high-protein diets, 200 female guts from flies fed normal chow diets, and at least 250 female guts from flies fed high-sugar diets. Also, as gut sizes were smaller in sterile flies, relative to microbiome-containing flies, on a high protein diet, we collected 200 guts from sterile flies under these conditions. Finally, the service that conducted the metabolomics (UC Davis) provided three detailed files to describe the extraction process for primary metabolites, lipids, and biogenic amines, respectively. We have submitted these files as supplemental materials in the revised manuscript.

      (5) The data in the article describes formal phenomena, not directly connected with organism physiology. The parameters discussed obviously depend on the behavior of flies. Food consumption, sleep, and locomotor activity could be additionally taken into account.

      These are very interesting suggestions. Previous results indicated that microbiome-containing flies do not change their total food consumption or exhibit changes in feeding rhythms when compared with germ-free flies (Zhang et al., 2023), which indicates that microbiome-mediated metabolite cycling is independent of feeding rhythms. As noted above, we examined the contribution of feeding to metabolite cycling in per<sup>01</sup> flies, and did not see an obvious link. We also assayed feeding rhythms and overall food consumption in wild type under AS and AM conditions and on different diets, and likewise could not account for changes in metabolite cycling based on altered food intake (new Figure S10). We acknowledge that behavior, including locomotor activity and sleep, could indeed influence metabolite cycling. We have added discussion of this.

      (6) Division of metabolites into three classes limits functional discussion of found differences. Since the enrichment analysis provided insights into groups of metabolites of particular interest (for example, amino acid metabolism), a comparison of their cycling characteristics can be shown separately and discussed.

      The intent of this work was to provide an overall account of changes in metabolite cycling that occur under different conditions/diets/genotypes. We have expanded the discussion of amino acid metabolism and show how different metabolites of this pathway cycle under different conditions (Figure S11). We believe that discussion/analysis of other specific groups would be good follow-up studies, which can build upon this work. Detailed datasets about all cycling metabolites are provided in Table S1-12.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to quantify the influence of the gut microbiome on metabolite cycling in a Drosophila model with extensive metabolomic profiling over a 24-hour period. The major strength of the work is the production of a large dataset of metabolites that can be the basis for hypothesis generation for more specific experiments. There are several weaknesses that make the conclusions difficult to evaluate. Additional experiments to quantify food intake over time will be required to determine the direct role of the microbiome in metabolite cycling.

      Strengths:

      An extensive metabolomic dataset was collected with treatments designed to determine the influence of the gut microbiome on metabolite circadian cycling.

      Weaknesses:

      (1) The major strength of this study is the extensive metabolomic data, but as far as I can tell, the raw data is not made publicly available to the community. The presentation of highly processed data in the figures further underscores the need to provide the raw data (see comment 3).

      The raw data have been submitted to the public metabolite database MetaboLights (https://www.ebi.ac.uk/metabolights/; ID: MTBLS8819).

      In addition, the normalized metabolite data have been added in the supplemental materials.

      (2) Feeding times heavily influence the metabolome. The authors use timed feeding to constrain when flies can eat, but there is no measurement of how much they ate and when. That needs to be addressed.

      Since food is the major source of metabolites, the timing of feeding needs to be measured for each of the treatment groups. In the previous paper (Zhang et al 2023 PNAS), the feeding activity of groups of 4 male flies was measured for the wildtype flies. That is not sufficient to determine to what extent feeding controls the metabolic profile of the flies. Additionally, timed feeding opportunities do not equate to the precise time of feeding. They may also result in dietary restriction, leading to the loss of stress resistance in the TF flies. The authors need to measure food consumption over time in the exact conditions under which transcriptomic and metabolomic cycling are measured. I suggest using the EX-Q assay as it is much less effort than the CAFE assay and can be more easily adapted to the rearing conditions of the experiments.

      As noted above, we have now added considerable additional data on feeding and feeding rhythms in microbiome-containing and sterile wild type and per<sup>01</sup> flies on different diets (Figure S10). Our previous study, using the EX-Q assay method, found no differences in either total food consumption or feeding rhythms between microbiome-containing flies and germ-free flies (Zhang et al., 2023). Also, previous work has demonstrated that there is no significant difference in food consumption between ad-lib and TF flies (Villanueva et al 2019).

      (3) The data on the cycling of metabolites is presented in a heavily analyzed form, making it difficult to evaluate the validity of the findings, particularly when a lack of cycling is detected. The normalization to calculate the change in cycling due to particular treatments is particularly unclear and makes me question whether it is affecting the conclusions. More presentation of the raw data to show when cycling is occurring versus not would help address this concern, as would a more thorough explanation of how the normalization is calculated - the brief description in the methods is not sufficient.

      For instance, the authors state that "timed feeding had less effect on flies containing a microbiome relative to germ-free flies." One alternative interpretation of that result is that both treatments are cycling but that the normalization of one treatment to the other removes the apparent effect. This concern should be straightforward to address by showing the raw data for individual metabolites rather than the group.

      We have added Supplementary Tables 1-21, which include detailed information on metabolite identity and data processing. These tables also contain the raw data for all the cycling metabolites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The abstract could be rewritten to clarify. I found the last part of the introduction better but struggled to understand the abstract.

      We apologize for this. The abstract was indeed quite dense; we have revised it for clarity.

      (2) Supplementary Figure 8 could be moved to the main text. Since all the comparisons are on one page it is much easier to see the similarities and differences in the conditions tested.

      We have moved Supplementary Figure 8 to main Figure 5.

      (3) The sex and age of the flies used in all experiments should be clarified. The authors mention female guts are collected in the methods (line 111) but it is not clear if this is throughout.

      All guts used in this study were female. We have clarified this in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Some minor notes that might be improved:

      (1) The order of obtaining eggs without microbiota might be different (first - bleaching, second - sterilisation with ethanol). Otherwise, it is not clear why dechorionating is needed after sterilisation.

      Protocols for generating axenic flies vary. We used the method Feltzin et al reported in 2019: “For newborn fly embryos (<12 hours). First, cleanse and sterilize any leftover agar from collection plates using 100% ethanol, second, dechorionate the fly embryos with 10% bleach, and then immediately rinse three times in germ-free PBS”.

      (2) References for the resources used might be provided (MetaboAnalyst5.0, JTK_CYCLEv3.1).

      We have added the reference for MetaboAnalyst5.0, JTK_CYCLEv3.1 (Pang et al., 2022).

      (3) References or justification for the chosen composition of the diets might be useful (standard diet, high-protein diet, high-sugar diet).

      We have added the references (Bedont et al, 2021, Morris et al, 2021).

      (4) Justification of the choice of iso31 line and per<sup>01</sup> mutant might be important.

      iso31 is the standard wild type line we use in the laboratory. To understand the role of the endogenous clock in microbiome-mediated metabolite cycling, we chose the classical clock mutant per<sup>01</sup>, as it displays fewer non-circadian phenotypes than other clock mutants. For instance, loss of transcriptional activators of the clock produces additional effects (e.g. hyperactivity), likely because of the effect it has on overall expression of many genes. We have added this explanation to the manuscript.

      (5) Abbreviation decoding might be introduced when it is used for the first time in the text (line 240 - TM, TS).

      We apologize for this omission and have rectified it. Thanks

      TM (timed feeding microbiome-containing flies)

      TS (timed feeding germ-free flies)

      (6) The term "germ-free" is recommended to be avoided in the context of the paper (germ-free = infertile for animals). It might be replaced with the terms "without microbiota" or "germ-free" for example.

      Given that the reviewer recommends use of the word “germ-free” in the second sentence, we assume that the first sentence was intended to say we should avoid “sterile” (and not “germ-free”). We have edited to “germ-free” in the manuscript.

      (7) When only one diet is assumed, it might be better to say so (line 324 - "the protein diet" instead of "protein diets").

      Sorry, we have edited accordingly.

      (8) Too many speculative conclusions are confusing (line 476 - what does it mean for "just as” - how exactly similar; line 477 - what kind of "compensation"; line 503 - how exactly it is related to "metabolic homeostasis" and to which kind of homeostasis).

      “just as” was not intended to refer to any degree of similarity but only to the fact that amino acid cycling occurs in the absence of a clock, as it does in the absence of a microbiome. We speculate that this “compensates” for something that is normally conferred by the clock and the microbiome, for instance maybe the clock drives cycling of a microbiome component that is important for protein metabolism. In the absence of either the clock or the microbiome, this is compensated for by amino acid cycling. We have clarified in the text.

      We used the term "metabolic homeostasis" to reflect steady maintenance of metabolic health via interaction and modulation of different factors. As in the example given above for amino acid metabolism, a perturbation of one process might produce a change in another to optimize metabolism. We have changed the wording in the text to better convey our message (lines 576-579).

      (9) Particular examples of metabolites might be beneficial for supporting conclusions (a figure which shows, for instance, the specific data on linolenic acid: in which conditions it cycles, in which not, what is the period of cycling, what are the exact expression and JTK_amplitude values).

      All cycling metabolites, including linolenic acid, are now included in the supplemental tables.

      Reviewer #3 (Recommendations For The Authors):

      (1) The level of biological replication is unclear for the metabolomic experiments. I see that 200 guts per sample were collected and 4 repeat samples were made at each timepoint. Are these 4 biological replicates for each treatment (AS, AM, TS, TM) at each timepoint? 5 replicates are standard in metabolomics. Please be more explicit in the methods.

      There are 4 biological replicates for each time point of each of the 4 treatments. The metabolomics service recommended 4-6 replicates, so we prepared 4 replicates for each sample. As noted above, these preparations are quite difficult. We found that in general the biological replicates do not differ significantly from each other.

      (2) Wolbachia can have a significant influence on fly physiology. How was this variable addressed? Were flies checked for Wolbachia?

      All the flies are Wolbachia-free, as in our previous study (Zhang et al., 2023). Initially, we treated the flies with 1 mM kanamycin (11815024, ThermoFisher) to remove bacteria. Afterwards, we repopulated the flies with a Wolbachia-free microbiome containing Lactobacillus and Acetobacter bacteria from a medium previously occupied by other flies.

      (3) In Results section 1, the authors report changes in the percentages of metabolites that are cycling, but no statistical test is presented to show that these changes are indeed significant. The authors need to report statistics on the percentages of cycling metabolites.

      We used statistical tests, specifically JTK cycle, to determine cycling of each metabolite. The P value for cycling of each metabolite in this test is computed on the basis of all the biological replicates and all time points. Metabolites that showed a significant P value contribute to the percent cycling. As a result, there is only one value for the percentage cycling in each condition. Thus, statistical analysis cannot be done.

      (4) The authors report that the species proportions in the gut microbiome don't cycle, but do absolute CFU counts? By many accounts (see e.g. Blum et al 2013 mBio), the gut microbiome in lab flies is what they recently ate from the food. The abundance of bacteria in the gut would then be directly tied to the timing of feeding. Timed feeding should produce oscillations in individual flies, so individual flies should be analyzed.

      We assume the reviewer is suggesting that rhythmic feeding could result in rhythmic abundance of the microbiome, which could contribute to cycling. This is indeed a possibility and one we now discuss in the manuscript. Thanks! Analysis of the gut microbiome in individual flies would require quantitation of CFUs from single guts. We do not believe a single gut would yield enough material.

      (5) Line 252: the ZT9 peak could just be due to feeding and digestion.

      This is possible. We now acknowledge this in the manuscript.

      (6) What is the expectation for metabolite cycling in per mutant flies? Shouldn't per mutant flies have no cycling on average? Does the cycling suggest there is an external factor causing cycling?

      Under light-dark (LD) conditions, metabolite cycling in per mutant flies may be driven by light:dark cues, either directly or through other light-driven rhythms; e.g., locomotor activity is rhythmic in per<sup>01</sup> flies maintained in LD.

      (7) Performing food intake analysis on each of the treatments would provide critical data to address the direct role of the microbiome in metabolite cycling.

      As noted above, we now provide considerable additional data on food intake at different times of day in microbiome-containing and germ-free wild type and per<sup>01</sup> flies on different diets (Figure S11). Overall, our data indicate that food intake or feeding rhythms do not account for the effects we report here.

      (8) Please be more explicit about replication in the methods and figure legends.

      We have added n=4 for each condition in the methods and figure legends.

      (9) There are numerous minor grammatical errors such as incorrect verb tenses and usage of articles. Additional proofreading could correct these.

      Sorry! We have done a thorough proofreading and made corrections.

    1. eLife Assessment

      This study introduces a novel and broadly applicable metric, phenological lag, to partition the effects of spring warming from other abiotic constraints on plant phenology. While the dataset is extensive and the analytical framework is valuable conceptually, the manuscript lacks clarity in its aims and justification for the new metric, and key results are underdeveloped or poorly visualized. The strength of evidence is moderate to solid, but revisions are needed to clarify the study's contribution and improve interpretability.

    2. Reviewer #1 (Public review):

      Summary:

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present compelling evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species.

      Strengths:

      A clearly defined and straightforward mathematical definition of phenological lag allows for this method to be applied in different scientific contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone.

      Identifying phenological lag and associated contributing factors provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models.
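
      As a rough illustration of the kind of quantity the reviewer refers to, the sketch below computes a lag as the gap between an expected and an observed phenological shift. The function names, the sign convention (shifts are advances in days, positive meaning earlier), and the sample values are assumptions made here for illustration; they are not taken from the manuscript, whose exact definition may differ.

```python
def phenological_lag(expected_shift_days, observed_shift_days):
    """Lag in days: how much less the event advanced than expected.

    Assumed convention (not from the manuscript): both shifts are advances
    in days, so a positive lag means the observed response fell short.
    """
    return expected_shift_days - observed_shift_days


def mean_lag(records):
    """Average lag over (expected, observed) pairs."""
    if not records:
        raise ValueError("at least one record is required")
    lags = [phenological_lag(exp, obs) for exp, obs in records]
    return sum(lags) / len(lags)


# Hypothetical values for illustration only.
single = phenological_lag(expected_shift_days=6.0, observed_shift_days=4.5)
average = mean_lag([(6.0, 4.5), (3.0, 3.0), (8.0, 5.5)])
```

      With this convention, a lag of zero means the observed response matched the expectation from warming alone.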

      Weaknesses:

      The authors include very few data visualizations, and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone.

      The use of stepwise, automated regression may be less suitable than a hypothesis-driven approach to model selection, combined with expanded data visualization. The use of stepwise regression may produce inappropriate models based on factors of the sample data that may preclude or require different variable selection.

    3. Reviewer #2 (Public review):

      Summary:

      This is a meta-analysis of the relative contributions of spring forcing temperature, winter chilling, photoperiod and environmental variables in explaining plant flowering and leafing phenology. The authors develop a new summary variable called phenological lag to describe why species might have different responses than predicted by spring temperature.

      Strengths:

      The summary statistic is used to make a variety of comparisons, such as between observational studies and experimental studies.

      Weaknesses:

      By combining winter chilling effects, photoperiod effects, and environmental stresses that might affect phenology, the authors create a new variable that is hard to interpret. The authors do not provide information in the abstract about new insights that this variable provides.

      Comments:

      It would be useful to have a map showing the sites of the studies.

      The authors should provide a section in which the strengths and weaknesses of the approach are discussed. Is it possible that mixing different types of data, studies, sample sizes, number of years, experimental set-ups, and growth habits results in artifacts that influence the results?

      Now that the authors have created this new variable, phenological lag, which of the components that contribute to it has the most influence on it? Or which components are most influential in which circumstances? For example, what are some examples where photoperiod causes a phenological lag?

    1. eLife Assessment

      This is a potentially important study that explores the relevant range of parameter values for calibration and validation of cardiac electromechanics in ventricular models. Although much of the work presented is solid, the evidence provided to support the authors' key scientific claims is incomplete, especially as it relates to the emphasis on standardized validation and verification approaches. Notably, the level of model personalization presented in this work falls short of the threshold for what could reasonably be called a "digital twin", even by the relatively relaxed standards that have emerged in computational physiology and related fields in recent years.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wang et al. investigates cardiac electromechanical modeling and simulation techniques, focusing on the calibration and validation of ventricular models according to ASME V&V40 standards. The researchers aim to calibrate model parameters to align with key biomarkers such as QRS duration and left ventricular ejection fraction, and validate the model against independent measurements such as displacement and strain metrics. The authors also examine the impact of parameter variations on deformation, ejection fraction, strains, and other biomarkers. The overarching aim of the study is to give "credibility to the underlying computational electromechanics framework" and to "pave the way towards credible cardiac electromechanical Digital Twins."

      Strengths:

      (1) The study presents a solid validation strategy for cardiac models based on independent data.

      (2) It integrates electrophysiological, mechanical, and hemodynamic biomarkers for sensitivity analysis and calibration.

      Weaknesses and Limitations:

      (1) Model Assumptions: The study employs simplified modeling assumptions that are not state-of-the-art, e.g.,
          a) Isotropic scaling of the mesh to generate an unloaded reference geometry.
          b) Simple afterload and preload models that fail to produce physiological results.
          c) Simplified epicardial boundary conditions.

      (2) Numerical Framework:
          a) The mesh resolution and/or the numerical framework used for the mechanical part appears to suffer from known numerical artifacts (locking effects), leading to overly stiff or inaccurate behavior in finite element analysis. This results in an artificially stiff response to deformation, which is compensated by setting active contraction to ten times the value reported in the literature. The authors attribute this to limitations in using ex vivo tissue measurements to represent in vivo function, although similar issues were not observed in previous works.
          b) Further, the authors employ the monodomain model for the simulation of the electrical excitation and relaxation on a relatively coarse grid with an approximate edge length of 1 mm. This resolution is known to be insufficient for reliable results in organ-scale electrophysiology modeling.

      (3) Geometrical model and digital twin: The geometrical model, taken from a public cohort and calibrated to an ECG of another individual along with population-averaged values from a databank (UK Biobank), and unrelated measurements from surgical procedures, can hardly be considered a digital twin. Further, validation of the model was then performed against data from yet another cohort.

      (4) Calibration procedure: There are apparent flaws in the calibration procedure, or it is not described in sufficient detail. The authors dedicate significant effort to motivating parameter ranges, but in the end they use mostly other parameters for the calibration process, aiming to maximize left ventricular ejection fraction. It is not clear whether the chosen parameters result in, e.g., physiological calcium traces or calibrated parameters that are within physiological ranges.

      (5) Goodness-of-fit measures, e.g., a direct comparison of the measured and the simulated ECG, are not provided to assess calibration quality.

      (6) Due to these limitations and weaknesses, the authors fall short of achieving some of their goals, particularly establishing credibility for the underlying computational framework and in reproducing healthy pressure-volume loops, and in achieving physiological simulations while using physiological or reported ranges for the calibrated parameters.

      For example, a key physiological requirement is that the right and left ventricular stroke volumes are approximately equal in a heart beating at a limit cycle, as the blood pumped by the right ventricle into the pulmonary circulation must match the amount pumped by the left ventricle into the systemic circulation. This balance is not achieved in this study.

      (7) The conclusive claim that "the study paves the way towards credible electromechanical cardiac Digital Twins" is not supported. The model exhibits non-physiological behavior, requires unsupported parameter alterations (such as a 10-fold active stress scaling), and does not represent a digital twin, as model data are drawn from various unrelated, non-patient-specific sources.
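
      The stroke-volume balance raised under point (6) can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming end-diastolic and end-systolic volumes are available for both ventricles; the function names, the sample volumes, and the 5% tolerance are hypothetical choices made here and do not come from the study.

```python
def stroke_volume(edv_ml, esv_ml):
    """Stroke volume (mL) = end-diastolic volume minus end-systolic volume."""
    if esv_ml > edv_ml:
        raise ValueError("end-systolic volume cannot exceed end-diastolic volume")
    return edv_ml - esv_ml


def sv_mismatch(lv_edv, lv_esv, rv_edv, rv_esv):
    """Relative LV/RV stroke-volume mismatch, normalised by the mean stroke volume."""
    lv_sv = stroke_volume(lv_edv, lv_esv)
    rv_sv = stroke_volume(rv_edv, rv_esv)
    mean_sv = 0.5 * (lv_sv + rv_sv)
    if mean_sv == 0:
        raise ValueError("both stroke volumes are zero")
    return abs(lv_sv - rv_sv) / mean_sv


# Hypothetical volumes in mL, for illustration only (not values from the study).
mismatch_balanced = sv_mismatch(lv_edv=150.0, lv_esv=60.0, rv_edv=170.0, rv_esv=80.0)
mismatch_unbalanced = sv_mismatch(lv_edv=150.0, lv_esv=60.0, rv_edv=160.0, rv_esv=80.0)

TOLERANCE = 0.05  # arbitrary illustrative threshold, not an established standard
is_balanced = mismatch_balanced < TOLERANCE
```

      A simulation that has reached a limit cycle would be expected to yield a mismatch close to zero; a persistent mismatch points to the imbalance the reviewer describes.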

      Conclusion:

      Overall, this reviewer considers that the study requires a major revision, including improvements in numerical methods, modeling choices, and checks for physiological behavior. Nevertheless, the provided tables with averaged values from the UK Biobank and the presented validation strategy could be valuable to the research community.

    3. Reviewer #2 (Public review):

      The authors present an interesting study on calibrating and validating a biventricular cardiac electromechanical model. This is an important contribution, but some questions remain about the quantitative validation and verification aspects of the study.

      Major comments:

      (1) The title and paper stress the importance of validation on several occasions. However, the actual validation performed is limited to the section in lines 427-439. Furthermore, it is entirely qualitative, making assessing the model's quality difficult. Most of the paper is focused on sensitivity analysis, which is also interesting but unrelated to validation. Can you include a quantitative comparison with deformation biomarkers? E.g., spatially quantify strain differences between simulation and in vivo data, or overlay the current configuration of the geometry with MRI in various views, and calculate a displacement error norm.

      (2) You mention the ASME V&V40 standards throughout your paper. Yet, you only address the "second V" validation, ignoring the "first V" verification. How did you ensure that your computational models are implemented correctly?

      (3) All parameters discussed in this publication are physical parameters. What is the sensitivity of your model outputs concerning computational parameters?
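
      The displacement error norm suggested in comment (1) could take several forms. The sketch below shows one common choice, a relative L2 norm over matched points, written in plain Python. It assumes that simulated and image-derived displacements are already sampled at the same material points; the function name and the sample vectors are hypothetical.

```python
import math


def relative_displacement_error(u_sim, u_ref):
    """Relative L2 error between simulated and reference displacements.

    Both arguments are lists of (x, y, z) displacement vectors at matching
    points; the result is ||u_sim - u_ref|| / ||u_ref||.
    """
    if len(u_sim) != len(u_ref):
        raise ValueError("displacement fields must have the same number of points")
    err_sq = sum((a - b) ** 2 for s, r in zip(u_sim, u_ref) for a, b in zip(s, r))
    ref_sq = sum(b ** 2 for r in u_ref for b in r)
    if ref_sq == 0:
        raise ValueError("reference displacement field is identically zero")
    return math.sqrt(err_sq / ref_sq)


# Hypothetical displacements in mm, for illustration only.
reference = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
perfect = relative_displacement_error(reference, reference)
no_motion = relative_displacement_error([(0.0, 0.0, 0.0)] * 2, reference)
```

      A single scalar of this kind gives a quantitative complement to visual overlays, although it hides where in the ventricle the mismatch occurs.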

    1. eLife Assessment

      This study presents significant and novel insights into the roles of zinc in mammalian meiosis/fertilization events. These findings are useful to our understanding of these processes. The evidence presented is solid, with experiments being well-designed, carefully described, and interpreted with appropriate rigor.

    2. Reviewer #1 (Public review):

      The authors investigated the role of the zinc transporter ZIP10 in regulating zinc sparks during fertilization in mice. By utilizing oocyte-specific Zip6 and Zip10 conditional knockout mice, the authors effectively demonstrate the importance of ZIP10 in zinc homeostasis, zinc spark generation, and early embryonic development. The study is overall useful as it identifies ZIP10 as an important component of oocyte processes that support embryo development, thus opening the door for further investigations. While the study provides solid evidence for the requirement of ZIP10 in the regulation of zinc sparks and zinc homeostasis, it falls short of revealing the underlying mechanism of how ZIP10 exerts this important function.

      (1) The zinc transporters the authors are knocking out are expressed in mouse oocytes through follicular development, and the Gdf9-cre driver used means these oocytes were grown in the absence of appropriate zinc signaling. Thus, it would be difficult to assert that the lack of fertilization associated with zinc sparks is solely responsible for the failure of embryo development. Spindle morphology and other meiotic parameters do not necessarily report oocyte health, so normalcy of these features may not be a strong argument when it comes to metabolic issues.

      (2) While comparing ZIP6 and ZIP10 in the abstract provides context, focusing more on ZIP10 would improve reader comprehension, as ZIP10 is the primary focus of the study. Emphasizing the specific role of ZIP10 will help the reader grasp the core findings more clearly.

      (3) Zinc transporters ZIP6 and ZIP10 are expressed during follicular development, but the biological significance of the observation is not clearly addressed. The authors should investigate whether the ZIP6 and ZIP10 knockout affects follicular development and discuss the potential implications.

      (4) In Figure 3, the zinc fluorescence images are unclear, making it difficult for readers to interpret the data. Including snapshot images of calcium and zinc spikes as part of the main figure would improve clarity. Moreover, adding more comparative statements and a deeper explanation of why Zip10 KO mice exhibit normal calcium oscillations but lack zinc sparks would strengthen the manuscript.

      (5) While the study identifies the role of ZIP10 in zinc spark generation, it lacks a clear mechanistic insight. The topic itself is interesting, but without providing a more detailed explanation of the underlying mechanisms, the study leaves an important gap. Further discussion on the signaling pathways potentially involved in zinc spark regulation would add depth to the findings.

    3. Reviewer #2 (Public review):

      Summary:

      In this important study, the authors examine the role of two zinc uptake transporters, Zip6 and Zip10, which are important during the maturation of oocytes, and are critical for both successful fertilization and early embryogenesis.

      Strengths:

      The authors report that oocytes from Zip10 knockout mice exhibit lower labile zinc content during oocyte maturation, decreased amounts of zinc exocytosis during fertilization, and an altered rate of blastocyst generation in fertilized eggs relative to a control strain. They do not observe these changes in their Zip6 knockout animals. The authors present clear and well-documented results from a broad range of experimental modalities in support of their conclusions.

      Weaknesses:

      (1) The authors state that Zip10 is not expressed in the oocyte nuclei (line 252: "Furthermore, in that study, ZIP10 was detected in the nuclear/nucleolar positions of oocytes of all follicular stages (Chen et al., 2023), which we did not observe."). This is not supported by Figure 1, where some Zip10 signal is apparent in the primordial, primary, and secondary follicle oocytes. This statement should be corrected.

      (2) Based on the FluoZin-3AM data, there appears to be less labile zinc in the Zip10d/d oocyte, eggs, and embryos; however, FluoZin-3AM has a number of well-known artifacts and does not accurately capture the localization of labile zinc pools. The patterns do not correspond to the well-documented zinc-containing cortical vesicles. Another zinc probe, such as ZinPyr-4 or ZincBY-1 should be used to visualize the zinc vesicles and confirm that there is less labile zinc in these locations as well.

      (3) Line 268: "The results indicate that ZIP10 is mostly responsible for the uptake of zinc ions in mouse oocytes." The situation seems a bit more complicated given that the differences in labile zinc content between oocytes from the WT and Zip10d/d animals are small (only 20-30%) and that the zinc spark is diminished but still apparent at a low level in the Zip10d/d oocytes. Clearly, other factors are involved in zinc uptake at these stages. A variety of studies have suggested that Zip6 and Zip10 work together, perhaps even functioning as a heterodimer in some systems. The double KO would address this more clearly, but if it is not available, it might be more prudent to state that Zip10 plays some role in uptake of zinc in mouse oocytes while the role of Zip6 remains uncertain.

      (4) Zip6d/d oocytes did not have changes in labile zinc, nor did the lack of Zip6 have an impact on the zinc spark. However, Figure S1 does show a small amount of detectable Zip6 in the western blot. It is possible that this small amount could compensate for the complete lack of Zip6. Can ZIP6 be found in immunofluorescence of GV oocytes or MII eggs from the Zip6d/d animals? Additionally, it is possible that Zip6's role is only supplementary to that of Zip10. The authors should discuss this possibility. It would also be interesting to see if the Zip6/Zip10 double knockout displays greater defects compared to the Zip10 knockout when considering previous studies.

    1. eLife Assessment

      Inspired by bees' visual behavior, the goal of the manuscript is to develop a model of visual scanning, visual processing and learning to recognize visual patterns. In this model, pre-training with natural images leads to the formation of spatiotemporal receptive fields that can support associative learning. Due to an incomplete test of the necessity and sufficiency of the features included in the model, it cannot be concluded that the model is either the "minimal circuit" or the most biologically plausible circuit of this system. With a more in-depth analysis, the work has the potential of being important and very valuable to both experimental and computational neurobiologists.

    2. Reviewer #1 (Public Review):

      Insects, such as bees, are surprisingly good at recognizing visual patterns. How they achieve this challenging task with limited computational resources is not fully understood. Based on the actual bee's behaviour and visual circuit structure, MaBouDi et al. constructed a biologically plausible model where the circuit extracts essential visual features from scanned natural scenes. The model successfully discriminated a varied set of visual patterns as the actual bee does. By implementing a type of Hebb's rule for non-associative learning, an early layer of the model extracted orientational information from natural scenes essential to pattern recognition. Throughout the paper, the authors provided intuitive logic for how the relatively simple circuit could achieve pattern recognition. This work could draw broad attention not only in visual neuroscience but also in computer vision.

      However, there are a number of weaknesses in the manuscript. 1) The authors claim that the model is inspired by micromorphology, yet it does not rigorously follow the detailed anatomy of the insect brain revealed as of now. 2) Some claims sound a bit too strong compared to what the authors demonstrated with the model. For example, when the authors say the model is minimal, the authors simply investigated how many lobula neurons are required for pattern discrimination in the model. However, the manuscript appears to use this to claim that the presented model is the minimal one required for visual tasks. 3) It lacks explanations of what mechanisms in the model could discriminate some patterns but not others, making the descriptions very qualitative. 4) The authors did not provide compelling evidence that the algorithm is particularly tuned to natural scenes.

    3. Reviewer #2 (Public Review):

      This study is inspired by the scanning movements observed in bees when performing visual recognition tasks. It uses a multilayered network, representing stages of processing in the visual lobes (lamina, medulla, lobula), and uses the lobula output as input to a model of associative learning in the mushroom body (MB). The network is first trained with short "scanning" sequences of natural images, in a non-associative adaptation process, and then several experimental paradigms where images are rewarded or punished are simulated, with the output of the MB able to provide the appropriate discriminative decisions (in some but not all cases). The lobula receptive fields formed by the initial adaptation process show spatiotemporal tuning to edges moving at particular orientations and speeds that are comparable to recorded responses of such neurons in the insect brain.

      There are two main limitations to the study in my view. First, although described (caption fig 1) as a model "inspired by the micromorphology" of the insect brain, implying a significant degree of accuracy and detail, there are many arbitrary features (unsupported by current connectomics). For example, the strongly constrained delay line structure from medulla to lobula neurons, and the use of a single MBON that has input synapses that undergo facilitation and decay according to different neuromodulators. Second, while it is reasonable to explore some arbitrary architectural features, given that not everything is yet known about these pathways, the presented work does not sufficiently assess the necessity and sufficiency of the different components, given the repeated claims that this is the "minimal circuit" required for the visual tasks explored.

      Regarding the mushroom body (MB) learning model, it is strange that no reference is made to recent models closely tied to connectomic and other data in fruit flies, which suggest that separate MBONs encode positive vs. negative value; that learning is not dependent on MBON activity (so is not STDP); that feedback from MBONs to dopaminergic signalling plays an important role, etc. Possibly the MB of the bee operates in a completely different way to the fly, but the presented model relies on relatively old data about MB function, mostly from insects other than bees (e.g. locust) so its relationship to the increasingly comprehensive understanding emerging for the fly MB needs to be clarified. It is implied that the complex interaction of the differential effects of dopamine and octopamine, as modelled here, are required to learn the more complex visual paradigms, but it is not actually tested if simpler rules might suffice. Also, given previous work on models of view recognition in the MB, inspired by bees and ants, it seems plausible that simply using static 25×25 medulla activity as input to produce sparse activity in the KCs would be sufficient for MBON output to discriminate the patterns used in training, including the face stimulus. Thus it is not clear whether the spatiotemporal input and the lobula encoding are necessary to solve these tasks.

      It is also difficult to interpret the range of results in fig 3. The network sometimes learns well, sometimes just adequately (perhaps comparable to bees), and sometimes fails. The presentation of these results does not seem to identify any coherent pattern underlying success or failure, other than that the ability to generalise seems limited. That is, recognition (in most cases) requires the presentation of exactly the same stimulus in exactly the same way (same scanning pattern, distance and speed). In particular, it is hard to know what to conclude when the network appears able to learn some "complex patterns" (spirals, faces) but fails to learn the apparently simple plus vs. multiplication symbol discrimination if it is trained and tested with a scan passing across the whole pattern instead of just the lower half.

      In summary, although it is certainly interesting to explore how active vision (scanning a visual pattern) might affect the encoding of stimuli and the ability to learn to discriminate rewarding stimuli, some claims in the paper need to be tempered or better supported by the demonstration that alternative, equally plausible, models of the visual and mushroom body circuits are not sufficient to solve the given tasks.

    4. Reviewer #3 (Public Review):

      In this manuscript, the authors use the data collected and observations made on bees' scanning behaviour during visual learning to design a bio-inspired artificial neural network. The network follows the architecture of bees visual systems, where photoreceptors project into the lamina, then the medulla, medulla neurons connect to a set of spiking neurons in the lobula. Lobula neurons project to kenyon cells and then to MBON, which controls reward and punishment. The authors then test the performance of the network in comparison with real bee data, finding it to perform well in all tasks. The paper attempts to reproduce a living organism network with a practical application in mind, and it is quite impressive! I appreciate both the potential implications for the understanding of biological systems and the applications in the development of autonomous agents, making the paper absolutely worth reading.

      However, I believe that the current version somewhat lacks in clarity regarding the methodology and in some of the keywords used to describe the model.

      Definitions:

      Throughout the manuscript, the authors use some key terminology that I believe would benefit from some clarification.

      The generated model is described in the title and once in the introduction as "neuromorphic". The model is definitely bio-inspired, but at least in some layers of the neural network, the model is built very differently from actual brain connectivity. Generally, when we use the term neuromorphic we imply many advantages of neural tissue, like energy efficiency, that I am not sure the current model is achieving. I absolutely see how this work is going in that direction, and I also fundamentally agree with the choice of terminology, but this should be clearly explained to not risk over-implications

      The authors describe this as a model of "active vision". This is done in the title of the article, and in the many paragraph headings (methods, results). In the introduction, however, the term active vision is reserved to the description of bees' behavior. Indeed, the developed model is not a model of active vision, as this would require for the model to control the movement of the "camera". Here instead the stimuli display is given to the model in a fixed progression. What I suspect is that the authors' aim is to describe a model that supports the bees' active vision, not a model of active vision. I believe this should be very clear from the paper, and it may be appropriate to remove the term from the title.

      In the short title, it said that this network is minimal. This is then characterized in the introduction as the minimal network capable of enabling active vision in bees. The authors, however, in their experiment only vary the number of lobula neurons, without changing other parts of the architecture. Given this, we can only say that 16 lobula neurons is the minimal number required to solve the experimental task with the given model. I don't believe that this is generalizable to bees, nor that this network is minimal, as there may be different architectures (for the other layers especially) that require overall less neurons. Moreover, the tasks attempted in the minimal network experiment did not include any of the complex stimuli presented in figure 3, like faces. It may be that 16 lobula neurons are sufficient for the X vs + and clockwise vs counter-clockwise spirals, but we do not know if increasing stimuli complexity would result in a failure of the model with 16 neurons.

      Methodology:

      The current explanation of the model is currently a bit lacking in clarity and details. This risks impacting negatively on the relevance of the whole work which is interesting and worth reading! This issue affects also the interpretation of the results, as it is not clear to what extent each part of the network could affect the results shown. This is especially the case when the network under-performs with respect to the best performing scenario (e.g., when varying the speed and part of the pattern that is observed, such as in Fig 2C). Adding a detailed technical scheme/drawing specific to the network architecture could have been a way of significantly increasing the clarity of the Methods section and the interpretation of the results.

      On a similar note, the authors make some comparisons between the model and real bees. However, it remains unclear whether these similarities are actually indicative of an optimality in the bees visual scanning strategy, or just deriving from the authors design. This is for me particularly important in the experiments aimed at finding the best scanning procedure. If the initial model training is based on natural images it is performed by presenting left to right moving frames, the highest efficiency of lower-half scanning may be due to how the weights in the initial layers are structured and a low generalizability of the model, rather than to the strategy optimality

    5. Author response:

      Reviewer #1 (Public Review):

      Insects, such as bees, are surprisingly good at recognizing visual patterns. How they achieve this challenging task with limited computational resources is not fully understood. Based on the actual bee's behaviour and visual circuit structure, MaBouDi et al. constructed a biologically plausible model where the circuit extracts essential visual features from scanned natural scenes. The model successfully discriminated a variety set of visual patterns as the actual bee does. By implementing a type of Hebb's rule for non-associative learning, an early layer of the model extracted orientational information from natural scenes essential to pattern recognition. Throughout the paper, the authors provided intuitive logic for how the relatively simple circuit could achieve pattern recognition. This work could draw broad attention not only in visual neuroscience but also in computer vision.

      We appreciate your positive feedback.

      However, there are a number of weaknesses in the manuscript. 1) The authors claim that the model is inspired by micromorphology, yet it does not rigorously follow the detailed anatomy of the insect brain revealed as of now. 2) Some claims sound a bit too strong compared to what the authors demonstrated with the model. For example, when the authors say the model is minimal, the authors simply investigated how many lobula neurons are required for pattern discrimination in the model. However, the manuscript appears to use this to claim that the presented model is the minimal one required for visual tasks. 3) It lacks explanations of what mechanisms in the model could discriminate some patterns but not others, making the descriptions very qualitative. 4) The authors did not provide compelling evidence that the algorithm is particularly tuned to natural scenes.

      We appreciate the reviewer's constructive feedback and have revised the manuscript to clarify and strengthen our claims. Below, we address each of the concerns raised:

      (1) The model does not rigorously follow the detailed anatomy of the insect brain

      We acknowledge that our model is an abstraction rather than a direct reproduction of the full micromorphology of the insect brain. The goal of our study was not to replicate every anatomical feature but rather to extract the core computational principles underlying active vision, based on the functional activity of the insect brain. Although recent connectome studies provide detailed structural maps, they do not fully capture the functional dynamics of sensory processing and behavioural outcomes. Our model integrates key neurobiological insights, including the hierarchical structure of the optic lobes, lateral inhibition in the lobula, and non-associative learning mechanisms shaping spatiotemporal receptive fields.

      However, to address this concern, we have revised the introduction and discussion to explicitly acknowledge the model’s level of abstraction and its relationship to the known anatomy of the insect visual system. Furthermore, we highlight future directions in which connectomic data could refine our model.

      (2) Strength of claims regarding minimality of the model

      We appreciate the reviewer’s concern regarding the definition of a "minimal" model. Our intention was not to claim that this model represents the absolute minimal neural architecture for visual learning tasks but rather that it identifies a minimal set of necessary computational elements that enable pattern discrimination in insects. To clarify this, we have refined the text to ensure that our conclusions about minimality are explicitly tied to the specific constraints and assumptions of our model. For instance, in the revised manuscript, we emphasise that our findings demonstrate how the number of lobula neurons, the inhibitory lateral connections, and the non-associative learning model affect neural representation and discrimination performance, rather than establishing an absolute lower bound on the complexity required for visual processing in insects.

      (3) Mechanistic explanations for pattern discrimination

      Thank you for highlighting this point. We have conducted a more detailed analysis of the model’s response to different patterns and expanded our discussion of the underlying mechanisms. To address this, we have refined our explanation of how different scanning strategies and temporal integration mechanisms contribute to neural selectivity in the lobula and overall discrimination performance. Specifically:

      - Figure 3 illustrates how the model benefits from generating sparse coding in the visual network, leading to improved performance in pattern recognition tasks.

      - Figure 5 now includes a more detailed explanation of how different scanning strategies influence the selectivity and separability of lobula neuron responses. Additionally, we provide further analysis of why the model successfully discriminates certain patterns (e.g., simple oriented bars) but struggles with more complex spatially structured quadrant-based patterns.

      - We elaborate on how sequential sampling, temporal coding, and lateral inhibition collectively shape neural representations, enabling the model to distinguish between visual stimuli effectively.

      These refinements provide a clearer mechanistic explanation of the model’s strengths and limitations, ensuring a more comprehensive understanding of its function.

      (4) Evidence that the model is tuned to natural scenes

      We have revised the manuscript to provide stronger support for the claim that the model is particularly adapted to natural scenes. Specifically:

      - Figure 3 demonstrates that training on natural images leads to sparse, decorrelated responses in the lobula, a hallmark of efficient coding observed in biological systems.

      - Supplementary Figure 2-1B shows that training with shuffled images fails to produce structured receptive fields, reinforcing that the statistical structure of natural images is crucial for efficient learning.

      - We now explicitly discuss how the receptive fields emerging from non-associative learning align with known orientation-selective responses in insect visual neurons, supporting the idea that the model is optimised for processing natural visual inputs (Figures 3 and 6, and the Discussion section).
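
      The "sparse, decorrelated responses" referred to in the points above can be quantified in several ways. The following is a minimal sketch, assuming a hypothetical response matrix with one row per lobula neuron and one column per stimulus, and non-negative firing rates. The Treves-Rolls sparseness measure and the mean absolute pairwise correlation are illustrative choices here; the response does not state which measures the manuscript uses, and the function names are our own.

```python
import numpy as np

def population_sparseness(rates):
    """Treves-Rolls sparseness of one population response (non-negative rates).

    Returns 0 when all neurons respond equally and 1 when a single neuron
    carries the whole response.
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    mean_sq = np.mean(r ** 2)
    if n < 2 or mean_sq == 0.0:
        return 0.0  # sparseness is undefined here; report 0 by convention
    activity_ratio = np.mean(r) ** 2 / mean_sq
    return (1.0 - activity_ratio) / (1.0 - 1.0 / n)

def mean_abs_correlation(responses):
    """Mean |Pearson r| over all distinct neuron pairs.

    `responses` has one row per neuron and one column per stimulus.
    Values near 0 indicate decorrelated neurons.
    """
    corr = np.corrcoef(np.asarray(responses, dtype=float))
    upper = np.triu_indices_from(corr, k=1)  # each pair counted once
    return float(np.mean(np.abs(corr[upper])))
```

      Under these assumptions, applying `population_sparseness` to each stimulus column and `mean_abs_correlation` to the whole matrix, before and after training on natural versus shuffled images, would give a direct numerical counterpart to the comparison described above.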

      Taken together, these revisions clarify how the model captures fundamental principles of insect vision without making overly strong claims about biological fidelity. We thank the reviewer for these insightful comments; addressing them has significantly strengthened the clarity and depth of our manuscript.

      Reviewer #2 (Public Review):

      This study is inspired by the scanning movements observed in bees when performing visual recognition tasks. It uses a multilayered network, representing stages of processing in the visual lobes (lamina, medulla, lobula), and uses the lobula output as input to a model of associative learning in the mushroom body (MB). The network is first trained with short "scanning" sequences of natural images, in a non-associative adaptation process, and then several experimental paradigms where images are rewarded or punished are simulated, with the output of the MB able to provide the appropriate discriminative decisions (in some but not all cases). The lobula receptive fields formed by the initial adaptation process show spatiotemporal tuning to edges moving at particular orientations and speeds that are comparable to recorded responses of such neurons in the insect brain.

      There are two main limitations to the study in my view. First, although described (caption fig 1) as a model "inspired by the micromorphology" of the insect brain, implying a significant degree of accuracy and detail, there are many arbitrary features (unsupported by current connectomics). For example, the strongly constrained delay line structure from medulla to lobula neurons, and the use of a single MBON that has input synapses that undergo facilitation and decay according to different neuromodulators. Second, while it is reasonable to explore some arbitrary architectural features, given that not everything is yet known about these pathways, the presented work does not sufficiently assess the necessity and sufficiency of the different components, given the repeated claims that this is the "minimal circuit" required for the visual tasks explored.

      We appreciate your feedback and have refined the manuscript to clarify model design choices and address concerns regarding minimality.

      (1) Model Architecture and Functional Simplifications

      While our model is inspired by the insect visual system, it is not intended as an exact anatomical reconstruction but rather a functional abstraction to uncover key computational principles of active vision and visual learning. The delay-line structure and simplified MBON implementation were deliberate choices to enable spatiotemporal encoding and associative learning without overcomplicating the model. As connectome data alone do not fully reveal functional relationships, our approach serves as a hypothesis-generating tool for future neurobiological studies.

      (2) Necessity and Sufficiency of Model Components

      We have removed overstatements about minimality and now clarify that our model represents a functional circuit rather than the absolute minimal configuration. Additionally, we conducted new control experiments assessing the influence of different model components, further justifying key mechanisms such as spatiotemporal encoding and lateral inhibition.

      For a more detailed discussion of these revisions and improvements, please refer to our response to the Journal, above.

      Regarding the mushroom body (MB) learning model, it is strange that no reference is made to recent models closely tied to connectomic and other data in fruit flies, which suggest that separate MBONs encode positive vs. negative value; that learning is not dependent on MBON activity (so is not STDP); that feedback from MBONs to dopaminergic signalling plays an important role, etc. Possibly the MB of the bee operates in a completely different way to the fly, but the presented model relies on relatively old data about MB function, mostly from insects other than bees (e.g. locust) so its relationship to the increasingly comprehensive understanding emerging for the fly MB needs to be clarified. It is implied that the complex interaction of the differential effects of dopamine and octopamine, as modelled here, is required to learn the more complex visual paradigms, but it is not actually tested if simpler rules might suffice. Also, given previous work on models of view recognition in the MB, inspired by bees and ants, it seems plausible that simply using static 25×25 medulla activity as input to produce sparse activity in the KCs would be sufficient for MBON output to discriminate the patterns used in training, including the face stimulus. Thus it is not clear whether the spatiotemporal input and the lobula encoding are necessary to solve these tasks.

      Thank you for your suggestion. The primary focus of this study was not to uncover the exact mechanisms of associative learning in the mushroom body (MB) but rather to evaluate the role of lobula output activity in active vision. The associative learning component was included as a simplified mechanism to assess how the spatiotemporal encoding in the lobula contributes to visual pattern learning.

      We conducted a detailed analysis of lobula neuron activity, focusing on sparsity, decorrelation, and selectivity to demonstrate how the visual system extracts compact yet relevant signals before reaching the learning centre (see Figure 5). Theoretical predictions based on these findings suggest that such encoding enhances pattern recognition performance. Selecting this associative learning mechanism allowed us to evaluate this capability explicitly; it also facilitated comparison with previous active vision experiments and allowed us to assess the influence of different components on bee behaviour.

      We acknowledge that recent Drosophila connectomics studies suggest alternative MB architectures, including separate MBONs encoding positive vs. negative values, learning mechanisms independent of MBON activity, and feedback from MBONs to dopaminergic pathways. However, visual learning mechanisms in the MB remain poorly characterised, especially in bees, where the functional relevance of different MBON configurations is still unclear. The decision to simplify the MB learning process was intentional, allowing us to prioritise model interpretability over anatomical replication.

      These simplifications have been explicitly discussed in the revised manuscript, where we suggest future directions for integrating more biologically detailed MB models to enhance our understanding of active visual learning in insects. For a broader discussion of our rationale for prioritising computational simplifications over direct neurobiological replication, please refer to our response to the Journal, above.
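
      The static-input baseline raised in the reviewer's comment above can be made concrete with a toy sketch: a static 25×25 input is projected through fixed random connections onto Kenyon cells (KCs), only the most strongly driven KCs fire, and a single MBON sums the weights of the active KCs. This is not the model in the manuscript. The population size, connection probability, number of active KCs, the all-or-none depression rule, and the two hand-drawn stimuli are all hypothetical choices made purely for illustration.

```python
import numpy as np

# Hypothetical sizes: 25x25 input, 2000 KCs, 100 active KCs per stimulus.
SIDE, N_KC, K_ACTIVE, P_CONNECT = 25, 2000, 100, 0.05

rng = np.random.default_rng(0)
# Fixed random wiring: each KC samples each input unit with probability P_CONNECT.
projection = (rng.random((N_KC, SIDE * SIDE)) < P_CONNECT).astype(float)

def kc_code(image):
    """Sparse binary KC code: only the K_ACTIVE most strongly driven KCs fire."""
    drive = projection @ np.asarray(image, dtype=float).ravel()
    winners = np.argsort(drive, kind="stable")[-K_ACTIVE:]
    active = np.zeros(N_KC, dtype=bool)
    active[winners] = True
    return active

def mbon_output(image, weights):
    """MBON response: summed KC->MBON weights over the active KCs."""
    return float(weights[kc_code(image)].sum())

def punish(image, weights):
    """Toy learning rule: fully depress the synapses of KCs active for this image."""
    updated = weights.copy()
    updated[kc_code(image)] = 0.0
    return updated

# Two illustrative static stimuli: a plus sign and a multiplication sign.
plus = np.zeros((SIDE, SIDE))
plus[SIDE // 2, :] = 1.0
plus[:, SIDE // 2] = 1.0
cross = np.clip(np.eye(SIDE) + np.fliplr(np.eye(SIDE)), 0.0, 1.0)

# Train by punishing the cross, starting from uniform KC->MBON weights.
weights = punish(cross, np.ones(N_KC))
```

      In this sketch a lower MBON output marks the punished stimulus, so comparing `mbon_output(plus, weights)` with `mbon_output(cross, weights)` gives the discrimination signal. Running the same readout on lobula-encoded input instead of the static image would be one way to test whether the spatiotemporal encoding adds anything beyond this baseline.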

      It is also difficult to interpret the range of results in fig 3. The network sometimes learns well, sometimes just adequately (perhaps comparable to bees), and sometimes fails. The presentation of these results does not seem to identify any coherent pattern underlying success or failure, other than that the ability to generalise seems limited. That is, recognition (in most cases) requires the presentation of exactly the same stimulus in exactly the same way (same scanning pattern, distance and speed). In particular, it is hard to know what to conclude when the network appears able to learn some "complex patterns" (spirals, faces) but fails to learn the apparently simple plus vs. multiplication symbol discrimination if it is trained and tested with a scan passing across the whole pattern instead of just the lower half.

      We acknowledge that the variability in the model’s performance across different tasks and conditions required a clearer explanation. In the revised manuscript, we have analysed the underlying factors influencing success and failure in greater detail and have expanded the discussion on the model’s generalisation limitations.

      To address this, we have conducted new control experiments and deeper analyses, now presented in Figures 5 and 6F, which illustrate how scanning conditions impact recognition performance. Specifically, we examine why the model can successfully learn complex patterns (e.g., spirals, faces) but struggles with apparently simpler tasks, such as distinguishing between a plus and multiplication symbol when scanning the entire pattern rather than just the lower half. Our results suggest that spatially constrained scanning enhances discriminability, while whole-pattern scanning reduces selectivity due to weaker and less sparse feature encoding in lobula neurons.

      We have also clarified in the Discussion section that while the model demonstrates robust pattern learning under specific conditions, its ability to generalise remains limited when tested with complex patterns (Figure 6F). Further investigation is needed to explore how adaptive scanning strategies or hierarchical processing might improve generalisation.

      In summary, although it is certainly interesting to explore how active vision (scanning a visual pattern) might affect the encoding of stimuli and the ability to learn to discriminate rewarding stimuli, some claims in the paper need to be tempered or better supported by the demonstration that alternative, equally plausible, models of the visual and mushroom body circuits are not sufficient to solve the given tasks.

      There is limited knowledge in the literature regarding the neural correlates of visual-related plasticity in the mushroom body (MB). The majority of our current understanding of the MB is derived from studies on olfactory learning, particularly in Drosophila, which does not provide sufficient data to directly implement or comprehensively compare alternative models for visual learning.

      However, the primary focus of our study is on active vision and how spatiotemporal signals are encoded in the insect visual system. Rather than aiming to replicate a detailed biological model of MB function, we intentionally employed a simplified associative learning network to investigate how neural activity emerging from our visual processing model can support pattern recognition. This approach also allows us to compare model performance with bee behaviour, drawing on insights from previous experimental work on active vision in bees.

      We now discuss the limitations of our approach and the rationale for selectively incorporating specific neural network components in lines 652-677. Additionally, we have provided further justification (see responses above) for prioritising a simplified model, rather than attempting to mimic a highly detailed, yet currently unverified, alternative learning circuit. These clarifications help ensure that our claims are appropriately tempered while still demonstrating the functional relevance of our model.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the data collected and observations made on bees' scanning behaviour during visual learning to design a bio-inspired artificial neural network. The network follows the architecture of bees visual systems, where photoreceptors project into the lamina, then the medulla, medulla neurons connect to a set of spiking neurons in the lobula. Lobula neurons project to kenyon cells and then to MBON, which controls reward and punishment. The authors then test the performance of the network in comparison with real bee data, finding it to perform well in all tasks. The paper attempts to reproduce a living organism network with a practical application in mind, and it is quite impressive! I appreciate both the potential implications for the understanding of biological systems and the applications in the development of autonomous agents, making the paper absolutely worth reading.

      Thank you for your positive feedback and appreciation of our work.

      However, I believe that the current version somewhat lacks in clarity regarding the methodology and in some of the keywords used to describe the model.

      Definitions:

      Throughout the manuscript, the authors use some key terminology that I believe would benefit from some clarification.

      The generated model is described in the title and once in the introduction as "neuromorphic". The model is definitely bio-inspired, but at least in some layers of the neural network, the model is built very differently from actual brain connectivity. Generally, when we use the term neuromorphic we imply many advantages of neural tissue, like energy efficiency, that I am not sure the current model is achieving. I absolutely see how this work is going in that direction, and I also fundamentally agree with the choice of terminology, but this should be clearly explained to not risk over-implications

      We appreciate the reviewer’s feedback and acknowledge the importance of clarifying key terminology in our manuscript. As outlined in our response to the Journal, we intentionally simplified the model to focus on understanding the core computational processes involved in active vision rather than precisely replicating the full complexity of insect neural circuits (see other reasons for simplification in the manuscript). This simplification allows us to systematically analyse the influence of specific components underlying active vision mechanisms.

      Despite these simplifications, our model incorporates key neuromorphic principles, including the use of a recurrent neural network architecture and a spiking neuron model at multiple processing levels. These elements enable biologically inspired information processing, aligning with the fundamental characteristics of neuromorphic computing, even if the model does not explicitly focus on hardware efficiency or energy constraints.

      The authors describe this as a model of "active vision". This is done in the title of the article, and in the many paragraph headings (methods, results). In the introduction, however, the term active vision is reserved to the description of bees' behavior. Indeed, the developed model is not a model of active vision, as this would require for the model to control the movement of the "camera". Here instead the stimuli display is given to the model in a fixed progression. What I suspect is that the authors' aim is to describe a model that supports the bees' active vision, not a model of active vision. I believe this should be very clear from the paper, and it may be appropriate to remove the term from the title.

      While our model does not actively control camera movement in the environment, it does simulate the effects of active vision by incorporating scanning dynamics. Our results demonstrate that model responses change significantly with variations in scanning speed and restricted scanning areas, highlighting the importance of movement in shaping visual encoding. However, we acknowledge that true active vision would involve adaptive, real-time control of gaze or trajectory, which is the next step beyond the current implementation towards a more realistic model of active vision. To address your concern, we have discussed the potential for incorporating dynamic flight behaviours in future studies, allowing the model to actively adjust its scanning strategy based on learned visual cues.

      In the short title, it said that this network is minimal. This is then characterized in the introduction as the minimal network capable of enabling active vision in bees. The authors, however, in their experiment only vary the number of lobula neurons, without changing other parts of the architecture. Given this, we can only say that 16 lobula neurons is the minimal number required to solve the experimental task with the given model. I don't believe that this is generalizable to bees, nor that this network is minimal, as there may be different architectures (for the other layers especially) that require overall less neurons. Moreover, the tasks attempted in the minimal network experiment did not include any of the complex stimuli presented in figure 3, like faces. It may be that 16 lobula neurons are sufficient for the X vs + and clockwise vs counter-clockwise spirals, but we do not know if increasing stimuli complexity would result in a failure of the model with 16 neurons.

      We agree that analysing only the number of lobula neurons is not sufficient to establish a truly minimal model for active vision. To address this, we conducted further control experiments to evaluate the influence of other key components, including non-associative learning, scanning behaviour, and lateral connectivity, on model performance. Our results suggest that the proposed model represents a computationally minimal network capable of implementing a basic active vision process, but a more complex model would be required for higher-order visual tasks.

      However, to avoid potential misinterpretation, we have revised the short title and updated the manuscript to clarify that our model identifies a possible minimal functional circuit rather than the absolute minimal network for active vision. Additionally, we have added further discussion on the simplifications made in the model and emphasised the need for future studies to explore alternative architectures and assess their relevance for understanding active vision in insects.

      Methodology:

      The current explanation of the model is currently a bit lacking in clarity and details. This risks impacting negatively on the relevance of the whole work which is interesting and worth reading! This issue affects also the interpretation of the results, as it is not clear to what extent each part of the network could affect the results shown. This is especially the case when the network under-performs with respect to the best performing scenario (e.g., when varying the speed and part of the pattern that is observed, such as in Fig 2C). Adding a detailed technical scheme/drawing specific to the network architecture could have been a way of significantly increasing the clarity of the Methods section and the interpretation of the results.

      On a similar note, the authors make some comparisons between the model and real bees. However, it remains unclear whether these similarities are actually indicative of an optimality in the bees visual scanning strategy, or just deriving from the authors design. This is for me particularly important in the experiments aimed at finding the best scanning procedure. If the initial model training is based on natural images it is performed by presenting left to right moving frames, the highest efficiency of lower-half scanning may be due to how the weights in the initial layers are structured and a low generalizability of the model, rather than to the strategy optimality

      We appreciate the reviewer’s constructive feedback and have taken steps to enhance the clarity, interpretability, and transparency of our model description and results. Below, we address the concerns regarding model explanation, performance interpretation, and the comparison with real bee behaviour.

      (1) Improved Model Explanation and Network Clarity: We apologise that the previous version of the manuscript did not fully detail the architecture and functioning of the model. To address this, we have expanded the Methods section with a more detailed breakdown of the network components, their roles, and their contribution to active vision processing. Additionally, we have summarised the network architecture and its implementation for visual learning tasks at the beginning of the Results section, providing a clearer overview of the information flow from visual input to associative learning. Furthermore, we have explicitly analysed and discussed the role of key model components, including scanning strategies, lateral connectivity, and non-associative learning mechanisms, clarifying how each contributes to the observed results.

      (2) Interpretation of Model Performance Variability: Understanding the factors influencing performance variability is crucial, and to improve clarity, we have conducted further analysis of model performance across different conditions, particularly examining the effects of scanning speed, spatial constraints, and feature encoding (see Figure 2C). Additionally, we have expanded the discussion on how scanning conditions impact performance, providing explanations for why some conditions lead to higher or lower discrimination success. Furthermore, we have clarified why certain stimuli present greater challenges for the model, linking these difficulties to receptive field properties and scanning dynamics.

      (3) Comparison Between Model Behaviour and Real Bees: To address your concern regarding the link between scanning preferences and true biological optimality, we have included further analysis discussing the influence of training conditions on the model’s learned behaviours. Additionally, we propose future experiments to test alternative scanning strategies, including adaptive scanning mechanisms that adjust based on visual task demands. Furthermore, we have expanded the discussion on the simplifications made in this study, explicitly stating the limitations of the model and emphasising the need for future research to explore more flexible and biologically plausible scanning mechanisms.

      We believe these revisions significantly enhance the clarity and interpretability of the study, ensuring that the model’s findings are well contextualised within both computational and biological frameworks.

    1. eLife Assessment

      In their valuable study, Bracey et al. investigate how microtubule organization within pancreatic islet beta cells supports optimal insulin secretion. Using a combination of live imaging and photo-kinetic assays in an in vitro culture system, they provide compelling evidence that kinesin-1-mediated microtubule sliding, which plays critical roles in neurons and embryos, also plays a critical role in forming the sub-membranous microtubule band in response to glucose in beta cells. This work will be of interest to cell biologists studying cytoskeletal dynamics and organelle trafficking, as well as to translational biologists focused on diabetes.

    2. Joint Public Review:

      This elegant study provides important insights into the organization of sub-membrane microtubules in pancreatic β-cells, highlighting a key role for the motor protein KIF5B. The authors propose that KIF5B drives microtubule sliding and alignment along the plasma membrane, a process enhanced by high glucose levels. This precise microtubule arrangement is essential for regulated secretion in β-cells. Supporting this model, the authors show that KIF5B is more highly expressed than other kinesins in MIN6 cells, and its depletion via shRNA disrupts sub-membrane microtubule density and organization. In contrast, KIF5A knockdown alters overall microtubule architecture. Using a dominant-negative approach, they further demonstrate that KIF5B-mediated microtubule sliding relies on its tail domain and is stimulated by glucose, paralleling known glucose-dependent increases in kinesin-1 activity.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Specific comments:

      (1) It is difficult to appreciate that there is a "peripheral sub-membrane microtubule array" as it is not well defined in the manuscript. This reviewer assumes that this is in the respective field clear. Yet, while it is appreciated that there is an increased amount of MTs close to the cytoplasmic membrane, the densities appear very variable along the membrane. Please provide a clear description in the Introduction what is meant with "peripheral sub-membrane microtubule array".

      A definition has been added to the Introduction.

      (2) The authors described a "consistent presence of a significant peripheral array in the C57BL/ 6J control mice, while the KO counterparts exhibited a partial loss of this peripheral bundle.

      Specifically, the measured tubulin intensity at the cell periphery was significantly reduced in the KO mice compared to their wild-type counterparts". In vitro "control cells had convoluted nonradial MTs with a prominent sub-membrane array, typical for β cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse receding MTs at the periphery (Fig. 2B,C)". Please comment/discuss why in vivo there are no "extra-dense MTs in the cell center".

      We now add a discussion of this point, which we believe could be a manifestation of 3D shape of a beta cell in tissue and/or compensatory mechanisms in organisms.

      (3) Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      A paragraph added to the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: Even though the reviewer appreciates that minor changes of MT configuration have severe effects, still the overall effects appear minor (40 vs. <50% or 35% vs. around 28%). Notably, there are no statistically significant differences in the different groups in Fig. 1Suppl-Fig.1 D. This reviewer is not sure if the combination of many not significantly different data points can result in significant changes and this should be checked by a statistician. Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      We have now added the requested paragraph to the discussion. Indeed, the differences are small, and the significance is only detected in a data set with a large sample size in Fig. 1J,K (combined data sets with smaller sizes from Fig. 1-Suppl-Fig.1 D), consistent with the fact that a larger sample size generally provides more power to detect an effect.
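
      The relation between sample size and power invoked above can be illustrated with a minimal sketch. It assumes two groups of equal size with equal standard deviation and uses a normal approximation; the function name and all numbers are hypothetical and are not taken from the manuscript's data.

```python
import math

def z_for_difference(mean_diff, sd, n_per_group):
    """Approximate z statistic for a difference between two group means.

    Assumes equal group sizes and a common standard deviation, so the
    standard error of the difference is sd * sqrt(2 / n).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error

# Hypothetical values: the same small effect, measured in a small data set
# and in a pooled data set that is ten times larger.
z_small = z_for_difference(mean_diff=5.0, sd=15.0, n_per_group=10)
z_pooled = z_for_difference(mean_diff=5.0, sd=15.0, n_per_group=100)
```

      With these illustrative numbers the same effect falls below the conventional two-sided 5 % threshold (|z| > 1.96) in the small data set and above it in the pooled one, because the test statistic grows with the square root of the sample size. Whether pooling is appropriate for a given data set remains a separate statistical question.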

      (2) Unfortunately, the authors cannot block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block (best inhibiting microtubule formation), to show that the MTs accumulated in the cell center will be transported to the periphery.

      This is indeed the case at the moment, yes.

      Minor comments:

      - Abstract: β-cells vs. β cells (and throughout the manuscript)

      - Page 4: "MTOC, the Golgi, (Trogden et al. 2019), and"

      - Page 5: "β-cell specific"

      - MT-sliding vs. MT sliding

      - Kinesin 1 vs. kinesin-1

      - Page 6, line 1: "β cells. actively"

      - Page 7: "a microtubule probe", should be "MT"

      - Page 9: "1μm" vs. "1 μm"

      - Page 10: "demonstrate a dramatic effect" recommended is: "demonstrate a marked effect"

      - Page 13, line 1: dramatically vs. markedly

      - Page 13, line 5: "50μm" vs. "50 μm" (in general, there should be a space between number and unit?)

      - "37 degrees C" vs. "37{degree sign}C"

      - Animal protocol number?

      - "Mice were euthanized by isoflurane inhalation"? What concentration? How long? More details are needed (no cervical dislocation?).

      - Antibodies: more identifiers are needed.

      - Antibody information in Key reagents and under 5. Reagents and antibodies do not fit (1:500 and 1:1000).

      Thank you, we corrected all relevant information now.

    1. eLife Assessment

      This fundamental study provides new insights into the maturation of ribbon synapses in zebrafish neuromast hair cells. Live-cell imaging and pharmacological and genetic manipulations together provide compelling evidence that the formation of this synaptic organelle is a dynamic process involving the fusion of presynaptic elements and microtubule transport, though the evidence that ribbon precursors move in a directed motion toward the active zone is less persuasive. These findings will be of interest to neuroscientists studying synapse formation and function and should inspire further research into the molecular basis for synaptic ribbon maturation.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons, and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With the rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meets this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example to motors), which could influence mobility. In the revised manuscript, the authors provide evidence to suggest that overexpression is not at unreasonably high levels, which is reasonable. However, I think it remains important to think of these caveats while reading the paper, especially keeping in mind that expression timing is undoubtedly influenced by the transcriptional control of the exogenous promoter.

      (2) The examples of punctae colocalizing with microtubules look clear (fig 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread in movement driven by Brownian motion. Many of the tracks in figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor driven transport observed in other systems (axonal transport, for example) even in the subset that have been picked out as examples here. While the role for microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting). In the revision, the authors do an excellent job of considering the issues brought up in point 3 and 4. While perhaps no longer a weakness, I am leaving the critiques here for context for the readers to consider. The added taxol results may not completely settle the issue, but are interesting and provide important information.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      We have examined functional deficits in kif1aa mutants in another paper that was recently accepted: David et al. 2024. https://pubmed.ncbi.nlm.nih.gov/39373584/

      In David et al., we found that in addition to a subtle role in ribbon fusion during development, Kif1aa plays a major role in enriching glutamate-filled synaptic vesicles at the presynaptic active zone of mature hair cells. In kif1aa mutants, synaptic vesicles are no longer enriched at the hair cell base, and there is a reduction in the number of synaptic vesicles associated with presynaptic ribbons. Further, we demonstrated that kif1aa mutants also have functional defects including reductions in spontaneous vesicle release (from hair cells) and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Because our current paper focuses on microtubule-associated ribbon movement and dynamics early in hair-cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window. In our revision, we have referenced this recent work. Currently it is challenging to disentangle how the subtle defects in ribbon formation in kif1aa mutants contribute to the defects we observe in ribbon-synapse function.

      Added to results:

      “Recent work in our lab using this mutant has shown that Kif1aa is responsible for enriching glutamate-filled vesicles at the base of hair cells. In addition this work demonstrated that loss of Kif1aa results in functional defects in mature hair cells including a reduction in evoked post-synaptic calcium responses (David et al., 2024). We hypothesized that Kif1aa may also be playing an earlier role in ribbon formation.”

      Impact:

      Synaptogenesis in the auditory sensory cell still remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      These are important strengths and, as stated, we are currently investigating which other kinesins and adaptors transport ribbons. We also have ongoing work examining how hair-cell activity impacts ribbon fusion and transport!

      Weaknesses:

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point. Previous immunohistochemistry work in mice demonstrated that ribbons and Kif1a colocalize in mouse hair cells (Michanski et al, 2019). Unfortunately, the antibody used in that study did not work in zebrafish. To further investigate this interaction, we also attempted to create a transgenic line expressing a fluorescently tagged Kif1aa to directly visualize its association with ribbons in vivo. To date, we have been unable to detect transient expression of Kif1aa-GFP or establish a transgenic line using this approach. While we will continue to work towards understanding whether Kif1aa and ribbons colocalize in live hair cells, currently this goal is beyond the scope of this paper. In our revision we discuss this caveat.

      Added to discussion:

      “In addition, it will be useful to visualize these kinesins by fluorescently tagging them in live hair cells to observe whether they associate with ribbons.”

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 public response weaknesses.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

      This is correct and a caveat of our Kif1aa and drug experiments. In our recently published work, we confirmed that Kif1aa is expressed in hair cells and neurons, while kif1ab is present just in neurons. Therefore, it is likely that the ribbon formation defects in kif1aa mutants are restricted to hair cells. We added this expression information to our results:

      “ScRNA-seq in zebrafish has demonstrated widespread co-expression of kif1ab and kif1aa mRNA in the nervous system. Additionally, both scRNA-seq and fluorescent in situ hybridization have revealed that pLL hair cells exclusively express kif1aa mRNA (David et al., 2024; Lush et al., 2019; Sur et al., 2023).”

      Non-hair cell effects are a real concern in our pharmacology experiments. To mitigate this in our pharmacological experiments, we have performed drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The fast experiments were done after 30 min nocodazole drug treatment, and after this treatment we observed reduced directional motion and fusions. This fast drug treatment should not incur any long-term changes or developmental defects as hair-cell development occurs over 12-16 hrs. However, we acknowledge that drug treatments could have secondary phenotypic effects or effects that are not hair-cell specific. In our revision, we discuss these issues.

      Added to discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30-70 min and 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meets this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      We agree that overexpression of transgenes using a non-endogenous promoter in transgenic lines is an important consideration. Ideally, we would do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. The decrease in precursors is likely not due to regulation by the myo6a promoter. Although the myo6a promoter comes on early in hair cell development, the promoter only gets stronger as the hair cells mature. This would lead to a continued increase rather than a decrease in puncta numbers with development.

      Protein tags such as tagRFP always have the caveat of impacting protein function. This is partly why we complemented our live imaging with analyses in fixed tissue without transgenes (kif1aa mutants and nocodazole/taxol treatments).

      In our revision, we did perform an immunolabel on myo6b:riba-tagRFP transgenic fish and found that Riba-tagRFP expression did not impact ribbon synapse numbers or ribbon size. This analysis argues that the transgene is expressed at a level that does not impact ribbon synapses. This data is summarized in Figure 1-S1.

      Added to the results:

      “Although this latter transgene expresses Riba-TagRFP under a non-endogenous promoter, neither the tag nor the promoter ultimately impacts cell numbers, synapse counts, or ribbon size (Figure 1-S1A-E).”

      Added to methods:

      “Tg(myo6b:ctbp2a-TagRFP)<sup>idc11Tg</sup> reliably labels mature ribbons, similar to a pan-CTBP immunolabel at 5 dpf (Figure 1-S1B). This transgenic line does not alter the number of hair cells or complete synapses per hair cell (Figure 1-S1A-D). In addition, myo6b:ctbp2a-TagRFP does not alter the size of ribbons (Figure 1-S1E).”

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      We did attempt a co-localization analysis between microtubules and ribbons but did not move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and ultimately, we found that co-localization analyses were not meaningful because the distances were too small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in hair cells due to highly varying filament intensities, filament dynamics and the presence of diffuse cytoplasmic tubulin signal.

      Because of these challenges we concluded the best evidence of ribbon-microtubule association is through visualization of ribbons and their association with microtubules over time (in our timelapses). We see that ribbons localize to microtubules in all our timelapses, including the examples shown (Movies S2-S10). The only instance of ribbon dissociation is when ribbons switch from one filament to another. We did not observe free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread in movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      Yes, it is true that directed transport of ribbon precursors is relatively rare. Only a small subset of the ribbon precursors moves directionally (α > 1, 20 %) or has a displacement distance > 1 µm (36 %) during the time windows we are imaging. The majority of the ribbons are stationary. To emphasize this result, we have added bar graphs to Figure 3I,K and state the numbers behind it more clearly.

      “Upon quantification, 20.2 % of ribbon tracks show α > 1, indicative of directional motion, but the majority of ribbon tracks (79.8 %) show α < 1, indicating confinement on microtubules (Figure 3I, n = 10 neuromasts, 40 hair cells, and 203 tracks).

      To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells, and 203 tracks).”
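
      As a rough illustration of the two measures quoted above, the sketch below estimates α as the slope of log(MSD) against log(lag time) and reports the fraction of values above a threshold. It is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, the MSD is time-averaged over a single track, and "displacement" is taken here as net start-to-end distance, which may differ from the cumulative displacement used in the manuscript.

```python
import math

def time_averaged_msd(track, max_lag):
    """Mean squared displacement for lags 1..max_lag.

    track is a list of (x, y) positions sampled at equal time intervals.
    """
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd

def fit_alpha(msd, dt=1.0):
    """Least-squares slope of log(MSD) versus log(lag time).

    Requires every MSD value to be positive (a perfectly still track
    would give log(0) and cannot be fitted this way).
    """
    xs = [math.log((k + 1) * dt) for k in range(len(msd))]
    ys = [math.log(m) for m in msd]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def net_displacement(track):
    """Straight-line distance between the first and last position."""
    return math.hypot(track[-1][0] - track[0][0], track[-1][1] - track[0][1])

def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)

# Idealised check: a track moving in a straight line at constant speed
# has MSD proportional to lag squared, so the fitted alpha is 2.
ballistic = [(0.1 * i, 0.0) for i in range(20)]
alpha_ballistic = fit_alpha(time_averaged_msd(ballistic, max_lag=5))
```

      Applied to a list of per-track α values or displacements, `fraction_above(values, 1.0)` gives the kind of percentage reported in the quoted text; the choice of maximum lag and the handling of short tracks are left open here.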

      We cannot say for certain what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion sufficient to reach the active zone. This idea is supported by the fact that we see ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses (Movies S4 and S5). It is possible that ribbons that are stationary may not have enough motors attached, or there may be a ‘seeding’ phase where Ribeye aggregates are condensing on the ribbon.

      We also reexamined our MSD α values, as the α values we observed in hair cells were lower than those seen in canonical motor-driven transport (where α approaches 2). One reason for this difference may be the dynamic microtubule network in developing hair cells, which could affect directional ribbon movement. In our revision we plotted the distribution of α values, which confirmed that in control hair cells the majority of the α values we see are less than 2 (Figure 7-S1A). Interestingly, we also compared the distribution of α values between control and taxol-treated hair cells, where the microtubule network is more stable, and found that the distribution shifted towards higher α values (Figure 7-S1A). We also plotted only ‘directional’ tracks (with α > 1) and observed significantly higher α values in taxol-treated hair cells (Figure 7-S1B). This is an interesting result which indicates that although the proportion of directional tracks (with α > 1) is not significantly different between control and taxol-treated hair cells (which could be limited by the number of motor/adapter proteins), the ribbons that move directionally do so with greater velocities when the microtubules are more stable. This supports our idea that the stability of the microtubule network could be why ribbon movement does not resemble canonical motor transport. This analysis is presented as a new figure (Figure 7-S1A-B) and is referred to in the text in the results and the discussion.

      Results:

      “Interestingly, when we examined the distribution of α values, we observed that taxol treatment shifted the overall distribution towards higher α values (Figure 7-S1A). In addition, when we plotted only tracks with directional motion (α > 1), we found significantly higher α values in hair cells treated with taxol compared to controls (Figure 7-S1B). This indicates that in taxol-treated hair cells, where the microtubule network is stabilized, ribbons with directional motion have higher velocities.”

      Discussion:

      “Our findings indicate that ribbons and precursors show directed motion indicative of motor-mediated transport (Figure 3 and 7). While a subset of ribbons moves directionally with α values > 1, canonical motor-driven transport in other systems, such as axonal transport, can achieve even higher α values approaching 2 (Bellotti et al., 2021; Corradi et al., 2020). We suggest that relatively lower α values arise from the highly dynamic nature of microtubules in hair cells. In axons, microtubules form stable, linear tracks that allow kinesins to transport cargo with high velocity. In contrast, the microtubule network in hair cells is highly dynamic, particularly near the cell base. Within a single time frame (50-100 s), we observe continuous movement and branching of these networks. This dynamic behavior adds complexity to ribbon motion, leading to frequent stalling, filament switching, and reversals in direction. As a result, ribbon transport appears less directional than the movement of traditional motor cargoes along stable axonal filaments, resulting in lower α values compared to canonical motor-mediated transport. Notably, treatment with taxol, which stabilizes microtubules, increased α values to levels closer to those observed in canonical motor-driven transport (Figure 7-S1). This finding supports the idea that the relatively lower α values in hair cells are a consequence of a more dynamic microtubule network. Overall, this dynamic network gives rise to a slower, non-canonical mode of transport.”
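
      The α values discussed in this exchange are exponents of the power law MSD(τ) ∝ τ^α. A minimal sketch of how such an exponent can be estimated for a single track is given here. It is not the authors' analysis pipeline: the function names, the choice to fit only the first quarter of the lags, and the example track are all assumptions made for illustration.

```python
import math

def msd(track, lag):
    """Time-averaged mean squared displacement of one track at a lag (in frames)."""
    sq = [
        sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
        for i in range(len(track) - lag)
    ]
    return sum(sq) / len(sq)

def fit_alpha(track, max_lag=None):
    """Least-squares slope of log(MSD) against log(lag), i.e. alpha in MSD ~ lag**alpha."""
    if max_lag is None:
        # Assumption: long lags are poorly sampled, so fit only the first quarter.
        max_lag = max(2, len(track) // 4)
    max_lag = min(max_lag, len(track) - 1)
    xs, ys = [], []
    for lag in range(1, max_lag + 1):
        m = msd(track, lag)
        if m > 0:  # log is undefined for a perfectly stationary track
            xs.append(math.log(lag))
            ys.append(math.log(m))
    if len(xs) < 2:
        return float("nan")  # not enough points for a slope
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical constant-velocity track (x, y, z in µm): MSD grows as lag**2.
directed = [(0.1 * t, 0.0, 0.0) for t in range(40)]
alpha_directed = fit_alpha(directed)
```

      For the idealized constant-velocity track the fitted exponent is 2, the value the reviewer associates with motor-driven transport. Stalling, filament switching, and reversals of the kind described in the response would each pull the fitted slope below that limit.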

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

      When using nocodazole, we worked to optimize the concentration of the drug to minimize cytotoxicity, while still being effective. While the more stable filaments at the cell apex remain largely intact after nocodazole treatment, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We have clarified this in our results. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells expressing YFP-tubulin (Figure 4-S1F-G), that highlight cytoplasmic YFP-tubulin and long, stabilized microtubules after 3-4 hr treatment with nocodazole and taxol respectively. In these images we also point out microtubules at the apical region of hair cells that are very stable and do not completely destabilize with nocodazole treatment at concentrations that are tolerable to hair cells.

      “We verified the effectiveness of our in vivo pharmacological treatments using either 500 nM nocodazole or 25 µM taxol by imaging microtubule dynamics in pLL hair cells (myo6b:YFP-tubulin). After a 30-min pharmacological treatment, we used Airyscan confocal microscopy to acquire timelapses of YFP-tubulin (3 µm z-stacks, every 50-100 s for 30-70 min, Movie S8). Compared to controls, 500 nM nocodazole destabilized microtubules (presence of depolymerized YFP-tubulin in the cytosol, see arrows in Figure 4-S1F-G) and 25 µM taxol dramatically stabilized microtubules (indicated by long, rigid microtubules, see arrowheads in Figure 4-S1F,H) in pLL hair cells. We did still observe a subset of apical microtubules after nocodazole treatment, indicating that this population is particularly stable (see asterisks in Figure 4-S1F-H).”

      To further address concerns about verifying the efficacy of nocodazole and taxol treatment on microtubules, we added a quantification of our immunostaining data comparing the mean acetylated-α-tubulin intensities between control, nocodazole and taxol-treated hair cells. Our results show that nocodazole treatment reduces the mean acetylated-α-tubulin intensity in hair cells. This is included as a new figure (Figure 4-S1D-E) and this result is referred to in the text. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells after overnight treatment with nocodazole and taxol (Figure 4-S1A-C).

      “After a 16-hr treatment with 250 nM nocodazole we observed a decrease in acetylated-α-tubulin label (qualitative examples: Figure 4A,C, Figure 4-S1A-B). Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D). Less acetylated-α-tubulin label indicates that our nocodazole treatment successfully destabilized microtubules.”

      “Qualitatively more acetylated-α-tubulin label was observed after treatment, indicating that our taxol treatment successfully stabilized microtubules (qualitative examples: Figure 4-S1A,C). Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1E).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript is fairly dense. For instance, some information is repeated (page 3: ribbon synapses form along a condensed timeline in zebrafish hair cells, 12-18 hrs; and page 5: these hair cells form 3-4 ribbon synapses in just 12-18 hrs). Perhaps the authors could condense some of the ideas? The introduction could be shortened.

      We have eliminated this repeated text in our revision. We have shortened the introduction from 1275 to 1038 words (with references).

      (2) The mechanosensory structure on page 5 is not defined for readers outside the field.

      Great point. We have added additional information to define this structure in the results:

      “We staged hair cells based on the development of the apical, mechanosensory hair bundle. The hair bundle is composed of actin-based stereocilia and a tubulin-based kinocilium. We used the height of the kinocilium (see schematic in Figure 1B), the tallest part of the hair bundle, to estimate the developmental stage of hair cells as described previously…”

      (3) Figure 1E is quite interesting but I'd rather show Figure S1 B/C as they provide statistics. In addition, the authors define 4 stages : early, intermediate, late, and mature for counting but provide only 3 panels for representative examples by mixing late/mature.

      We were torn about which ribbon quantification graph to show. Ultimately, we decided to keep the summary data in Figure 1E. This is primarily because the supplementary figure will be adjacent to the main figure in the eLife format, and the statistics will be easy to find and view.

      Figure 1 now provides a representative image for both late and mature hair cells.

      (4) The ribbon that jumps from one microtubule to another one is eye-catching. Can the authors provide any statistics on this (e.g. percentage)?

      Good point. In our revision, we have added quantification for these events. We observe 2.8 switching events per neuromast during our fast timelapses. This information is now in the text and is also shown in a graph in Figure 3-S1D.

      “Third, we often observed that precursors switched association between neighboring microtubules (2.8 switching events per neuromast, n= 10 neuromasts; Figure 3-S1C-D, Movie S7).”

      (5) With regard to acetyl-α-tub immunocytochemistry, I would suggest obtaining a profile of the fluorescence intensity on a horizontal plane (at the apical part and at the base).

      (6) Same issue with microtubule destruction by nocodazole. Can the authors provide fluorescence intensity measurements to convince readers of microtubule disruption for long and short-term application.

      Regarding quantification of microtubule disruption using nocodazole and taxol: we did attempt to create profiles of the acetylated tubulin or YFP-tubulin label along horizontal planes at the apex and base, but the amount of variability among cells and the angle of the cell in the images made this type of display and quantification challenging. In our revision, as stated above in our response to Reviewer #1’s public comment, we have added representative side-view images to show the disruptions to microtubules more clearly after short- and long-term drug experiments (Figure 4-S1A-C, F-H). In addition, we quantified the reduction in acetylated tubulin label after overnight treatment with nocodazole and found the signal was significantly reduced (Figure 4-S1D-E). Unfortunately, we were unable to do a similar quantification for YFP-tubulin because its intensity varied with mounting. The following text has been added to the results:

      “Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D).”

      “Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1A,C,E).”

      (7) It is a bit difficult to understand that the long-term (overnight) microtubule destabilization leads to a reduction in the number of synapses (Figure 4F) whereas short-term (30 min) microtubule destabilization leads to the opposite phenotype with an increased number of ribbons (Figure 6G). Are these ribbons still synaptic in short-term experiments? What is the size of the ribbons in the short-term experiments? Alternatively, could the reduction in synapse number upon long-term application of nocodazole be a side-effect of the toxicity within the hair cell?

      Agreed, this is a bit confusing. In our revision, we have changed our analyses so that the comparisons are more similar between the short- and long-term experiments: we examined the number of ribbons and precursors per cell (apical and basal) in both experiments (we changed the panels in Figure 4G, Figure 4-S2G and Figure 5G). In our live experiments we cannot be sure that ribbons are synaptic, as we do not have a postsynaptic co-label. Also, we are unable to reliably quantify ribbon and precursor size in our live images due to variability in mounting. We have changed the text to clarify as follows:

      Results:

      “In each developing cell, we quantified the total number of Riba-TagRFP puncta (apical and basal) before and after each treatment. In our control samples we observed on average no change in the number of Riba-TagRFP puncta per cell (Figure 6G). Interestingly, we observed that nocodazole treatment led to a significant increase in the total number of Riba-TagRFP puncta after 3-4 hrs (Figure 6G). This result is similar to our overnight nocodazole experiments in fixed samples, where we also observed an increase in the number of ribbons and precursors per hair cell. In contrast to our 3-4 hr nocodazole treatment, similar to controls, taxol treatment did not alter the total number of Riba-TagRFP puncta over 3-4 hrs (Figure 6G). Overall, our overnight and 3-4 hr pharmacology experiments demonstrate that microtubule destabilization has a more significant impact on ribbon numbers compared to microtubule stabilization.”

      Discussion:

      “Ribbons and microtubules may interact during development to promote fusion, to form larger ribbons. Disrupting microtubules could interfere with this process, preventing ribbon maturation. Consistent with this, short-term (3-4 hr) and long-term (overnight) nocodazole increased ribbon and precursor numbers (Figure 6AG; Figure 4G), suggesting reduced fusion. Long-term treatment (overnight) resulted in a shift toward smaller ribbons (Figure 4H-I), and ultimately fewer complete synapses (Figure 4F).”

      Nocodazole toxicity: in response to Reviewer #2’s public comment, we have added the following text to our discussion:

      Discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30 min to 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      (8) Does ribbon motion depend on size or location?

      It is challenging to reliably quantify the actual area of precursors in our live samples, as there is variability in mounting and precursors are quite small. But we did examine the location within the cell of ribbon precursors that show motion (using tracks > 1 µm, as these tracks can easily be linked to cell location in Imaris). We found evidence of ribbons with tracks > 1 µm throughout the cell, both above and below the nucleus. This is now plotted in Figure 3M. We have also added the following text to the results:

      “In addition, we examined the location of precursors within the cell that exhibited displacements > 1 µm. We found that 38.9 % of these tracks were located above the nucleus, while 61.1 % were located below the nucleus (Figure 3M).”

      Although this is not an area or size measurement, this result suggests that both smaller precursors that are more apical, and larger precursors/ribbons that are more basal all show motion.

      (9) The fusion event needs to be analyzed in further detail: when one ribbon precursor fuses with another one, is there an increase in size or intensity (this should follow the law of mass conservation)? This is important to support the abstract sentence "ribbon precursors can fuse together on microtubules to form larger ribbons".

      As mentioned above, it is challenging to accurately estimate the absolute size or intensity of ribbon precursors in our live preparation. But we did examine whether there is a relative increase in area after ribbons fuse. We have plotted the change in area (within the same samples) for the two fusion events shown in Figure 8-S1A-B. In these examples, the area of the puncta after fusion is larger than either of the two precursors that fuse. Although the areas are not additive, these plots do provide some evidence that fusion does act to form larger ribbons. To accompany these plots, we have added the following text to the results:

      “Although we could not accurately measure the areas of precursors before and after fusion, we observed that the relative area resulting from the fusion of two smaller precursors was greater than that of either precursor alone. This increase in area suggests that precursor fusion may serve as a mechanism for generating larger ribbons (see examples: Figure 8-S1A-B).”

      Because we were unable to provide more accurate evidence of precursor fusion resulting in larger ribbons, we have removed this statement from our abstract and lessened our claims elsewhere in the manuscript.

      (10) The title in Figure 8 is a bit confusing. If fusion events reflect ribbon precursors fusion, it is obvious it depends on ribbon precursors. I'd like to replace this title with something like "microtubules and kif1aa are required for fusion events"

      We have changed the figure title as suggested, good idea.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1C. The purple/magenta colors are hard to distinguish.

      We have made the magenta color much lighter in Figure 1C to make it easier to distinguish purple and magenta.

      (2) There are places where some words are unnecessarily hyphenated. Examples: live-imaging and hair-cell in the abstract, time-course in the results.

      In our revision, we have done our best to remove unnecessary hyphens, including the ones pointed out here.

      (3) Figure 4H and elsewhere - what is "area of Ribeye puncta?" Related, I think, in the Discussion the authors refer to "ribbon volume" on line 484. But they never measured ribbon volume so this needs to be clarified.

      We have done our best to clarify what is meant by area of Ribeye puncta in the results and the methods:

      Results:

      “We also observed that the average area of individual Ribeyeb puncta (from 2D max-projected images) was significantly reduced compared to controls (Figure 4H). Further, the relative frequency of individual Ribeyeb puncta with smaller areas was higher in nocodazole-treated hair cells compared to controls (Figure 4I).”

      Methods:

      “To quantify the area of each ribbon and precursor, images were processed in a FIJI macro, ‘IJMacro_AIRYSCAN_simple3dSeg_ribbons only.ijm’, as previously described (Wong et al., 2019). Here each Airyscan z-stack was max-projected. A threshold was applied to each image, followed by segmentation to delineate individual Ribeyeb/CTBP puncta. The watershed function was used to separate adjacent puncta. A list of 2D objects of individual ROIs (minimum size filter of 0.002 μm²) was created to measure the 2D area of each Ribeyeb/CTBP punctum.”
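
      The measurement steps quoted above (max projection, threshold, segmentation, minimum size filter) can be sketched in plain Python as follows. This is only an illustration of the general idea and is not the FIJI macro named in the methods: it labels 4-connected regions with a flood fill and omits the watershed step, and the pixel area, threshold, and toy image stack are assumptions.

```python
from collections import deque

def max_project(stack):
    """Collapse a z-stack (list of 2D planes) to one plane by taking the per-pixel maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)] for r in range(rows)]

def punctum_areas(image, threshold, pixel_area_um2, min_area_um2=0.002):
    """Areas (µm²) of 4-connected above-threshold regions, after a minimum size filter."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            n_pixels = 0
            while queue:
                y, x = queue.popleft()
                n_pixels += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = n_pixels * pixel_area_um2
            if area >= min_area_um2:  # drop objects below the size filter
                areas.append(area)
    return areas

# Hypothetical two-plane stack: one 3-pixel punctum and one 1-pixel speck.
plane0 = [[0, 9, 9, 0, 0], [0, 9, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
plane1 = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 8], [0, 0, 0, 0, 0]]
projection = max_project([plane0, plane1])
# Assumed pixel area of 0.0016 µm² (a 0.04 µm pixel); not taken from the manuscript.
areas = punctum_areas(projection, threshold=5, pixel_area_um2=0.0016)
```

      Because the area is taken from a 2D projection, it reports a projected footprint rather than a volume, which is the distinction the reviewer asked the authors to make explicit.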

      We did refer to ribbon volume once in the discussion, but volume is not reflected in our analyses, so we have removed this mention of volume.

      (4) More validation data showing gene/protein removal for the crispants would be helpful.

      Great suggestion. As this is a relatively new method, we have created a figure that outlines how we genotype each individual crispant animal analyzed in our study (Figure 6-S1). In the methods we have also added the following information:

      “fPCR fragments were run on a genetic analyzer (Applied Biosystems, 3500XL) using LIZ500 (Applied Biosystems, 4322682) as a dye standard. Analysis of this fPCR revealed an average peak height of 4740 a.u. in wild type, and an average peak height of 126 a.u. in kif1aa F0 crispants (Figure 6-S1). Any kif1aa F0 crispant without robust genomic cutting or a peak height > 500 a.u. was not included in our analyses.”
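
      The exclusion rule in the quoted methods text amounts to a simple cut-off on fPCR peak height. A minimal sketch follows; the three reported numbers (4740, 126, and 500 a.u.) come from the text above, while the animal identifiers, the individual peak values, and the function names are hypothetical.

```python
WT_MEAN_PEAK = 4740.0       # a.u., wild-type average reported in the methods text
CRISPANT_MEAN_PEAK = 126.0  # a.u., kif1aa F0 crispant average reported in the methods text
MAX_PEAK = 500.0            # a.u., exclusion cut-off reported in the methods text

def residual_fraction(peak, wt_peak=WT_MEAN_PEAK):
    """Peak height as a fraction of the wild-type average."""
    return peak / wt_peak

def passes_cutoff(peak, max_peak=MAX_PEAK):
    """True if the animal would be kept (peak height not above the cut-off)."""
    return peak <= max_peak

# Hypothetical per-animal peak heights, for illustration only.
hypothetical_peaks = {"fish_01": 95.0, "fish_02": 510.0, "fish_03": 310.0}
kept = sorted(name for name, p in hypothetical_peaks.items() if passes_cutoff(p))
mean_residual = residual_fraction(CRISPANT_MEAN_PEAK)
```

      With the reported averages, the mean crispant peak is roughly 2.7 % of the wild-type peak (126 / 4740), and the 500 a.u. cut-off corresponds to roughly 10.5 % of the wild-type peak.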

      Reviewer #3 (Recommendations For The Authors):

      Lines 208-209--should refer to the movie in the text.

      Movie S1 is now referenced here.

      It would be helpful if the authors could analyze and quantify the effect of nocodazole and taxol on microtubules (movie 7).

      See responses above to Reviewer #1’s similar request.

      Figure 7 caption says "500 mM" nocodazole.

      Thank you, we have changed the caption to 500 nM.

      One problem with the MSD analysis is that it is dependent upon fits of individual tracks that lead to inaccuracies in assigning diffusive, restricted, and directed motion. The authors might be able to get around these problems by looking at the ensemble averages of all the tracks and seeing how they change with the various treatments. Even if the effect is on a subset of ribeye spots, it would be reassuring to see significant effects that did not rely upon fitting.

      We are hesitant to average the MSD tracks, as not all tracks have the same number of time steps (ribbons move in and out of the z-stack during the timelapse). This makes it challenging for us to compute ensemble averages of all tracks accurately, especially over the full duration of the timelapse. This is the main reason why we added another analysis, displacement > 1 µm, as a second readout of directional motion that does not rely upon fitting.
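
      To make the issue of unequal track lengths concrete, the sketch below computes a per-lag average of the MSD across tracks and reports how many tracks contribute at each lag. It illustrates the reviewer's suggestion under simple assumptions and is not an analysis from the manuscript; the function name and the two toy tracks are hypothetical.

```python
def ensemble_msd(tracks):
    """Per-lag MSD averaged over tracks of unequal length.

    Returns a list of (lag, msd, n_tracks) tuples, where n_tracks is the number
    of tracks long enough to contribute at that lag.
    """
    longest = max(len(t) for t in tracks)
    out = []
    for lag in range(1, longest):
        per_track = []
        for t in tracks:
            if len(t) <= lag:
                continue  # this track is too short for the current lag
            sq = [
                sum((b - a) ** 2 for a, b in zip(t[i], t[i + lag]))
                for i in range(len(t) - lag)
            ]
            per_track.append(sum(sq) / len(sq))
        out.append((lag, sum(per_track) / len(per_track), len(per_track)))
    return out

# Two hypothetical tracks (x, y, z in µm) with different numbers of time steps.
short = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
long_ = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 4.0, 0.0), (0.0, 6.0, 0.0)]
curve = ensemble_msd([short, long_])
```

      In this toy case the last lag is supported by a single track, so the tail of the averaged curve reflects only the longest track. That is the kind of distortion the response refers to when tracks leave the z-stack early.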

      The abstract states that directed movement is toward the synapse. The only real evidence for this is a statement in the results: "Of the tracks that showed directional motion, while the majority move to the cell base, we found that 21.2 % of ribbon tracks moved apically." A clearer demonstration of this would be to do the analysis of Figure 2G for the ribeye aggregates.

      It was not possible to do the same analysis on ribbon tracks that we did for the EB3-GFP analysis in Figure 2. In Figure 2 we did a 2D tracking analysis and measured the relative angles in 2D. In contrast, the ribbon tracking was done in 3D in Imaris, where it was not possible to get angles in the same way. Further, the MSD analysis was done outside of Imaris, making it extremely difficult to link ribbon trajectories to the 3D cellular landscape in Imaris. Instead, we examined the direction of the 3D vectors in Imaris for tracks > 1 µm and determined the direction of the motion (apical, basal or undetermined). For clarity, these data are now included as a bar graph in Figure 3L. In our results, we have clarified the results of this analysis:

      “To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells and 203 tracks). Of the tracks with displacement > 1 µm, the majority of ribbon tracks (45.8 %) moved to the cell base, but we also found a subset of ribbon tracks (20.8 %) that moved apically (33.4 % moved in an undetermined direction) (Figure 3L).”
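
      One way to picture the direction call described above is to classify each track by its start-to-end vector. The sketch below is an illustration only, since the authors performed this step in Imaris. It assumes that displacement means net start-to-end distance, that the z axis points toward the cell apex, and that a track counts as apical or basal when at least half of its displacement lies along z. All three choices, and all names, are hypothetical.

```python
import math
from collections import Counter

def classify_track(track, min_disp_um=1.0, axis_fraction=0.5):
    """Label a 3D track 'apical', 'basal', or 'undetermined'; None if it moved too little.

    track: list of (x, y, z) positions in µm, with z increasing toward the apex (assumption).
    """
    start, end = track[0], track[-1]
    d = [e - s for s, e in zip(start, end)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist <= min_disp_um:
        return None  # below the displacement cut-off
    if abs(d[2]) / dist < axis_fraction:
        return "undetermined"  # mostly sideways movement
    return "apical" if d[2] > 0 else "basal"

def direction_percentages(tracks):
    """Percent of above-cut-off tracks in each direction class."""
    labels = [lab for lab in (classify_track(t) for t in tracks) if lab is not None]
    if not labels:
        return {}
    counts = Counter(labels)
    return {k: 100.0 * v / len(labels) for k, v in counts.items()}

# Hypothetical tracks, reduced to their first and last positions.
tracks = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, -2.0)],  # moves toward the base
    [(0.0, 0.0, 0.0), (0.3, 0.0, -1.5)],  # moves toward the base
    [(0.0, 0.0, 0.0), (0.2, 0.0, 1.5)],   # moves toward the apex
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.1)],   # sideways
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.3)],   # too short to count
]
percentages = direction_percentages(tracks)
```

      The 'undetermined' class in this sketch captures tracks whose movement is mostly perpendicular to the assumed apical-basal axis; the criterion actually used in the manuscript is not stated in the response.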

      Some more detail about the F0 crispants should be provided. In particular, what degree of cutting was observed and what were the criteria for robust cutting?

      See our response to Reviewer 2 and the newly created Figure 6-S1.

    1. eLife Assessment

      This useful manuscript describes cryo-EM structures of archaeal proteasomes that reveal insights into how occupancy of binding pockets on the 20S protease regulates proteasome gating. The evidence supporting these claims is convincing, although inclusion of more quantitative comparisons would help strengthen the conclusions. This work will be of special interest to researchers interested in proteasome structure and regulation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chua, Daugherty, and Smith analyze a new set of archaeal 20S proteasomes obtained by cryo-EM that illustrate how the occupancy of the HbYX binding pocket induces gate opening. They do so primarily through a V24Y mutation in the α-subunit. These results are supported by a limited set of mutations in K66 in the α subunit, bringing new emphasis to this unit.

      Strengths:

      The new structure's analysis is comprehensive, occupying the entire manuscript. As such, the scope of this manuscript is very narrow, but the strength of the data is solid, and they offer an interesting and important new piece to the gate-opening literature.

      Weaknesses:

      Major Concerns

      (1) This manuscript rests on one new cryo-EM structure, leading to a single (albeit convincing) experiment demonstrating the importance of occupying the pocket and moving K66. Could a corresponding bulky mutation at K66 not activate the 20S proteasome?

      (2) To emphasize the importance of this work, the authors highlight the importance of gate-opening to human 20S proteasomes. However, the key distinctions between these proteasomes are not given sufficient weight.

      (a) As the authors note, the six distinct Rpt C-termini can occupy seven different pockets. However, how these differences would impact activation is not thoroughly discussed.

      (b) With those other sites, the relative importance of various pockets, such as the one controlling the α3 N-terminus, should be discussed more thoroughly as a potential critical difference.

      (c) These differences can lead to eukaryote 20S gates shifting between closed and open and having a partially opened state. This becomes relevant if the goal is to lead to an activated 20S. It would have been interesting to have archaea 20S with a mix of WT and V24Y α-subunits. However, one might imagine the subclassification problem would be challenging and require an extraordinary number of particles.

      (d) Furthermore, the conservation of the amino acids around the binding pocket was not addressed. This seems particularly important in the relative contribution of a residue analogous to K66 or V24.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Chuah et al. reports the experimental results that suggest the occupancy of the HbYX pockets suffices for proteasome gate opening. The authors conducted cryo-EM reconstructions of two mutant archaeal proteasomes. The work is technically sound and may be of special interest in the field of structural biology of the proteasomes.

      Strengths:

      Overall, the work incrementally deepens our understanding of the proteasome activation and expands the structural foundation for therapeutic intervention of proteasome function. The evidence presented appears to be well aligned with the existing literature, which adds confidence in the presentation.

      Weaknesses:

      The paper may benefit from some minor revision by making improvements on the figures and necessary quantitative comparative studies.

    1. eLife Assessment

      The findings in this manuscript are fundamental because they identify an entry receptor, MYL3, that belongs to the myosin family as a possible target for inhibiting a virus that has a high impact on aquaculture. The evidence is convincing, as it contains strong in vitro and in vivo data that support their conclusions; however, studies on the presence of MYL3 in NNV target tissues will further strengthen their claims.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to prove that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments use different methods, including Co-IP, RNAi, pulldown, SPR, flow cytometry, immunofluorescence assays and so on. In general, the results are clearly presented in the manuscript.

      Comments on revisions:

      The authors have addressed all my comments.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of NNV internalization process. This could pave the way for further exploration of antiviral targets.

      Comments on revisions:

      The authors have addressed the concerns from reviewers. This manuscript can be published in the current form.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Specificity of MYL3 Selection:

      My previous question focused on why MYL3 was prioritized over other myosin family members. While the response broadly implicates myosins in viral entry, it does not justify why MYL3 was specifically chosen. For clarity, the "Introduction sections" should explicitly state the unique features of MYL3 (e.g., domain structure, binding affinity, or prior evidence linking it to NNV) that distinguish it from other myosins.

      Thank you for your valuable comment regarding the specificity of MYL3 selection. In response, we have revised the "Introduction" section to explicitly clarify the rationale for prioritizing MYL3 over other myosin family members. Specifically, we have now included prior evidence linking MYL3 to NNV infection, citing our studies that identified MYL3 as a potential host factor interacting with the NNV CP protein. In our previous study, sixteen CP-interacting proteins were identified by Co-IP assays followed by MS, including HSP90ab1, Centrosomal protein 170B, MYL3 and so on. In addition to our findings, a previous study by other researchers has also reported that Epinephelus coioides MYL3 can bind to NNV (page 3, lines 79–81). These revisions provide a clearer justification for the selection of MYL3 and distinguish it from other myosin proteins. The added content can be found in the revised manuscript on page 3, lines 81–84.

    1. eLife Assessment

      This valuable study provides in-vivo evidence that CCR4 regulates the early inflammatory response during atherosclerotic plaque formation. The authors propose that altered T-cell response plays a role in this process, shedding light on mechanisms that may be of interest to medical biologists, biochemists, cell biologists, and immunologists. The work is currently considered incomplete pending textual changes and the inclusion of proper controls.

    2. Reviewer #2 (Public review):

      Summary:

      Tanaka et al. investigated the role of CCR4 in early atherosclerosis, focusing on the immune modulation elicited by this chemokine receptor under hypercholesterolemia. The study found that Ccr4 deficiency led to qualitative changes in atherosclerotic plaques, characterized by an increased inflammatory phenotype. The authors further analyzed the CD4 T cell immune response in para-aortic lymph nodes and atherosclerotic aorta, showing an increase mainly in Th1 cells and the Th1/Treg ratio in Ccr4-/-Apoe-/- mice compared to Apoe-/- mice. They then focused on Tregs, demonstrating that Ccr4 deficiency impaired their immunosuppressive function in in vitro assays. The authors also state that Ccr4-deficient Tregs had, as expected, impaired migration to the atherosclerotic aorta. Adoptive cell transfer of Ccr4-/- Tregs to Apoe-/- mice mimicked early atherosclerosis development in Ccr4-/-Apoe-/- mice. Therefore, this work shows that CCR4 plays an important role in early atherosclerosis but not in advanced stages.

      Strengths:

      Several in vivo and in vitro approaches were used to address the role of CCR4 in early atherosclerosis. Particularly, through the adoptive cell transfer of CCR4+ or CCR4- Tregs, the authors aimed to demonstrate the role of CCR4 in Tregs' protection against early atherosclerosis.

      Weaknesses:

      Flow cytometry experiments are not well controlled. Dead cells and doublets were not excluded from analysis.

      Clinical relevance is unclear.

      Comments on revisions:

      I thank the authors for addressing my suggestions.

      I understand that excluding dead cells would require repeating the entire experiment. However, the authors can at least exclude doublets from the existing flow cytometry data.

      I also agree with the more cautious claim regarding the role of CCR4 in Treg migration.

    3. Reviewer #3 (Public review):

      Summary

      Tanaka and colleagues addressed the role of the C-C chemokine receptor 4 (CCR4) in early atherosclerotic plaque development using ApoE-deficient mice on a standard chow diet as a model. Because several CD4+ T cell subsets express CCR4, they examined whether CCR4-deficiency alters the immune response mediated by CD4+ T cells. By histological analysis of aortic lesions, they demonstrated that the absence of CCR4 promoted the development of early atherosclerosis, with heightened inflammation linked to increased macrophages and pro-inflammatory CD4+ T cells, along with reduced collagen content. Flow cytometry and mRNA expression analysis for identifying CD4+ T cell subsets showed that CCR4 deficiency promoted higher proliferation of pro-inflammatory effector CD4+ T cells in peripheral lymphoid tissues and accumulation of Th1 cells in the atherosclerotic lesions. Interestingly, the increased pro-inflammatory CD4+ T cell response occurred despite the expansion of CD4+ Foxp3+ regulatory T cells (Tregs), found in higher numbers in lymphoid tissues of CCR4-deficient mice, suggesting that CCR4 deficiency interfered with the regulatory actions of Tregs. The findings contrast with earlier studies in a murine model of advanced atherosclerosis, where CCR4 deficiency did not alter the development of the aortic lesions. The authors included a thoughtful discussion about hypothetical mechanisms explaining these contrasting results, including putative differences in the role played by the CCL17/CCL22-CCR4 axis along the stages of atherosclerosis development in this murine model.

      Major strengths

      • Demonstration of CCR4 deficiency's impact on early atherosclerosis. The effects of CCR4 deficiency on early atherosclerosis development in the Apoe-/- mouse model were demonstrated by a quantitative analysis of the lesion area, inflammatory cell content and the expression profile of several pro- and anti-inflammatory markers.

      • Analysis of the CD4+ T cell response in various lymphoid tissues (peripheral and para-aortic lymph nodes and spleen) and the atherosclerotic aorta during the early phase of atherosclerosis in the Apoe-/- mouse model. This analysis, combining flow cytometry and mRNA expression, showed that CCR4 deficiency enhanced CD4+ T cell activation, favouring the amplification of the typical biased Th1-mediated inflammatory response observed in the lymphoid tissues of hypercholesterolemic mice.

      • Treg transfer experiments. Transfer of Tregs from Apoe-/- or Ccr4-/- Apoe-/- mice to Apoe-/- mice under a standard chow diet was useful for addressing the relevance of CCR4 expression on Tregs for the atheroprotective effect of this regulatory T cell subset during early atherosclerosis.

      Major weaknesses

      • Methodological Limitations: The controls used in the flow cytometry analysis were suboptimal, as neither cell viability nor doublets were assessed. This may have introduced artifacts, particularly when measuring less-represented cell populations within complex samples, such as in assays evaluating Treg migration to the aorta in atherosclerotic mice.

      • Incomplete understanding of CCR4-Mediated Mechanisms: The mechanisms by which CCR4 regulates early inflammation and the development of atherosclerosis were not fully clarified.

      I have previously addressed the study limitations and their global impact in my earlier reviews.

    1. eLife Assessment

      This study presents a new, fundamental finding to the field interested in recurrent processing and its neuromodulatory underpinnings, finding unexpectedly that memantine (blocking NMDA-receptors) enhances the decoding of features thought to rely on NMDA-receptors. This interesting, compelling result identifies new directions for researchers studying consciousness, sensory processing, attention, and neurotransmitters.

    2. Reviewer #1 (Public review):

      The authors investigate the function and neural circuitry of reentrant signals in visual cortex. Recurrent signaling is thought to be necessary for common types of perceptual experience that are defined by long-range relationships or prior expectation. Contour illusions - where perceptual objects are implied by stimulus characteristics - are a good example of this. The perception of these illusions is thought to emerge as recurrent signals from higher cortical areas feed back onto early visual cortex, to tell early visual cortex that it should be seeing object contours where none are actually present.

      The authors test the involvement of reentrant cortical activity in this kind of perception using a drug challenge. Reentrance in visual cortex is thought to rely on NMDAR-mediated glutamate signalling. The authors accordingly employ an NMDA antagonist to stop this mechanism, looking for the effect of this manipulation on visually evoked activity recorded in EEG.

      The motivating hypothesis for the paper is that NMDA antagonism should stop recurrent activity, and that this should degrade perceptual activity supporting perception of a contour illusion, but not other types of visual experience. Results in fact show the opposite. Rather than degrading cortical activity evoked by the illusion, memantine makes it more likely that machine learning classification of EEG will correctly infer the presence of the illusion.

      On the face of it, this is confusing. But the paper does a good job of providing possible accounts based on specific details of neurochemical signalling and receptor populations.

      I broadly find the paper interesting, graceful, and creative. The hypotheses are clear and compelling, the techniques for both manipulation of brain state and observation of that impact are cutting edge and well suited, and the paper draws clear and convincing conclusions that are made necessary by the results. The work sits at the very interesting crux of systems neuroscience, neuroimaging, and pharmacology.

    3. Reviewer #2 (Public review):

      This study presents an important finding to the field interested in recurrent processing and the role of NMDA-receptors herein. The evidence for improved decoding under memantine is convincing, while some open questions remain to be followed up in future studies (the lack of a behavioural effect, why is decoding improved rather than decreased?). It is an excellent example of how an unexpected finding can generate novel research ideas about the mechanisms underlying recurrent processing, suggesting that the answer lies in the differences in the effects of ketamine and memantine, rather than their commonalities.

      I would like to thank the authors for the great care they have taken in addressing my concerns. I think the revised manuscript is significantly easier to follow now that specific hypotheses have been formulated in the introduction, and the direction of the results is explicitly stated throughout the manuscript. I further appreciate the dampening of some of the claims that are not completely supported by the appropriate interactions.

      I think the resulting manuscript is an incredibly exciting contribution to our understanding of NMDA-receptor function, and a great example of how an unexpected finding can raise questions that could potentially drive the field forward. It shows how NMDA's role in recurrent processing is much more complicated than previously assumed, and reveals that it is not the commonalities between memantine and ketamine that are important in understanding recurrent processing, but rather the differences. I look forward to future studies that will target these differences.

      Overall great job.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Stein and colleagues use a clever masking/attentional blink paradigm using Kanizsa stimuli, coupled with EEG decoding and the NMDA antagonist memantine, to isolate putative neural markers of feedforward, lateral, and feedback processing.

      In two elegant experiments, they show that memantine selectively influences EEG decoding of only illusory Kanizsa surfaces (but not contour continuation or raw contrast), only when unmasked, only when attention is available (not when "blinked"), and only when task-relevant.

      This neatly implicates NMDA receptors in the feedback mechanisms that are believed to be involved in inferring illusory Kanizsa surfaces, and builds a difficult bridge between the large body of human perceptual experiments and pharmacological and neurophysiological work in animals.

      Strengths:

      Three key strengths of the paper are 1) its elegant and thorough experimental design, which includes internal replication of some key findings, 2) the clear pattern of results across the full set of experiments, and 3) its clear writing and presentation of results.

      The paper effectively reports a 4-way interaction, with memantine only influencing decoding of surfaces (1) that are unmasked (2), with attention available (3) and task-relevant (4). Nevertheless, the results are very clear, with a clear separation between null effects on other conditions and quite a strong (and thus highly selective) effect on this one intersection of conditions. This makes the pattern of findings very convincing.

      Weaknesses:

      Overall this is an impressive and important paper. However, to my mind there are two minor weaknesses.

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      Comments on revisions:

      I think the authors responded fairly to my comments. Even if they weren't really able to add new insight into why behaviour didn't show the same effects as decoding, they discuss this in the revised text.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate. For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we are highlighting this finding more clearly, and included some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (eg. Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (that is in contrast to ketamine; Milnerwood et al, 2010, Neuron), and this type of receptor is prominent in GABA cells (eg. Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; eg. Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from that I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal (Self et al., 2012). This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way.

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal via blocking NMDA receptors by memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but enhancing rather than impairing it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012), specifically the distinct effects of the two NMDAR antagonists used in this study, more extensively, and speculate that their effects may have been similar to ketamine and thus possibly opposite of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can result in markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al, 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2013, 2008). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (eg. Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, Shapley, 1997); what argues against an account of the results from an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al 2012 where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and - on the surface - appear as potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects have performed more than a thousand trials in a localizer session before starting the main task (in experiment 2 even more than two thousand) containing the drug manipulation. Therefore, a large part of putative perceptual learning would have already occurred before starting the main experiment. Second, the main experiment was counterbalanced across drug sessions, so half of the participants first performed the memantine session and then the placebo session, and the other half of the subjects the other way around. If memantine would have improved perceptual learning in our experiments, one may actually expect to observe improved decoding in the placebo session and not in the memantine session. If memantine would have facilitated perceptual learning during the memantine session, the effect of that facilitated perceptual learning would have been most visible in the placebo session following the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings would have been driven by perceptual learning one would have expected to see perceptual learning for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, all features were equally often task relevant and in such a situation one would’ve expected to observe perceptual learning effects on those other features as well.  

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. To investigate this, we divided the experiments’ trials into four time bins, from the beginning until the end of the experiment. For the first experiment’s first target (T1), there was no interaction between the factors bin and drug (memantine/placebo; F(3,84) = 0.89, P = 0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F(3,84) = 2.57, P = 0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the bin and drug factors were not significant (all P > 0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P > 0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F(6,150) = 0.76, P = 0.547; Figure S6D), implying there is no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
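
      The binning step described above can be illustrated with a minimal sketch. It assumes each session is a list of 0/1 trial outcomes in presentation order; the function name `bin_accuracy` and the toy data are hypothetical and are not taken from the authors' analysis code. The sketch only computes the per-bin proportions that would enter a bin by drug comparison, not the repeated-measures ANOVA itself.

```python
def bin_accuracy(correct, n_bins=4):
    """Split trial outcomes (in presentation order) into n_bins
    consecutive bins and return the proportion correct per bin."""
    n = len(correct)
    if n < n_bins:
        raise ValueError("need at least one trial per bin")
    # Bin edges as trial indices, from the first to the last trial.
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [sum(correct[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


# Hypothetical toy sessions for one participant (1 = correct, 0 = error).
placebo = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
memantine = [0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# Per-bin drug difference: a learning effect specific to one drug
# would show up as a difference that changes systematically across bins.
diff = [m - p for m, p in zip(bin_accuracy(memantine), bin_accuracy(placebo))]
print(diff)
```

      In a full analysis, these per-bin values would be computed per participant and condition and then submitted to the repeated-measures ANOVA.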

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case, a task where participants matched a Kanizsa figure to a target template (E1) or discriminated one of the three relevant stimulus features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks - an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the Kanizsa illusion in the localizer task was probably quite salient, relative to changes in stimulus rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern - where drug manipulation impacts classification in this condition but not other conditions - may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not emerge in the localizer - the use of RSVP in E1 and manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (ie. when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at a floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve signal-to-noise ratio and make it easier to interpret the features being used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, in which we would have to do k-folding on very low trial numbers (e.g., the 2 (mask) x 2 (lag) design in E1 leaves fewer than 256 trials, due to preprocessing, for specific comparisons). The same holds for experiment 2, which has a 2x3 design but also included the base-rate manipulation. Finally, we further facilitate sensitivity of the model by having the stimuli presented at full contrast without any manipulations of attention or masking during the localizer, which allows us to extract the feature-specific EEG signals in the most optimal way.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
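
      The two validation schemes contrasted above (training on a separate localizer versus k-fold cross-validation within the main task) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the data are simulated, the classifier is a simple nearest-centroid rule chosen to keep the sketch dependency-free, and all function names and parameter values (apart from the 1280 localizer trials mentioned above) are hypothetical.

```python
import random


def make_trials(n_trials, n_features, signal, noise, rng):
    """Simulate labelled trials; class 1 adds `signal` to every feature."""
    trials = []
    for i in range(n_trials):
        label = i % 2
        x = [rng.gauss(signal * label, noise) for _ in range(n_features)]
        trials.append((x, label))
    rng.shuffle(trials)
    return trials


def fit_centroids(trials):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in trials:
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, trials):
    return sum(predict(centroids, x) == y for x, y in trials) / len(trials)


def cross_task_accuracy(localizer, main):
    """Train on all localizer trials, test on all main-task trials."""
    return accuracy(fit_centroids(localizer), main)


def kfold_accuracy(trials, k):
    """Train and test within one dataset using k interleaved folds."""
    folds = [trials[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [t for j, f in enumerate(folds) if j != i for t in f]
        scores.append(accuracy(fit_centroids(train), folds[i]))
    return sum(scores) / k


rng = random.Random(0)
# Many strong-signal localizer trials; fewer, weaker main-task trials.
localizer = make_trials(1280, 8, signal=1.0, noise=1.0, rng=rng)
main_task = make_trials(200, 8, signal=0.8, noise=1.0, rng=rng)
print("cross-task:", cross_task_accuracy(localizer, main_task))
print("5-fold within task:", kfold_accuracy(main_task, 5))
```

      In the cross-task scheme the classifier never sees main-task data during training, so a manipulation applied only in the main task cannot shape the decision rule; in the k-fold scheme, training and test trials share the same manipulations.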

      Experiment 1 

      In experiment 1, the Kanizsa was always task relevant in the main experiment in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would be the case if we had done k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding as well as rotation decoding were equally strong, whereas collinearity decoding was weaker. It may be that the Kanizsa illusion was quite salient in the localizer task, which we can’t know at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results one could argue that the rotation dimension and illusion dimension were most salient, because the decoding was highest for these dimensions. Clearly the model was not insensitive to non-illusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task relevant.

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and no masking and no base rate manipulation were employed. To make sure that the classifier was not biased towards a certain stimulus category (due to the bias manipulation), e.g. the stimulus that is presented most often, we used a localizer task without this manipulation. As can be seen in figure 4D, decoding of all the features was highly robust, including, for example, the collinearity condition. Therefore, the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage instead of a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of memantine results in masked results may simply be a product of the floor ... some care is needed in the interpretation of this pattern. 

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While floor is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that to obtain an effect of memantine, decoding accuracy needed to be higher. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy for memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Used psychotropic medication, or recreational drugs over a period of 72 hours prior to each test session;

      – Used alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials collected during the intake session.

      In experiment 2: 1728 trials were collected per session (intake, and 2 drug sessions), so there were 5184 trials across three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that NMDA-receptors would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the more clear increased or perturbed recurrent processing, the reader is left guessing what is actually found. That's until they read the results and discussion finding that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear they hypothesized a disruption in decoding in the illusion condition, but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking; this was indeed unclear. In experiment 1 the localizer was performed in the intake session, in which no drugs were administered. In the second experiment the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions the localizer was performed directly after pill intake, and therefore the memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before using them as training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim is completely dependent to what extent we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours capture recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition, but is not affected by the NMDA-receptor here, so where does that leave us in interpreting the role of NMDA-receptors in recurrent processing? If the authors can not strengthen the claim that the effects are completely driven by affecting recurrent processing, I suggest that the paper will shift its focus to making claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 2:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regards to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. The evidence for claim 1 is not supported by experiment 1, as the masking manipulation did not interact in the cluster-analyses, and the analyses focussing on the peak of the timing window do not show a significant effect either. There is evidence for this claim coming from experiment 2 as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time-courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub>=-2.76, P=0.010, BF<sub>10</sub>=4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P=0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task-relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task-relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, we are highlighting the “preliminary” nature of these findings, and we are now alerting the reader explicitly to be careful with interpreting these effects, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
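
      The statistical point at issue here (an effect that is significant in one condition but not in another does not by itself establish an interaction) can be made concrete with a short sketch. In a 2 x 2 within-subject design, the interaction corresponds to a one-sample t-test on each participant's difference of differences. The decoding values below are invented purely for illustration and are not data from the study; the function and variable names are likewise hypothetical.

```python
import math
import statistics

def one_sample_t(values):
    """Return (t, df) for a test of mean(values) against zero."""
    n = len(values)
    t = statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))
    return t, n - 1

def interaction_contrast(drug_unmasked, placebo_unmasked, drug_masked, placebo_masked):
    """Per-participant drug effect when unmasked minus drug effect when masked."""
    return [(du - pu) - (dm - pm)
            for du, pu, dm, pm in zip(drug_unmasked, placebo_unmasked,
                                      drug_masked, placebo_masked)]

# Invented decoding scores for six hypothetical participants (illustration only).
drug_unmasked    = [0.62, 0.60, 0.65, 0.58, 0.63, 0.61]
placebo_unmasked = [0.58, 0.57, 0.60, 0.56, 0.59, 0.58]
drug_masked      = [0.52, 0.51, 0.53, 0.50, 0.52, 0.51]
placebo_masked   = [0.51, 0.51, 0.52, 0.50, 0.52, 0.50]

contrast = interaction_contrast(drug_unmasked, placebo_unmasked,
                                drug_masked, placebo_masked)
t_interaction, df = one_sample_t(contrast)
print(f"interaction: t({df}) = {t_interaction:.2f}")
```

      Only a reliable non-zero contrast of this kind licenses the statement that the drug effect differs between conditions; two separate simple-effect tests do not.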

      How were the length of the peak-timing windows established in Figure 1E? My understanding is that this forms the training-time window for the further decoding analyses, so it is important to justify why they have different lengths, and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these posthoc analyses, so the 223- to 323 time window needs justification.

      Thanks for this question. The length of these peak-timing windows is different because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we have followed the same procedure as in a previously published study (Noorman et al., eLife 2025) for defining the peak-timing and length of the windows. We followed the same procedure for both experiments reported in this paper, replicating the crucial findings and therefore excluding the possibility that these findings are in any way dependent on the time windows that are selected. We have added that information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side effect not related to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

    1. eLife Assessment

      This is a fundamental cell biological study of host responses during symbiotic microbial infection of plants. Compelling imaging-based approaches using genetically encoded cell cycle markers show that in Medicago truncatula root cortex cells, early rhizobial infection events are associated with cell-cycle re-entry, but once the infection is established, host cells exit the cell cycle. The work will be of interest to a wide range of readers working in fields from development and cell biology to plant-microbe interactions.

    2. Reviewer #1 (Public Review):

      Many studies reported findings implying that rhizobial infection is associated with cell cycle re-entry and progression, however, our understanding has been fragmented. This study provides exciting new insights as it represents a comprehensive description of the cell cycle progression during early stages of nodulation using fluorescence markers.

      To briefly summarize, the authors first monitor H3.1 / H3.3 replacement to distinguish between replicating (S phase) and non-replicating cells to show that M. truncatula cortex cells along the bacterial infection thread are non-replicating (while neighbors enter the S phase). Nuclear size measurements revealed that these non-replicative cells are in the post-replicative stage (G2) rather than in the pre-replicative G1 phase, which the authors confirm with the Plant Cell Cycle Indicator (PlaCCI) fluorescent marker to track cell cycle progression in more detail. Cortex cells in the trajectory of the infection thread did not accumulate the late G2 marker of the PlaCCI nor the G2/M marker KNOLLE, indicating that these cells indeed remain in G2. Because nuclear size measurements indicated that infected cells are polyploid, the authors used the centromere histone marker CENH3 to determine chromosome number. They find that cortex cells giving rise to the nodule primordium are endomitotic and tetraploid, probably because their cell cycle is halted at centromere separation. Although not a focus of this manuscript, the authors also use their fluorescent tools to track cell cycle progression during arbuscular mycorrhiza symbiosis. They confirm that infected cells transition from a replicating to a non-replicating state (H3.1 to H3.3) with progressing development of the arbuscules. In addition, the CENH3 marker confirms previous findings that cortex cells infected by fungi are endocycling (i.e., DNA synthesis without segregation of replicated parts). This represents an important confirmation of previous findings and contrasts with the situation during nodulation symbiosis, where chromosomes separate after replication.

      In general, all microscopy images are of very high quality and support the authors' conclusions. While individually each set of fluorescent markers has its limitations, combined they constitute a powerful tool to track various stages of cell cycle progression in individual root cells during symbiosis. Overall, this is a very strong manuscript that comprehensively elucidates root cell cycle changes during microbial infection.

    3. Reviewer #2 (Public Review):

      Cell cycle control during nitrogen-fixing symbiosis is an important topic, but our understanding of the process is poor and lacks resolution, as the nodule is a complex organ with many cell types that undergo profound changes. The authors aim to define the cell cycle state of individual plant cells in the emerging nodule primordium, as a transcellular infection thread passes through the meristem to reach cells deep in the incipient nodule and releases bacteria to form symbiosomes. The authors used a number of cell cycle reporters, such as different Histone 3 variants and cyclins, to follow cell cycle progress in exquisite detail. They showed that the host cells in the path of an infection thread exhibit a cell fate distinct from their immediate neighbors: after entering the S phase similar to their neighbors, these cells exit the cell cycle and enter a special differentiated state. This is likely an important shift that allows the proper passage of the infection thread. Although definitive proof needs more investigation, they showed that a pioneering transcription factor, NF-YA1, likely represses these endoreduplicated cells from completing the cell cycle.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (…) In my view, the part about NF-YA1 is less strong - although I realize this is a compelling candidate to be a regulator of cell cycle progression, the experimental approaches used to address this question fall a bit short, in particular, compared to the very detailed approaches shown in the rest of the manuscript. The authors show that the transcription factor NF-YA1 regulates cell division in tobacco leaves; however, there is no experimental validation in the experimental system (nodules). All conclusions are based on a heterologous cell division system in tobacco leaves. The authors state that NF-YA1 has a nodule-specific role as a regulator of cell differentiation. I am concerned the tobacco system may not allow for adequate testing of this hypothesis.

      Reviewer #1 makes a valid point by asking to focus the manuscript more explicitly on the role of NF-YA1 as a differentiation factor in a symbiotic context. We have now addressed this formally and experimentally.

      The involvement of A-type NF-Y subunits in the transition to the early differentiation of nodule cells has been documented in model legumes through several publications that we refer to in the revised version of the discussion (lines 617/623). We fully agree that the CDEL system, because it is heterologous, does not allow us more than to propose a parallel explanation for these observations - i.e., that the Medicago NF-YA1 subunit presumably acts in post-replicative cell-cycle regulation at the G2/M transition. Considering your recommendations and those of reviewer #2, we sought to support this conclusion by testing the impact of localized over-expression of NF-YA1 on cortical cell division and infection competence at an early stage of root colonization. The results of these experiments are now presented in the new Figure 9 and Figure 9-figure supplement 1-5 and described from line 435 to 495.

      With the fluorescent tools the authors have at hand (in particular tools to detect G2/M transition, which the authors suggest is regulated by NF-YA1), it would be interesting to test what happens to cell division if NF-YA1 is over-expressed in Medicago roots?

      To limit pleiotropic effects of an ectopic over-expression, we used the symbiosis-induced, ENOD11 promoter to increase NF-YA1 expression levels more specifically along the trajectory of infected cells. We chose to remain in continuity with the experiments performed in the CDEL system by opting for a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. The results obtained are presented in Figure 9B (quantification of split infected cells), in Figure 9-figure supplement 1B (ENOD11 expression profile), in Figure 9-figure supplement 3B (representative confocal images) and Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal). There, we show that mitosis remains inhibited in cells accommodating infection threads, but is completed in a higher proportion of outer cortical cells positioned on the infection trajectory, where ENOD11 gene transcription is active before their physical colonization.

      Based on NF-YA1 expression data published previously and their results in tobacco epidermal cells, the authors hypothesize that NF-YA regulates the mitotic entry of nodule primordial cells. Given that much of the manuscript deals with earlier stages of the infection, I wonder if NF-YA1 could also have a role in regulating mitotic entry in cells adjacent to the infection thread?

      The expression profile of NF-YA1 at early stages of cortical infection (Laporte et al., 2014) is indeed similar to the one of ENOD11 (as shown in Figure 9-figure supplement 1C) in wild-type Medicago roots, with corresponding transcriptional reporters being both activated in cells adjacent to the infection thread. Under our experimental conditions, additional expression of NF-YA1 (driven by the ENOD11 promoter) in these neighbouring cells did not impact their propensity to enter mitosis and to complete cell division. These results are presented in Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal) and Figure 9-figure supplement 5 (quantification of split neighbouring cells).

      Reviewer #1 (Recommendations For The Authors):

      - In the first part, images show the qualitative presence/absence of H3.1 or H3.3 histones.

      Upon closer inspection, many cells seem to have both histones. In Fig1-S1 for example (root meristem), it is evident that there are many cells with low but clearly present H3.1 content in the green channel; however, in the overlay, the green is lost and H3.3 (pink) is mainly visible. What does this mean in terms of the cell cycle? 

      We fully agree with reviewer #1 on these points. Independent of whether they have low or high proliferation potential, most cells retain histone H3.1 particularly in silent regions of the genome, while H3.3 is constitutively produced and enriched at transcriptionally active regions. When channels are overlaid, cells in an active proliferation or endoreduplication state (in G1, S or G2, depending on the size of their nuclei) will appear mainly "green" (H3.1-eGFP positive). Cells with a low proliferation potential (e.g., in the QC), G2-arrested (e.g., IT-traversed) or terminally differentiating (e.g., containing symbiosomes or arbuscules) will appear mainly "magenta" (H3.1-low, medium to high H3.3-mCherry content).

      Furthermore, all nodule images only display the overlay image, and individual fluorescence channels are not shown. Does the same masking effect happen here? It may be helpful to quantify fluorescence intensity not only in green but also in red channels as done for other experiments.

      Quantifying fluorescence intensity in the mCherry channel may indeed help to highlight the likely replacement of H3.1-eGFP by H3.3-mCherry in infected cells, as described by Otero and colleagues (2016) at the onset of cellular differentiation. However, the quantification method as established (i.e., measuring the corrected total nuclear fluorescence at the equatorial plane) cannot be applied, most of the time, to infected cells' nuclei due to the overlapping presence of mCherry-producing S. meliloti in the same channel (e.g., in Figure 2B). Nevertheless, and to avoid this masking effect when the eGFP and mCherry channels are overlaid, we now present them as isolated channels in revised Figures 1-3 and associated figure supplements. As the cell-wall staining is regularly included and displayed in grayscale, we assigned to both of them the Green Fire Blue lookup table, which maps intensity values to a multiple-colour sequential scheme (with blue or yellow indicating low or high fluorescence levels, respectively). We hope that this will allow a better appreciation of the respective levels of H3.1- and H3.3-fusions in our confocal images.
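
      For readers unfamiliar with the measure, a minimal sketch of a corrected total fluorescence calculation follows. It assumes the commonly used definition (integrated density of the region minus region area times mean background intensity); the manuscript's exact measurement settings are not restated here, and the function name and example values are hypothetical.

```python
def corrected_total_fluorescence(integrated_density, area, mean_background):
    """Background-corrected fluorescence of one region of interest.

    integrated_density: sum of pixel intensities inside the region (e.g., a nucleus)
    area: region area, in the same pixel units
    mean_background: mean intensity of nearby background readings
    """
    return integrated_density - area * mean_background

# Hypothetical nucleus: integrated density 1000, area 50 px, background mean 4.
ctf = corrected_total_fluorescence(1000.0, 50.0, 4.0)
print(ctf)  # 800.0
```

      The subtraction removes the signal that the same area would produce from background alone, which is why a second fluorophore emitting in the same channel and overlapping the region prevents the measure from being applied.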

      - Fig 1 B - it is hard to differentiate between S. meliloti-mCherry and H3.3-mCherry. Is there a way to label the different structures?

      In the revised version of Figure 1B, we used filled or empty arrowheads to point to histone H3-containing nuclei. To label rhizobia-associated structures, we used dashed lines to delineate nodule cells hosting symbiosomes and included the annotation “IT” for infection threads. We also indicated proliferating, endoreduplicating and differentiating tissues and cells using the following annotations: “CD” for cell division, “En” for endoreduplication and “TD” for terminal differentiation. All annotations are explained in the figure legend.

      - Fig 1 - supplement E and F - no statistics are shown.

      We performed non-parametric tests using the latest version of the GraphPad Prism software (version 10.4.1). Stars (Figure 1-figure supplement 1F) or different letters (Figure 1-figure supplement 1G) now indicate statistically significant differences. Results of the normality and non-parametric tests were included in the corresponding Source Data Files (Figure 1 – figure supplement 1 – source data 1 and 2). We have also updated the compact display of letters in other figures as indicated by the new software version. The raw data and the results of the statistical analyses remain unchanged and can be viewed in the corresponding source files.

      - Fig 2 A - overview and close-up image do not seem to be in the same focal plane. This is confusing because the nuclei position is different (so is the infection thread position).

      We fully agree that our former Figure may have confused reviewers #1 and #2 as well as readers. Figure 2A was designed to highlight, from the same nodule primordium, actively dividing cells of the inner cortex (optical section z 6-14) and cells of the outer cortex traversed, penetrated by or neighbouring an infection thread (optical section z 11-19). We initially wanted to show different magnification views of the same confocal image (i.e., a full-view of the inner cortex and a zoomed-view of the outer layers) to ensure that audiences can identify these details. In the revised version of Figure 2A, we displayed these full- and zoomed-views in upper and lower panels, respectively, and we removed the solid-line inset to avoid confusion.

      - Fig 1A and Fig 2E could be combined and shown at the beginning of the manuscript. Also, consider making the cell size increase more extreme, as it is important to differentiate G2 cells after H3.1 eviction and cells in G1. You have to look very closely at the graph to see the size differences.

      We have taken each of your suggestions into account. A combined version of our schematic representation with more pronounced nuclei size differences is now presented in Figure 1A.

      - Fig. 3 C is difficult to interpret. Can this be split into different panels?

      We realized that our previous choice of representation may have been confusing. Each value corresponds only to the H3.1-eGFP content, measured in an infected cell and expressed relative to that of the neighbouring cell (IC / NC) within individual root samples. Therefore, we removed the green-magenta colour code and changed the legend accordingly. We hope that these slight modifications will facilitate the interpretation of the results - namely, that the relative level of H3.1 increases significantly in infected cells in the selected mutants compared to the wild-type. This mode of representation also highlights that in the mutants, there are more individual cases where the H3.1 content in an infected cell exceeds that of the neighbouring cell by more than two times. These cases would be masked if the pairs of infected cells and associated neighbours were split into different panels as in Figure 3B.
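The IC / NC representation described above can be illustrated with a short sketch. The values below are invented for illustration only and are not data from the study; the function name is hypothetical.

```python
def ic_nc_ratios(pairs):
    """Return the infected-cell / neighbouring-cell ratio for each pair.

    Each pair is (H3.1-eGFP content of the infected cell,
                  H3.1-eGFP content of its neighbouring cell).
    """
    return [ic / nc for ic, nc in pairs]


# Hypothetical paired measurements from three root samples.
pairs = [(30.0, 60.0), (50.0, 20.0), (45.0, 45.0)]

ratios = ic_nc_ratios(pairs)
# Count the pairs in which the infected cell exceeds its neighbour more than twofold.
n_above_two = sum(ratio > 2 for ratio in ratios)
print(ratios, n_above_two)
```

Keeping each infected cell paired with its own neighbour is what makes such outlying ratios visible; plotting the two populations separately would discard the pairing.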

      - Line 357/359. I assume you mean ...'through the G2 phase can commit to nuclear division'.

      We have edited this sentence according to your suggestion, which now appears in line 370. 

      Reviewer #2 (Recommendations For The Authors):

      Cell cycle control during the nitrogen-fixing symbiosis is an important question but only poorly understood. This manuscript uses largely cell biological methods, which are always of the highest quality - to investigate host cell cycle progression during the early stages of nodule formation, where cortical infection threads penetrate the nodule primordium. The experiments were carefully conducted, the observations were detail oriented, and the results were thought-provoking. The study should be supported by mechanistic insights. 

      (1) One thought provoked by the authors' work is that while the study was carried out at an unprecedented resolution, the relationship between control of the cell cycle and infection thread penetration remains correlative. Is this reduced replicative potential among cells in the infection thread trajectory a consequence of hosting an infection thread, or a prerequisite to do so?

      We understand and share the point of view of reviewer #2. At this stage, we believe that our data won’t enable us to fully answer the question, thus this relationship remains rather correlative. The reasons are that 1) the access to the status of cortical cells below C2 is restricted to fixed material and therefore only represents a snapshot of the situation, and 2) we are currently unable to significantly interfere with mechanisms as intertwined as cell cycle control and infection control. What we can reasonably suggest from our images is that the most favorable window of the cell cycle for cells about to be crossed by an infection thread is post-replicative, i.e., the G2 phase. Typical markers of the G2 phase were recurrently observed at the onset of physical colonization – enlarged nucleus, containing less histone H3.1 than neighbouring cells in S phase (e.g., in Figure 2A). Reaching the G2 phase could therefore be a prerequisite for infection (and associated cellular rearrangements), while prolonged arrest in this same phase is likely a consequence of transcellular passage towards a forming nodule primordium.

      More importantly, in either scenario, what is the functional significance of exiting the cell cycle or endocycle? By stating that "local control of mitotic activity could be especially important for rhizobia to timely cross the middle cortex, where sustained cellular proliferation gives rise to the nodule meristem" (Line 239), the authors seem to believe that cortical cells need to stop the cell cycle to prepare for rhizobia infection. This is certainly reasonable, but the current study provides no proof, yet. To test the functional importance of cell cycle exit, one would interfere with G2/M transition in nodule cells,  and examine the effect on infection.

      We fully agree with reviewer #2 that the functional importance of a cell-cycle arrest on the infection thread trajectory remains to be demonstrated. Interfering with cell-cycle progression in a system as complex and fine-tuned as infected legume roots certainly requires the right timing – at the level of the tissue and of individual cells; the right dose; and the right molecular player(s) (i.e., bona fide activators or repressors of the G2/M transition). Using the symbiosis-specific NPL promoter, activated in the direct vicinity of cortical infection threads (Figure 9-figure supplement 1B), we tried to force infectable cells to recruit the cell division program by ectopically over-expressing the Arabidopsis CYCD3.1, “mimicking” the CDEL system. So far, this strategy has not resulted in a significant increase in the number of uninfected nodules in transgenic hairy roots - though the effect on symbiosome release remains to be investigated. Provided that a suitable promoter-cell cycle regulator combination is identified, we hope to be able to answer this question in the future.

      Given that the authors have already identified a candidate, and showed it represses cell division in the CDEL system, not testing the same gene in a more relevant context seems a lost opportunity. If one ectopically expressed NF-YA1 in hairy roots, thus repressing mitosis in general, would more cells become competent to host infection threads? This seems a straightforward experiment and readily feasible with the constructs that the authors already have. If this view is too naive, the authors should explain why such a functional investigation does not belong in this manuscript.

      Reviewer #2's point is entirely valid, and we decided to address it through additional experiments. To avoid possible side effects on development by affecting cell division in general, we placed NF-YA1 under control of the symbiosis-induced ENOD11 promoter. Based on the results obtained in the CDEL system, the pENOD11::FLAG-NF-YA1 cassette was coupled to a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. Competence for transcellular infection was maintained upon local NF-YA1 overexpression, the latter leading to a slight (non-significant) increase in the number of infected cells per cortical layer. These results are presented in Figure 9-figure supplement 3A-B (representative confocal images) and in Figure 9-figure supplement 4A-G.

      (1b) A related comment: on Line 183, it was stated that "The H3.1-eGFP fusion protein was also visible in cells penetrated but not fully passed by an infection thread". Presumably, the authors were talking about the cell marked by the arrowhead. But its H3.1-GFP signal looks no different from the cell immediately to its left. It is hard to say which cells are ones "preparing for intracellular infection pass through S-phase", and which ones are just "regularly dividing cortical cells forming the nodule primordium". What can be concluded is that once a cell has been fully transversed by an infection thread, its H3.1 level is low. Whether this is the cause or consequence of infection cannot be resolved simply by timing the appearance or disappearance of H3.1-GFP.

      We basically agree with comment 1b. In an unsynchronized system such as infected hairy roots, it is challenging to detect the event where a cell is penetrated, but not yet completely crossed by an infection thread. What we wanted to emphasize in Figure 2A, is that host cells in the path of an infection thread re-enter the cell cycle and pass through S-phase just as their neighbours do (as pointed out by reviewer #2 in his summary). The larger nucleus with slightly lower H3.1-eGFP signal than the neighbouring cell (as indicated by the use of the Green Fire Blue lookup table) suggests that the infected cell marked by the arrowhead in Figure 2A is actually in the G2 phase. The main difference is indeed that cells allowing complete infection thread passage exit the cell cycle and largely evict H3.1 while their neighbours proceed to cell division (as exemplified by PlaCCI reporters in Figure 4CD and the new Figure 5-figure supplement 2). Whether cell-cycle exit in G2 is a cause, or a consequence of cortical infection is a question that cannot be easily answered from fixed samples, which is a limitation of our study.

      (2) The authors have convincingly demonstrated that cortical cells accommodating infection threads exit the cell cycle, inhibit cell division, and down-regulate KNOLLE expression. How do these observations reconcile with the feature called the pre-infection thread? The authors devoted one paragraph to this question in the Discussion, but this does not seem sufficient given that the pre-infection thread is a prominent concept. Is the resemblance to the cell division plane superficial, or does it reflect a co-option of the normal cytokinesis machinery for accommodating rhizobia?

      From our point of view, cortical cells forming pre-infection threads are likely in an intermediate state. PIT structures undoubtedly share many similarities with cells establishing a cell division plane. The recruitment of at least some of the players normally associated with cytokinesis has been demonstrated and is consistent with the maintenance of infectable cells in a pre-mitotic phase in Medicago, as discussed in lines 558 to 568. We nevertheless think that the arrest of the cell cycle in the G2 phase, presumably occurring in crossed cortical cells, constitutes an event of cellular differentiation and specialization in transcellular infection. 

      The following are mainly points of presentation and description: 

      (3) Line 158: I can't see "subnuclear foci" in Figure 1-figure supplement 1C-E. However, they are visible in Fig. 1C.

      We hope that presenting the eGFP and mCherry channels in separate panels and assigning them the Green Fire Blue colour scheme provides better visibility and contrast of these detailed structures. We now refer to Figure 1C in addition to Figure 1–figure supplement 1E in the main text (line 161). 

      (4) Line 160: The authors should outline a larger region containing multiple QC cells, rather than pointing to a single cell, as there are other areas in the image containing cells with the same pattern.

      We updated Figure 1-figure supplement 1E accordingly.

      (5) Fig. 1B should include single channels, since within a single plant cell, the nucleus, the infection thread, and sometimes symbiosomes all have the same color. This makes it hard to see whether the nuclei in these cells are less green, or are simply overwhelmed by the magenta color.

      To improve the readability of Figure 1B and to address suggestions from individual reviewers, we now include separate channels and have annotated the different structures labeled by mCherry.

      (6) Fig. 2A: the close-up does not match the boxed area in the left panel. Based on the labeling, it seems that the two panels are different optical sections. But why choose a different optical depth for the left panel? This can be disorienting to the reader, because one expects the close-up to be the same image, just under higher magnification.

      We fully agree that our previous choice of representation may have been confusing. As we also specified to reviewer #1, we wanted to show a full-view of proliferating cells in the inner cortex and a zoomed-view of infected cells in the outer layers of the same nodule primordium. In the revised version of Figure 2A, we displayed these full- and zoomed-views in separate panels and removed the boxed area to avoid confusion.

      (7) Figure 2-figure supplement 1B: the cell indicated by the empty arrowhead has a striking pattern of H3.1 and H3.3 distribution on condensed chromosomes. Can you comment on that?

      Reviewer #2 may be referring to the apparent enrichment of H3.3 at telomeres, previously described in Arabidopsis, while pericentromeric regions are enriched in H3.1. This distribution is indeed visible on most of the condensed chromosomes shown in Figure 2-figure supplement 1B. We included this comment in the corresponding caption.

      (8) Fig. 4: It is not very easy to distinguish M phase. Can the authors describe what each phase is supposed to look like with the reporters?

      We agree with reviewer #2 and attempted to improve Figure 4, which is now dedicated to the Arabidopsis PlaCCI reporter. ECFP, mCherry, and YFP channels were presented separately and the corresponding cell-cycle phases (in interphase and mitosis) were annotated. The Green Fire Blue lookup table was assigned to each reporter to provide the best visibility of, for example, chromosomes in early prophase. We included a schematic representation corresponding to the distribution of each reporter, using the colors of the overlaid image to facilitate its interpretation.

      (9) Line 298: what is endopolyploid? This term is used at least three times throughout the manuscript. How is it different from polyploid?

      In the manuscript, we aimed to differentiate the (poly)ploidy of an organism (reflecting the number of copies of the basic genome and inherited through the germline) from endopolyploidy produced by individual somatic cells. As reviewed by Scholes and Paige, polyploidy and endopolyploidy differ in important ways, including allelic diversity and chromosome structural differences. In the Medicago truncatula root cortex for example, a tetraploid cell generated via endoreduplication from the diploid state would contain at most two alleles at any locus. The effects of endopolyploidy on cell size, gene expression, cell metabolism and the duration of the mitotic cell cycle are not shared among individual cells or organs, contrasting to a polyploid individual (Scholes and Paige, 2015).

      See Scholes, D. R., & Paige, K. N. (2015). Plasticity in ploidy: A generalized response to stress. Trends in Plant Science, 20(3), 165-175. https://doi.org/10.1016/j.tplants.2014.11.007

      (10) Line 332: "chromosomes on mitotic figures" - what does this mean?

      Reviewer #2 is right to point out this redundant wording. Mitotic “figures” are recognized, by definition, based on chromosome condensation. We now use the term "mitotic chromosomes" (line 344).

      (11) Fig. 6A: could the authors consider labeling the doublets, at least some of them? I understand that this nucleus contains many doublets. However, this is the first image where one is supposed to recognize these doublets, and pointing out these features can facilitate understanding. Otherwise, a reader might think the image is comparable to nuclei with no doublets in the rest of the figure.

      Following this suggestion, five of these doublets are now labeled in Figure 7A (formerly Figure 6A).

  2. May 2025
    1. eLife Assessment

      In this convincing work by Yamaguchi et al. the cryo-EM structure of the heterohexameric 3:3 LGI1-ADAM22 complex is presented. The findings suggest that LGI1 can cluster ADAM22 in a trimeric fashion. The clustering of cell surface proteins is important in controlling signaling in the nervous system. This new version of the manuscript has been improved substantially and the figures have been enhanced and clarified.

    2. Reviewer #1 (Public review):

      The structure of a heterohexameric 3:3 LGI1-ADAM22 complex is resolved by Yamaguchi et al. It reveals the intermolecular LGI1 interactions and its role in bringing three ADAM22 molecules together. This may be relevant for the clustering of axonal Kv1 channels and control over their density. While it is currently not clear if the heterohexameric 3:3 LGI1-ADAM22 complex has a physiological role, the detailed structural information presented here allows to pinpoint mutations or other strategies to probe the relevance of the 3:3 complex in future work.

      The experimental work is done to a high standard, and all my comments have been addressed. This new version of the manuscript has been improved substantially, and the figures have been enhanced and clarified.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) A previously determined 2:2 heterodimeric complex of LGI1-ADAM22 was suggested to play a role in trans interactions. Could the authors discuss if the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex, or if both are possible?

      We noticed that there was no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3-ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) It is not entirely clear to me if the LGI1-ADAM22 complex is also crosslinked in the HS-AFM experiments. Could this be more clearly indicated? In addition, if this is the case, could an explanation be given about how the complex can still dissociate?

      Thank you for the constructive suggestions. A non-crosslinked 3:3 LGI-ADAM22 complex was used for HS-AFM observations. To clarify the sample used for HS-AFM, we have modified the text as follows.

      P.8 “Dynamics of the LGI1‒ADAM22 higher-order complex observed by HS-AFM

      HS-AFM images of gel filtration chromatography fractions containing the 3:3 LGI1-ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) predominantly…”

      P.10 Materials and methods

      “HS-AFM observations of the LGI1–ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) were conducted on AP-mica,…”

      (3) The LGI1 and ADAM22 are of similar size. To me, this complicates the interpretation of dissociation of the complex in the HS-AFM data. How is the overinterpretation of this data prevented? In other words, what confidence do the authors have in the dissociation steps in the HS-AFM data?

      Our criteria for assigning HS-AFM images to the 3:3 LGI1–ADAM22<sub>ECD</sub> complex were based on a comparison with the simulated AFM image of the 3:3 complex obtained by cryo-EM. The automated fitting process (42) identifies the optimal orientation of the cryo-EM structure that most closely matches the HS-AFM image. In the present study, the concordance coefficient (CC) reached 0.8, indicating that the protein orientation in HS-AFM images of the 3:3 complex was objectively satisfactory.

      Regarding the dissociation step of ADAM22 from the 3:3 complex, we carefully analyzed the HS-AFM videos frame by frame and observed that the protrusion corresponding to ADAM22 in the 3:3 complex disappeared at a specific frame (4.5 s in the third molecule in Movie S1). The dissociation steps of ADAM22 were further confirmed by integrating multiple independent HS-AFM experiments and observations. Thus, although HS-AFM images alone cannot determine the orientation of LGI1 and ADAM22 in the 3:3 complex, the comparison of cryo-EM images with simulated AFM images enables objective assignment and orientation of proteins in the 3:3 complex through automated fitting.

      *Ref. 42: R. Amyot et al., Simulation atomic force microscopy for atomic reconstruction of biomolecular structures from resolution-limited experimental images. PLoS Comput Biol 18, e1009970 (2022).
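To give a sense of how an image-similarity score such as the one quoted above can be computed, here is a minimal sketch using a Pearson correlation between two height maps. This is an assumption made for illustration: the response does not define its concordance coefficient, and this is not the fitting procedure of Ref. 42. The function name and the toy images are hypothetical.

```python
from math import sqrt


def image_correlation(image_a, image_b):
    """Pearson correlation between two equally sized 2D height maps."""
    flat_a = [value for row in image_a for value in row]
    flat_b = [value for row in image_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must contain the same number of pixels")

    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(flat_a, flat_b))
    variance_a = sum((a - mean_a) ** 2 for a in flat_a)
    variance_b = sum((b - mean_b) ** 2 for b in flat_b)
    return covariance / sqrt(variance_a * variance_b)


# Toy 2x2 "images": the second is a rescaled copy of the first,
# the third is its inverse.
simulated = [[0.0, 1.0], [2.0, 3.0]]
rescaled = [[1.0, 3.0], [5.0, 7.0]]
inverted = [[3.0, 2.0], [1.0, 0.0]]

score_same_shape = image_correlation(simulated, rescaled)
score_inverted = image_correlation(simulated, inverted)
print(score_same_shape, score_inverted)
```

A score of this kind is insensitive to overall height scaling and offset, so it rewards agreement in topography rather than in absolute height; an orientation search would keep the candidate orientation with the highest score.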

      (4) What is the "LGI1 collapse" mentioned in Figure 4c?

      Thank you for the constructive suggestions. The term “LGI1 collapse” was intended to mean the dissociation of LGI1 from the 3:3 complex. To avoid confusion, we have revised it to “LGI1 release”.

      (5) Am I correct that the structure indicates that the trimerization is entirely organized by LGI1? This would suggest LGI1 trimerizes on its own. Can this be discussed? Has this been observed?

      Yes. The present cryo-EM structure of the 3:3 complex indicates that the trimerization can be entirely organized by LGI1. In addition, during the HS-AFM imaging, the triangle shape seems to be maintained even if one ADAM22<sub>ECD</sub> molecule is released. These findings suggest the possibility that LGI1 could trimerize on its own although this possibility could not be tested due to the difficulty in the expression of the full-length LGI1 alone for biophysical analysis in our hands. On the other hand, considering the dynamic property of the 3:3 complex and spatial alignment of LGI1<sub>LRR</sub> and ADAM22, we cannot exclude the possibility that ADAM22 could act as a platform to facilitate the intermolecular interaction between LGI1<sub>LRR</sub> and LGI1*<sub>EPTP</sub> for the trimerization of LGI1. This discussion was added in the first paragraph of the subsection "Dynamics of the LGI1–ADAM22 higher-order complex by HS-AFM".

      (6) C3 symmetry was not applied in the cryo-EM reconstruction of the heterohexameric 3:3 LGI1-ADAM22 complex. How much is the complex deviating from C3 symmetry? What interactions stabilize the specific trimeric conformation reconstructed here, compared to other trimeric conformations?

      According to this comment, we compared the non-symmetric, present cryo-EM structure to the previously calculated C3 symmetry-restrained structure based on small-angle X-ray scattering analysis and the C3 symmetric structure generated by AlphaFold3. Their differences in the domain or protomer configuration are illustrated in Fig. S9.

      We did not find interactions that could obviously stabilize the specific trimeric conformation but the closure motion of LGI1<sub>LRR</sub> (relative to LGI1<sub>EPTP</sub>) in chain F appears to locate it in close proximity to LGI1<sub>LRR</sub> in chain D to make the triangular assembly slightly more compact. This (partly) compact configuration might stabilize the non-symmetric trimeric configuration observed in the cryo-EM structure. This was described in the last sentence in the subsection "Cryo-EM structure of the 3:3 LGI1–ADAM22<sub>ECD</sub> complex".

      Reviewer #2 (public review):

      The functional significance of these two complexes in the context of synapse remains speculative.

      To assess the functional significance of the 3:3 complex, we spent time and effort designing mutations that solely inhibit the 3:3 assembly but failed to find such mutations. In this paper, we just focused on structural characterization of the 3:3 complex.

      Additionally, the structural presentations in Figures 1-3 (especially Figures 2-3) lack the clarity needed for general readers to fully understand the authors' key points. Enhancing the quality of these visual representations would greatly improve accessibility and comprehension.

      We made an effort to improve Figures 1-3 accordingly. Specifically, we revised them based on the strategy suggested in the Editorial comment regarding this reviewer's comment.

      Editorial comments:

      We noticed that in the reconstruction of the 3:3 complex, which is claimed to be at 3.8 Å resolution, beta-strands are not separated in the map and local resolution estimates vary from 6-10 Å. Please clarify.

      We revised Fig. S8 to show the local resolution and volume quality, which correspond to nominal resolution of 3.8 Å, estimated from gold-standard FSC.

      Reviewer #1 (Recommendations for the authors):

      (1) PDB validation reports should be presented to allow further validation

      The PDB validation reports were attached to the revised manuscript (uploaded as "related manuscript file").

      (2) In Figure 4, models below the AFM figures are difficult to see because of the light coloring. In addition, in panel c, the orientation of some of the parts of the models below the 19.2 and 34.5 s. panels do not seem to correlate with the AFM figures. Could the models be adjusted so that they represent the data better?

      Thank you for the constructive suggestions. According to the Reviewer’s comments, we have revised the AFM figures (Fig. 4).

      (3) References are sometimes missing for important statements. Please check throughout.

      Some examples:

      P3, "it has been suggested that the 3:3 complex regulates the density of synaptic molecules such as scaffolding proteins and synaptic vesicles".

      P3. "Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23".

      According to this comment, we rewrote the description about potential physiological roles of the 3:3 complex and added references as follows:

      "Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion (9, 17, 19). In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment (18, 20). However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined."

      We also added references to the following sentences:

      p.2, (the last sentence in the first paragraph of the Introduction) “Additionally, some epilepsy-related mutations have been identified in genes encoding non-ion channel proteins such as LGI1 (4-7).”

      p.3, ln 4-5, “The metalloprotease-like domain interacts with the EPTP domain of LGI1 in the extracellular space (11, 14).”

      p.3, ln 9-10, “Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23 (9, 17, 18)”

      p.3, ln 20-22, “The results revealed the structural basis of the interaction between the EPTP domain of one LGI1 and the LRR domain of the other LGI1, as well as the interaction between the EPTP domain of LGI1 and the metalloproteinase-like domain of ADAM22 (14)”

      (4) S5 for clarity please add an overview of the complex highlighting where the different parts shown in the panels are located.

      Fig. S5 was modified accordingly. Every panel showing a zoom-up view was indicated by a box in an overview of the complex.

      (5) S7 a+b, also here add models for the structures to indicate which parts are shown.

      Could labels be added to highlight important parts?

      We added an overview of the complex with boxes that indicate the parts shown as the panels, according to this comment. We also added labels to highlight residues that are important for the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interaction in the panel showing the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interface.

      (6) S7c also shows the cartoon of the structure. How is it possible that the local resolution is not much higher than 6 Å? The overall resolution was 3.8 Å? This looks like a figure of the density plotted at a low level, and not as stated a "surface representation". Could an extra panel be shown of the density plotted at a higher level? Also, please add Å to the legend in this figure.

      Local resolution maps of the 3:3 LGI1-ADAM22<sub>ECD</sub> complex were shown as Fig. S8 in the revised manuscript. According to this comment, the distribution of the resolution was plotted onto the density at high (0.06) and low (0.03) levels. "Å" was added to the legend in the figure.

      Reviewer #2 (Recommendations for the authors):

      (1) The study was conducted using the ectodomain (ECD) of ADAM22. It remains unclear whether the 3:3 complex could form if the transmembrane domain (TMD) of ADAM22 were included. In other words, it is difficult to assess whether the observed 3:3 complex represents plausible cis interactions.

      As mentioned in our reply to the first comment from Reviewer #1, we noticed that there was no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1–ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3–ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) Page 2, line 1: "...caused by genetic mutations." - Specify the mutations involved. Which genes are mutated? Providing this information would enhance clarity and context.

      According to this comment, we rephrased the sentence as follows:

      "LGI1 is linked to epilepsy, a neurological disorder that can be caused by genetic mutations of genes regulating neuronal excitability (e.g., voltage- or ligand-gated ion channels)."

      (3) The experimental strategy and data for both cryo-EM and HS-AFM are of high quality. However, improvements are needed in the cryo-EM/structural figures to enhance clarity. Structural components should be labeled, and the protein interfaces should be identified within the overall complex figures in Figures 2 and 3, as the current presentation is challenging for general readers to follow. For example, in Figure 2, panel a would benefit from clear labeling to indicate the locations of ADAM22 and LGI1. Panels b and c lack context unless the authors specify which interface corresponds to panel a. Additionally, panels e and f are unlabelled, making it difficult to interpret the figures. Improved annotations and descriptions would significantly enhance figure accessibility and comprehension.

      Thank you for the constructive suggestion for enhancing accessibility and comprehension of cryo-EM/structural figures. According to this comment, we labeled structural components and indicated the protein interfaces as boxes in the overall complex figures in Figures 2 and 3. Further, in Figure 2, the locations that panels b and c show were indicated as two boxes in the close-up view in panel a.

    1. eLife Assessment

      This solid study assesses a mitochondrial polymerase inhibitor in combination with the BCL-2 inhibitor venetoclax, with the aim to increase the elimination of acute myeloid leukemia. It provides valuable findings of combinatorial efficacy using preclinical models in vitro and in vivo, confirming the overall importance of targeting oxidative phosphorylation to overcome venetoclax resistance in acute myeloid leukemia, and could be strengthened through mechanistic studies demonstrating on target effects and pharmacodynamic efficacy in vivo. The study is of interest to hematologists because it addresses a key biomedical issue in acute myeloid leukemia (venetoclax resistance) and provides data regarding the safety and activity of a novel inhibitor of the mitochondrial polymerase in combination with venetoclax.

    2. Reviewer #1 (Public review):

      This study exploits a novel agent (IMT) that inhibits mitochondrial activity in combination with venetoclax. While the concept is not novel, the agent is novel (an inhibitor of the mitochondrial RNA polymerase, described in Nature in other tumor models), and the quest for safe mitochondrial inhibitors is highly warranted. The strength is the in vivo activity data shown in CLDX and in one of the two AML PDX models tested, and the apparent safety of the combination. However, the impact on survival is impressive in CLDX but not in PDX, and it is unclear why the Ven-sensitive PDX is resistant to the combination (opposite to what the cell line data show). There is no real evidence that this agent overcomes Ven resistance, which could be tested, for example, in primary AML cells. Finally, no on-target pharmacodynamic endpoints are measured in vivo to support the activity of the compound on mitochondrial activity at the doses used (which are safe).

      Both Reviewers requested a demonstration that IMT1 inhibits the target at the doses used in vitro or in vivo; while the prior paper showed this for the original compound, it is imperative to demonstrate this for this modified agent in a different tumor type such as AML.

      These points have not been addressed in the Revision.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Arabanian and colleagues presents studies showing how inhibition of mitochondrial transcription and replication with a novel inhibitor of the mitochondrial polymerase, IMT, can promote AML cell death in combination with the Bcl2 inhibitor venetoclax. They further show that this combinatorial efficacy is evident in vivo in both the AML cell line MV411 and in a PDX model. Given the multiple studies showing the importance of Oxphos in maintaining AML cell survival, the current studies provide an additional strategy to inhibit Oxphos and thus improve the therapeutic management of AML.

      Strengths:

      A novel aspect of this work is that IMT is a new class of mitochondrial inhibitor that acts through inhibiting the mitochondrial polymerase. In addition, the demonstration of therapeutic efficacy both in vitro and in vivo (including with PDX), together with some data showing minimal toxicity, adds to the impact of this work. Their overall conclusion that IMT increases the potency of Vex in treating AMLs is supported.

      Comments on revisions:

      In all, the authors responded to most of the critiques, while two of the major critiques were not experimentally addressed. The work will still have potential impact, but will depend on further studies under more clinically relevant conditions and with a better understanding of drug effects.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The data are generated using ATP read-out (CTG assay). For any inhibitor of mitochondrial function, ATP assays are highly sensitive reflecting metabolic stress, yet these do not necessarily translate into cell growth inhibition using standard Trypan blue assays and tend to overestimate the effects. Please show orthogonal more robust assays of cell growth or proliferation.

      We acknowledge the sensitivity of the ATP read-out assay in reflecting metabolic stress. While additional cell growth assays such as Trypan blue exclusion could provide further insights, we believe that the current ATP assay data robustly demonstrate the effect of the IMT and venetoclax combination on cellular metabolism, which is a critical aspect of our study. The scope of our current work focused on metabolic inhibition, and we suggest that future studies could further explore cell proliferation assays to complement these findings.

      (2) It is concluded that AML cells do not utilize glucose for ATP production. Please provide formal measurements of glycolysis/lactate upon combinatorial treatment.

      We appreciate the reviewer’s suggestion to include glycolysis and lactate measurements, which could indeed add further granularity to our metabolic analysis. However, the primary focus of our study is on mitochondrial function and oxidative phosphorylation (OXPHOS) in AML cells treated with IMT and venetoclax. We believe the data presented in Figure 3 provide strong support for the conclusion that glycolysis is not a major energy source in these cells.

      Specifically, in Figure 3C, we demonstrate that AML cells maintain ATP levels and viability when cultured in galactose, a condition that restricts ATP production through glycolysis and forces cells to rely on OXPHOS. This result strongly suggests that these AML cells are not dependent on glycolysis for ATP production. Furthermore, in Supplementary Figure S3B, we show that oxygen consumption rate (OCR) measurements remain stable in the presence of excess glucose, further supporting our conclusion that the cells do not switch to glycolysis when OXPHOS is inhibited.

      These findings collectively indicate a primary reliance on OXPHOS for energy generation in AML cells, consistent with our study’s objectives to explore mitochondrial dependency and the therapeutic potential of targeting mitochondrial transcription in AML. Future studies could certainly expand on these insights by incorporating a more detailed analysis of glycolytic flux and lactate production under combinatorial treatment, but we believe the current data are sufficient to support our main conclusions.

      (3) The transcriptome data are shown without any analysis of pathways. The conclusion from this data beyond the higher number of genes impacted in the combination arm is unclear. Please provide analysis for example GO pathways and interpret in the context of the drugs' mechanism of action.

      In response to the reviewer’s question, we have added gene ontology (GO) pathway analysis to clarify the transcriptomic impact of our combination treatment with IMT and venetoclax. Functional annotation identified significant enrichment in pathways relevant to innate immune response, mitochondrial function, and cellular signaling processes. Specifically, pathways associated with immune defense, mitochondrial signaling, and intracellular signaling were notably affected. These findings suggest that the combination treatment not only disrupts cellular energy metabolism but also potentially primes immune signaling mechanisms. This aligns with the proposed mechanism, where IMT targets mitochondrial transcription and venetoclax induces apoptosis, together enhancing sensitivity in AML cells. The enriched pathways, therefore, support the mechanism of action of both drugs, showing how the combined inhibition of BCL-2 and mitochondrial transcription creates a compounded cellular disruption that enhances the therapeutic effect.

      (4) Please demonstrate (could be in supplement) matrix of combination to support the statement that the combination is synergistic using Bliss index. The actual Bliss values are missing.

      For the revision, we have now included a matrix of combination treatment effects with the corresponding Bliss synergy index values to substantiate our claim of synergy between IMT and venetoclax. This analysis, provided in the supplement, demonstrates that the observed effects exceed the expected additive impact of each drug alone, as calculated by the Bliss independence model. Specifically, the Bliss values confirm a synergistic interaction in venetoclax-sensitive AML cell lines, highlighting that the combined treatment significantly enhances inhibition of cell viability and apoptosis induction compared to single treatments. This data supports our interpretation of synergy and strengthens the mechanistic conclusions drawn from our findings on the combination therapy’s efficacy.
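
      For readers unfamiliar with the Bliss independence model mentioned above, the calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, effects are assumed to be expressed as fractional inhibition between 0 and 1, and the numbers in the demo are placeholders rather than data from the study. Under Bliss independence the expected combined effect is E_A + E_B - E_A * E_B, and a positive excess (observed minus expected) is read as synergy.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional inhibition of a combination under Bliss independence."""
    for e in (effect_a, effect_b):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractional inhibitions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, observed_combo):
    """Observed minus expected effect; > 0 suggests synergy, < 0 antagonism."""
    return observed_combo - bliss_expected(effect_a, effect_b)


def bliss_excess_matrix(single_a, single_b, combo):
    """Excess-over-Bliss for a dose matrix.

    single_a[i]: fractional inhibition of drug A alone at dose i
    single_b[j]: fractional inhibition of drug B alone at dose j
    combo[i][j]: observed fractional inhibition of the combination
    """
    return [
        [bliss_excess(a, b, combo[i][j]) for j, b in enumerate(single_b)]
        for i, a in enumerate(single_a)
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only (not measurements from the study).
    drug_a_alone = [0.10, 0.30]
    drug_b_alone = [0.20, 0.40]
    observed = [[0.35, 0.60], [0.55, 0.80]]
    for row in bliss_excess_matrix(drug_a_alone, drug_b_alone, observed):
        print([round(x, 3) for x in row])
```

      Reporting the full matrix of excess values, rather than a single summary score, lets a reader see at which dose pairs the combination departs from the additive expectation.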

      (5) Please show KG1 data (OCR), here or in Supplement.

      In response to the reviewer’s request to include OCR data for the KG-1 cell line, we would like to clarify that OCR measurements were attempted; however, they did not yield conclusive results. This is noted in the revised manuscript (Results section), where we explain that the KG-1 cell line did not provide usable OCR data, likely due to limitations in detecting reliable mitochondrial respiration in this particular line under our experimental conditions. Therefore, we were unable to include KG-1 OCR data in the main figures or the supplement.

      Reviewer #2:

      (1) It's important that the authors show that the drug's effects in AML are due to on-target inhibition. It's critical that they show that IMT actually inhibits the mito polymerase in the AML cells in the dose range employed.

      We appreciate the importance of demonstrating on-target inhibition of mitochondrial RNA polymerase by IMT1, especially in light of the detailed characterization of IMT1b, a closely related compound, as presented in Bonekamp et al., Nature 2020. The work by Bonekamp et al. established the specificity and efficacy of IMT1b in targeting mitochondrial RNA polymerase across various tumor models. Building on these findings, we designed our study to primarily evaluate the combinatorial efficacy of IMT1 with venetoclax in AML models, assuming a similar mechanism of action as described for IMT1b. While direct confirmation of on-target inhibition in AML cells by IMT1 would undoubtedly provide additional mechanistic insight, we focused on translational aspects in this study. We believe that the foundational work provided by Bonekamp et al. supports the assumption of on-target effects by IMT1, and we suggest that future studies could explicitly verify this in the context of AML.

      (2) For Fig 1, the stated synergism between Venetoclax (Vex) and IMT in p53 mutant THP1 cells is really not evident, despite what the statistical analysis says. In some ways, the more interesting conclusion is that inhibiting mitochondrial transcription does NOT potentiate the efficacy of Bcl2 inhibition in TP53 mutant AML.

      We appreciate the reviewer’s observation regarding the lack of evident synergy between IMT and venetoclax in TP53 mutant THP-1 cells. In line with this comment, we have now expanded the discussion to emphasize that, while statistical analysis suggested a potential interaction, the biological response in TP53 mutant cells was minimal. This contrasts with the strong synergy observed in TP53 wild-type cell lines, such as MV4-11 and MOLM-13. We have now highlighted that TP53 mutation status may limit the effectiveness of mitochondrial transcription inhibition in potentiating BCL-2 inhibition. This addition underscores the importance of mutation profiles, such as TP53 status, in predicting response to combination therapies in AML and is now clearly addressed in the revised discussion.

      (3) They combine IMT with Vex, but Vex plus azacytidine or decitabine is the approved therapy for AML. Any clinical trial would likely start with this backbone (like Vex+Aza). They should test combinations of IMT with Vex/Aza or Vex/Dec.

      While we recognize the importance of testing IMT in combination with clinically approved therapies like Vex+Aza, our current study was designed to explore the potential of IMT in combination with venetoclax alone. Expanding to other combinations would be an excellent direction for future research but is beyond the scope of our current investigation.

      (4) It's interesting that AML cell lines do not show any reliance on ATP generation from glycolysis, but would this still be the case when OxPhos is inhibited with IMT? Such a simple experiment would be much more interesting and could help them better understand the mechanism of IMT efficacy.

      We thank the reviewer for highlighting this point regarding the reliance of AML cell lines on glycolysis under OxPhos inhibition. In our study, we observed that AML cells predominantly rely on OxPhos, and we did test for ATP production in conditions that favored glycolysis by growing AML cells with galactose instead of glucose in the medium. As described in the manuscript, we did not observe significant ATP production or cell viability from glycolysis, even under these conditions. This finding suggests that AML cells have a low capacity to adapt to glycolytic ATP generation when OxPhos is disrupted by IMT, reinforcing the view that they are highly dependent on mitochondrial function for energy production. We agree that this adaptation—or lack thereof—is an intriguing aspect of IMT efficacy in targeting energy metabolism in AML cells, and we have clarified this point in the discussion.

      (5) OxPhos measurements need statistical analyses.

      We appreciate the reviewer’s suggestion to include statistical analyses for the OXPHOS measurements. We would like to clarify that statistical analyses were included in the initial submission. These are detailed in Figure 3 and its legend, as well as in the Statistical Analysis section, where we specify methods such as the calculation of standard error across replicates. This approach was implemented to ensure the rigor of our OCR data and its conclusions on OXPHOS inhibition in AML cells.
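
      As a point of reference for the standard-error calculation mentioned above, a minimal sketch is given below. It assumes the conventional definition (sample standard deviation divided by the square root of the number of replicates); the function name and the replicate values are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of replicate values."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Placeholder replicate readings for illustration only.
    replicates = [102.0, 98.5, 105.3]
    mean, sem = mean_and_sem(replicates)
    print(f"mean = {mean:.2f}, SEM = {sem:.2f}")
```

      Note that a standard error describes the precision of each group mean; comparing groups still requires a separate hypothesis test.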

      (6) Given that the combo-treated mice do not exhibit much leukemia in the blood through ~180 days, and yet start dying after 100 days, the authors should comment on this, given that the bone marrow has been shown to be a refuge that protects leukemia cells from various therapies.

      We thank the reviewer for highlighting the observed discrepancy between peripheral blood leukemia levels and survival in combo-treated mice. While leukemic cells were minimally detected in the blood up to approximately 180 days, treated mice began to show signs of disease progression and reduced survival around 100 days. This may suggest that residual leukemic cells persist within the bone marrow, which has been established as a sanctuary site for leukemic cells, providing protection from various therapies. The bone marrow environment likely supports a survival niche, enabling these residual cells to evade treatment effects and potentially initiate disease relapse. We have added this interpretation to the discussion to acknowledge the possibility of bone marrow as a protective refuge, which may limit the full eradication of leukemia in these models despite apparent peripheral blood clearance.

      (7) For Fig 5C, the authors should statistically compare the Combo with Vex alone.

      We have now included statistical comparisons between the combination treatment and venetoclax alone in Fig 5C to provide a clearer interpretation of the data.

      (8) Regarding the analyses of gene expression using RNAseq of harvested leukemia cells from the PDX model (Table S2), some more discussion of these results would be helpful, particularly given that neither drug is directly targeting nuclear gene expression.

      We thank the reviewer for their suggestion to discuss the RNAseq findings in more detail. In the revised manuscript, we have expanded on the functional annotation of the gene expression changes observed in leukemia cells from the PDX model following combination treatment (Table S2). The enriched pathways include innate immune involvement, mitochondrial function and immune signaling, and intracellular signaling. This suggests that while neither IMT nor venetoclax directly targets nuclear gene expression, the combined treatment induces secondary effects that alter these pathways, potentially contributing to the treatment’s efficacy in AML. This expanded discussion provides greater insight into how the drug combination impacts gene expression and cellular pathways.

      (9) We need more information on the PDX models, in terms of the classification (M1 to M6) of the patient AMLs and genetics (specific mutations, not just the genes mutated, and chromosomal alterations).

      Additional details regarding the classification and genetic background of the PDX models have been included in the manuscript to better contextualize our findings.

      (10) The authors should discuss whether or not IMT represents an improvement over other therapies intended to target Oxphos in AML (clearly, the low toxicity of IMT is a plus, at least in mice).

      We appreciate the reviewer’s suggestion to discuss IMT in comparison with other OXPHOS-targeting therapies for AML. In the revised discussion, we highlight IMT’s unique properties, particularly its low toxicity profile, which may offer advantages over other OXPHOS inhibitors. This low toxicity, demonstrated in preclinical studies, suggests that IMT might improve patient tolerability compared to existing therapies that target mitochondrial function.

      (11) The authors examined toxicity by weighing the mice and performing CBCs. Measurements of liver and kidney toxicity will be necessary for further clinical development.

      We thank the reviewer for the suggestion to further investigate liver and kidney toxicity. In our study, we assessed toxicity through regular weight monitoring and complete blood counts (CBCs) to evaluate overall health status. While additional liver and kidney toxicity measurements will indeed be important in future studies, resource limitations currently prevent us from performing these additional analyses in this model. We agree that these assessments will be essential as we progress towards clinical development, and we plan to address them in upcoming preclinical studies.

    1. eLife Assessment

      The study presents extensive gene expression profiling and bioinformatic analyses, offering insights into the roles of fibroblasts in cardiac development. The large volume of scRNA-seq data is both compelling and important to the scientific community. All three reviewers agree that the revised manuscript represents a significant improvement and addresses most, if not all, of their previous concerns. The reviewers also acknowledge that detailed mechanistic studies on how fibroblast-derived collagen regulates myocardial and coronary vasculature development are beyond the scope of the current study.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Deng et al reports single cell expression analysis of developing mouse hearts and examines the requirements for cardiac fibroblasts in heart maturation. The work includes extensive gene expression profiling and bioinformatic analysis. The prenatal fibroblast ablation studies show new information on the requirement of these cells on heart maturation before birth.

      The strengths of the manuscript are the new single cell datasets and comprehensive approach to ablating cardiac fibroblasts in pre and postnatal development in mice. Extensive data are presented on mouse embryo fibroblast diversity and morphology in response to fibroblast ablation. Histological data support localization of major cardiac cell types and effects of fibroblast ablation on cardiac gene expression at different times of development.

      A weakness of the study is that the major conclusions regarding collagen signaling and heart maturation are based on gene expression patterns and are not functionally validated.

    3. Reviewer #2 (Public review):

      This study aims to elucidate the role of fibroblasts in regulating myocardium and vascular development through signaling to cardiomyocytes and endothelial cells. This focus is significant, given that fibroblasts, cardiomyocytes, and vascular endothelial cells are the three primary cell types in the heart. The authors employed a Pdgfra-CreER-controlled diphtheria toxin A (DTA) system to ablate fibroblasts at various embryonic and postnatal stages, characterizing the resulting cardiac defects, particularly in myocardium and vasculature development. Single-cell RNA sequencing (scRNA-seq) analysis of the ablated hearts identified collagen as a crucial signaling molecule from fibroblasts that influences the development of cardiomyocytes and vascular endothelial cells.

      This is an interesting manuscript; however, there are several major issues, including an over-reliance on the scRNA-seq data, which shows inconsistencies between replicates.

      Some of the major issues are described below.

      (1) The CD31 immunostaining data (Figure 3B-G) indicate a reduction in endothelial cell numbers following fibroblast deletion using PdgfraCreER+/-; RosaDTA+/- mice. However, the scRNA-seq data show no percentage change in the endothelial cell population (Figure 4D). Furthermore, while the percentage of Vas_ECs decreased in ablated samples at E16.5, the results at E18.5 were inconsistent, showing an increase in one replicate and a decrease in another, raising concerns about the reliability of the RNA-seq findings.

      (2) Similarly, while the percentage of Ven_CMs increased at E18.5, it exhibited differing trends at E16.5 (Fig. 4E), further highlighting the inconsistency of the scRNA-seq analysis with the other data.

      (3) Furthermore, the authors noted that the ablated samples had slightly higher percentages of cardiomyocytes in the G1 phase compared to controls (Fig. 4H, S11D), which aligns with the enrichment of pathways related to heart development, sarcomere organization, heart tube morphogenesis, and cell proliferation. However, it is unclear how this correlates with heart development, given that the hearts of ablated mice are significantly smaller than those of controls (Figure 3E). Additionally, the heart sections from ablated samples used for CD31/DAPI staining in Figure 3F appear much larger than those of the controls, raising further inconsistencies in the manuscript.

      (4) The manuscript relies heavily on the scRNA-seq dataset, which shows inconsistencies between the two replicates. Furthermore, the morphological and histological analyses do not align with the scRNA-seq findings.

      (5) There is a lack of mechanistic insight into how collagen, as a key signaling molecule from fibroblasts, affects the development of cardiomyocytes and vascular endothelial cells.

      (6) In Figure 1B, Col1a1 expression is observed in the epicardial cells (Figure 1A, E11.5), but this is not represented in the accompanying cartoon.

      (7) Do the PdgfraCreER+/-; RosaDTA+/- mice survive after birth when induced at E15.5, and do they exhibit any cardiac defects?

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated fibroblasts' communication with key cell types in developing and neonatal hearts, with a focus on the critical roles of the fibroblast-cardiomyocyte and fibroblast-endothelial cell networks in cardiac morphogenesis. They tried to map the spatial distribution of these cell types and reported the major pathways and signaling molecules driving the communication. They also used a Cre-DTA system to ablate Pdgfra-labeled cells and observed myocardial and endothelial cell defects during development. They screened the pathways and genes using sequencing data from ablated hearts. Lastly, they reported compensatory collagen expression in long-term ablated neonatal hearts. Overall, this study provides important insight into fibroblasts' roles in cardiac development and will be a powerful resource for collagen- and ECM-focused research.

      Strengths:

      The authors used good analytical tools to investigate multiple single-cell sequencing and Multi-seq databases. They identified significant pathways and cellular and molecular interactions of fibroblasts. Additionally, they compared some of their analytic findings with a human database and identified several groups of ECM genes with varying roles in mice.

      Weaknesses:

      This study is based mainly on sequencing data analysis. At the bench, they used a very strident technique to study fibroblast functions, ablating one of the major cell populations of the heart. Also, experimental validation of their analyzed downstream pathways will be required eventually.

    1. eLife Assessment

      This work demonstrates the therapeutic potential of recombinant human PDGF-AB/BB proteins in alleviating the senescent signatures of primary human intervertebral disc cells. The study represents a fundamental, significant advance in the treatment of intervertebral disc degeneration through the suppression of senescence. The strength of evidence supporting these conclusions is compelling, as it is primarily based on transcriptomic analysis and direct protein measurements from relatively homogeneous cell populations. This work will be of interest to spine basic scientists and clinicians, as well as to musculoskeletal scientists more broadly. The revised manuscript adds greater clarity, and the impact of the study is greatly enhanced.

    2. Reviewer #1 (Public review):

      The authors, Zhang et al., demonstrate the beneficial effects of treating degenerate human primary intervertebral disc (IVD) cells with recombinant human PDGF-AB/BB on the senescence transcriptomic signatures. Utilizing a combination of degenerate cells from elderly humans and experimentally induced senescence in young, healthy IVD cells, the authors show the therapeutic effects on mRNA transcription as well as cellular processes through informatics approaches.

      One notable strength of this study is the use of human primary cells and recombinant forms of human PDGF-AB/BB proteins, which increases the translational potential of these in vitro studies. The manuscript is well-written, and the informatics analyses are thorough and clearly presented.

      Comments on revisions:

      The revised manuscript adds greater clarity, and the impact of the study is greatly enhanced.

    3. Reviewer #2 (Public review):

      Summary:

      This work highlights a novel role for platelet-derived growth factor (PDGF) in mitigating cellular senescence associated with age-related and painful intervertebral disc degeneration. Prior literature has demonstrated the importance of accumulation of senescent cells in mediating many of the pathological effects associated with the degenerate disc joint, such as inflammation and tissue breakdown. In this study, the authors treat clinically relevant human nucleus pulposus and annulus fibrosus cells from patients undergoing discectomy with recombinant PDGF-AB/BB for 5 days and then deeply phenotype the outcomes using bulk RNA sequencing. In addition, they irradiated healthy human disc cells, which they subsequently treated with PDGF-AB/BB, examining the expression of SASP-related markers and also PDGFRA receptor gene expression. Overall, PDGF was able to down-regulate many senescence-associated pathways and the degenerate phenotype in IVD cells. Altered pathways were associated with neurogenesis, mechanical stimuli, metabolism, cell cycle, reactive oxygen species and mitochondrial dysfunction. Overall, the authors achieved their aims, and the results by and large support their conclusions, although improvements could be made to enhance the rigor of the study and findings.

      Strengths:

      A major strength of this study is the use of human cells from patients undergoing discectomy for disc herniation as well as access to healthy human cells. Investigating the role of PDGF regarding cellular senescence in the degenerate disc joint is novel and an underexplored area of research which is a significant contribution to the field of spine. This study highlights a potential target for addressing cellular senescence where most of the prior focus has been on senolytic drugs. Such studies have broad implications to other age-related diseases where senescence plays a major role. The use of transcriptomics and therefore an unbiased approach to investigating the role of PDGF is also considered a strength as is the follow-up studies involving irradiating healthy human disc cells and treating these cells with PDGF. The combined assessment of both nucleus pulposus and annulus fibrosus cells in the context of these studies adds to the impact.

      Weaknesses:

      A weakness of these studies relates to the qualitative data presented for the β-galactosidase assay. Quantification of such data sets would greatly strengthen the studies and lend further support to the hypotheses. The study in its current form could be strengthened by the inclusion of mechanistic studies probing the downstream PDGF receptor-associated pathways, for example by specifically targeting or modulating the activity of the PDGF receptor PDGFRA.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The Reviewer asks that we provide the source of PDGF-AB/BB proteins.

      We apologize for omitting such information. We now provide the source of PDGF-AB/BB in the Methods as PeproTech. In our revised manuscript we clearly state on Page 7, line 142: “Cells were then treated with recombinant human PDGF-AB (40ng/ml; PeproTech, 10770584) or -BB (20ng/ml; PeproTech, 10771918) for 5 days.”

      The Reviewer asks that we adequately report our chosen irradiation parameters suggesting that we consider (PMCID: PMC5495460) for appropriate parameter reporting.

      We thank the Reviewer for this excellent suggestion. We now provide a more detailed irradiation reporting based on the shared manuscript in Page 9, line 10, line 204.

      The Reviewer requests more details about the age range to distinguish young from old donors.

      In the Methods section of our revised manuscript, we now provide the age range for our old donors being between 53 and 67 while our younger donor population ranged between 19 and 27 years of age. These changes are reflected in Page 6, line 128: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrman grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with discogenic pain, with each donor providing written informed consent. Healthy NP and AF cells (23.0 ±3.7 years old) were gifted by Professor Lisbet Haglund from McGill University (Tissue Biobank #2019-4896).”

      The Reviewer wonders about the rationale for using different concentrations of PDGF-AB/BB in the degenerate cell and irradiation experiments.

      We apologize for our lack of clarity. We initially treated cells with different concentrations (20 and 40 ng/ml) of PDGF-AB/BB to first establish a dose-response. From our MTT and gene expression analyses we determined that 20ng/ml was sufficient to elicit significant changes in cell proliferation markers, including MKI67, CCNB1 and CCND1. Increasing the concentration to 40 ng/ml of either growth factor did not significantly influence these parameters. However, we felt that for our bulk RNA seq experiments, we may see better changes in signaling molecules under 40ng/ml of PDGF-AB since its effects on cell growth at this concentration were maximal while PDGF-BB was maintained at 20ng/ml based on its efficacy in our mitogenic response.

      The Reviewer asks that we consider describing the effects of PDGF-AB/BB as mitigating or therapeutic rather than protective both in the title and throughout the manuscript.

      We agree with the Reviewer’s recommendation, and we have now changed the title to “Therapeutic effects of PDGF-AB/BB against cellular senescence in human intervertebral disc”. Moreover, we implemented this change in the revised manuscript as requested.

      The Reviewer believes that changes in the NP are more clinically evident (by imaging methods), despite degeneration often initiating from the AF (annulus fibrosus), e.g., through tears/microtears, and would like us to reflect this in our revised manuscript.

      We agree with the Reviewer’s comment, and we thank them for this added accuracy. On this basis, we now corrected our language in the introduction by stating in Page 4, line 68 that: “To date, the main focus of IVD cell studies has been on the NP, as changes in the NP are easily detected through imaging techniques like MRI, making it the most visible indicator of disc degeneration in clinical practice. In addition, NP plays a crucial role in the progression of IVD degeneration due to its susceptibility to significant structural and functional changes during aging and degeneration.”

      The Reviewer points out a prior study which examined the effects of X-ray irradiation on NF-kB signaling in young and aged IVDs (PMCID: PMC5495460) suggesting that we include this reference in our revised manuscript.

      We thank the Reviewer for this suggestion, and we are now referencing this elegant study in the discussion section of our revised manuscript. Thus, on page 20, line 440 we state: “In fact, it has been shown that NF-kB signaling was elevated in mouse IVDs exposed to a single 20 Gy dose of irradiation in an ex vivo culture model.”

      The Reviewer asks that our experimental methods are described in the order of the experimental workflow. For example, section 2.2 describes RNA sequencing, which is a terminal assay. Section 2.2 may be more appropriate for detailing the methods of PDGF-AB/BB treatment, along with the rationale.

      We thank the Reviewer for pointing this out and have reorganized the Methods section accordingly.

      Reviewer #2:

      The Reviewer requests more experimental details in the methodology including the rationale for such methods/conditions as well as specific culture models utilized, substrates, cell density, and media components.

      We apologize for our lack of clarity. We have now revised the Methods section based on these comments.

      The Reviewer asks about the quantitative data for b-galactosidase assay and immunofluorescence of senescence-associated proteins such as P21 and P16.

      We apologize for omitting this information. We have now included the quantification of P21- and P16-positive cells, which is presented in the revised Figure 4. For the b-galactosidase assay, we were unable to quantify the percentage of positive cells because we did not perform nuclei staining, making it difficult to accurately determine the total cell number. Instead, we provided representative images showing the full field of view at 10X magnification using an Echo microscope.

      The Reviewer requests the protein level data of PDGFRA to determine if the transcripts are being translated to protein.

      We thank the Reviewer for this suggestion. The protein expression of PDGFRA has been included in Supplementary Figure 2. We found that PDGFRA protein levels were decreased in both NP and AF cells in response to PDGF treatments. It is known that upon binding with PDGF ligands, PDGFRA undergoes rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (doi: 10.1042/BST20200004). The upregulated gene expression is probably an attempt to compensate for this degradation and supports continued activation of PDGFRA signaling, emphasizing its crucial role in the response to PDGF treatment. Thus, we added this to the discussion section on page 22, line 486: “Interestingly, while mRNA level was increased in PDGF treated NP cells, its protein level was decreased, highlighting the complexity in PDGF receptor dynamics. Upon binding with PDGF ligands, PDGFRA is known to undergo rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (Rogers and Fantauzzo 2020). The upregulated gene expression probably attempting to compensate for this degradation and supports continued activation of PDGFRA signaling activation, emphasizing its crucial role in response to the PDGF treatment.”

      The Reviewer points out that our conclusion that “PDGF do not mediate their effects via the PDGFRA” is not supported by the current data asking that further discussion, interpretation, and direct comparison of the nucleus pulposus and annulus fibrosus data sets be presented to the readers.

      We thank the Reviewer for the insightful comment. In page 20, line 432, we have corrected our language to now state: “In contrast, while PDGF treatment alleviated the senescent phenotype in AF cells, it also induced changes in pathways such as response to mechanical stimuli and neurogenesis, which were distinct from those in NP cells. This indicates that the treatment enhanced IVD functionality through different mechanisms within the two compartments.”

      The Reviewer cannot appreciate the changes in S-phase between control and treated groups.

      We apologize for the poor quality of the figure in our initial submission. We analyzed the data in S phase and included them in our revised Figures 5C and 5F.

      The Reviewer believes that discectomies are typically not performed on patients with discogenic back pain but on patients who are undergoing surgery for a herniated disc.

      We agree with the Reviewer, and we corrected our language in the revised manuscript. In Page 6, line 128, we now state: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrman grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with disc herniation, with each donor providing written informed consent.”

      The Reviewer asks about the protein-protein interactions in AF cells.

      We thank the Reviewer for this suggestion, and we now included it in Figure 3.

      The Reviewer requests more details about the protocol and doses for the irradiation studies.

      In the revised manuscript, we added this information in page 10, line 204.

      The Reviewer asks whether the gene expression of PDGFRA was increased or decreased in irradiated cells compared to non-irradiated cells.

      The gene expression of PDGFRA was decreased in NP cells exposed to irradiation compared to non-irradiated cells. The data are shown in Figure 4 and their description in the text is in page 17, line 411.

    1. eLife Assessment

      This study presents important new insights linking obesity to kidney disease using a Drosophila model. A series of compelling experiments demonstrate that a high-fat diet induces excretion of a leptin-like JAK-STAT ligand from fat body, driving the adipose-nephrocyte axis through activated JAK-STAT signaling and subsequently causing a functional defect in nephrocytes. The approach using a combination of genetic tools and pharmacological intervention is solid and confirms the mechanistic link, together with phenotypic analysis that further supports the authors' conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Zhao and colleagues employ Drosophila nephrocytes as a model to investigate the effects of a high-fat diet on these podocyte-like cells. Through a highly focused analysis, they initially confirm previous research in their hands demonstrating impaired nephrocyte function and move on to observe the mislocalization of a slit diaphragm-associated protein (pyd) and a knock-in into the locus of the Drosophila nephrin (sns). Employing another reporter construct, they identify activation of the JAK/STAT signaling pathway in nephrocytes. Subsequently, the authors demonstrate the involvement of this pathway in nephrocyte function from multiple angles, using a gain-of-function construct, silencing of an inhibitor, and ectopic overexpression of a ligand. Silencing the effector Stat92E via RNAi or inhibiting JAK/STAT with Methotrexate effectively restored impaired nephrocyte function and slit diaphragm architecture induced by a high-fat diet, while showing no impact under normal dietary conditions.

      Strengths:

      The findings establish a link between JAK/STAT activity and the impact of a high-fat diet on nephrocytes. This nicely underscores the importance of organ crosstalk for nephrocytes and supports a potential role for JAK/STAT in diabetic nephropathy, as previously suggested by other models.

      Weaknesses:

      While the analysis provides valuable insights, it appears somewhat over-reliant on tracer uptake in certain instances. Clinical inferences based on a Drosophila model should be interpreted with caution.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript, Zhao et al. describe a link between a high-fat diet and JAK-STAT pathway activation in nephrocytes. Nephrocytes are the homologs to mammalian podocytes, and it has been previously shown that metabolic syndrome and obesity are associated with worse outcomes for chronic kidney disease. A study from 2021 (Lubojemska et al.) already confirmed a severe nephrocyte phenotype upon feeding Drosophila a high-fat diet and also linked lipid overflow to nephrocyte dysfunction by expressing adipose triglyceride lipase in the fat body. In this study, the authors identified a second pathway and mechanism by which lipid dysregulation impacts nephrocyte function. In detail, they show an activation of JAK-STAT signaling in nephrocytes upon feeding a high-fat diet, which was induced by Upd2 expression (a leptin-like hormone) in the fat body, the adipose tissue in Drosophila. Further, they show that genetic and pharmacological interventions can reduce JAK-STAT activation and thereby prevent the nephrocyte phenotype in the high-fat diet model.

      Strengths:

      The strength of this study is the combination of genetic tools and pharmacological intervention to confirm a mechanistic link between the fat body/adipose tissue and nephrocytes. Inter-organ communication is crucial in the development of several diseases, but the underlying mechanisms are only poorly understood. Using Drosophila, it is possible to investigate several players of one pathway, here JAK-STAT. This was done, by investigating the functional role of Hop, Socs36E and Stat92E in nephrocytes and has also been combined with feeding a high-fat diet, to assess restoration of nephrocyte morphology and function by inhibiting JAK-STAT signaling. Adding a translational approach was done by inhibiting JAK-STAT signaling with methotrexate, which also resulted in attenuated nephrocyte dysfunction. Expression of the leptin-like hormone upd2 in the fat body is a good approach to study inter-organ communication and the impact of other organs/tissue on nephrocyte function and expands their findings from nephrocyte function towards whole animal physiology.

      Weaknesses:

      Although the general findings of this study are of great interest, the number of flies investigated for the majority of the experiments is very low (6 flies). Also it is not clear whether the 6 flies used are from independent experiments to exclude differences in food/diet.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhao and colleagues employ Drosophila nephrocytes as a model to investigate the effects of a high-fat diet on these podocyte-like cells. Through a highly focused analysis, they initially confirm previous research in their hands demonstrating impaired nephrocyte function and move on to observe the mislocalization of a slit diaphragm-associated protein (pyd). Employing a reporter construct, they identify the activation of the JAK/STAT signaling pathway in nephrocytes. Subsequently, the authors demonstrate the involvement of this pathway in nephrocyte function from multiple angles, using a gain-of-function construct, silencing of an inhibitor, and ectopic overexpression of a ligand. Silencing the effector Stat92E via RNAi or inhibiting JAK/STAT with Methotrexate effectively restored impaired nephrocyte function induced by a high-fat diet, while showing no impact under normal dietary conditions.

      Strengths:

      The findings establish a link between JAK/STAT activity and the impact of a high-fat diet on nephrocytes. This nicely underscores the importance of organ crosstalk for nephrocytes and supports a potential role for JAK/STAT in diabetic nephropathy, as previously suggested by other models.

      Weaknesses:

      The analysis is overly reliant on tracer endocytosis and single lines. Immunofluorescence of slit diaphragm proteins would provide a more specific assessment of the phenotypes.

      We thank the reviewer for the positive comments and pointing out that slit diaphragm markers would provide a more specific assessment of the phenotypes. In our revised manuscript, we used Sns-mRuby3, in which mRuby3 was tagged endogenously at the C-terminal of Sns (PMID: 39195240 and PMID: 39431457), to show the slit diaphragm pattern.

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Zhao et al. describe a link between a high-fat diet and JAK-STAT pathway activation in nephrocytes. Nephrocytes are the homologs to mammalian podocytes, and it has been previously shown that metabolic syndrome and obesity are associated with worse outcomes for chronic kidney disease. A study from 2021 (Lubojemska et al.) already confirmed a severe nephrocyte phenotype upon feeding Drosophila a high-fat diet and also linked lipid overflow to nephrocyte dysfunction by expressing adipose triglyceride lipase in the fat body. In this study, the authors identified a second pathway and mechanism by which lipid dysregulation impacts nephrocyte function. In detail, they show activation of JAK-STAT signaling in nephrocytes upon feeding a high-fat diet, which was induced by Upd2 expression (a leptin-like hormone) in the fat body, the adipose tissue in Drosophila. Further, they show that genetic and pharmacological interventions can reduce JAK-STAT activation and thereby prevent the nephrocyte phenotype in the high-fat diet model.

      Strengths:

      The strength of this study is the combination of genetic tools and pharmacological intervention to confirm a mechanistic link between the fat body/adipose tissue and nephrocytes. Inter-organ communication is crucial in the development of several diseases, but the underlying mechanisms are only poorly understood. Using Drosophila, it is possible to investigate several players of one pathway, here JAK-STAT. This was done, by investigating the functional role of Hop, Socs36E, and Stat92E in nephrocytes and has also been combined with feeding a high-fat diet, to assess restoration of nephrocyte function by inhibiting JAK-STAT signaling. Adding a translational approach was done by inhibiting JAK-STAT signaling with methotrexate, which also resulted in attenuated nephrocyte dysfunction. Expression of the leptin-like hormone upd2 in the fat body is a good approach to studying inter-organ communication and the impact of other organs/tissue on nephrocyte function and expands their findings from nephrocyte function towards whole animal physiology.

      Weaknesses:

      Although the general findings of this study are of great interest, there are some weaknesses in the study, which should be addressed. Overall, the number of flies investigated for the majority of the experiments is very low (6 flies) and it is not clear whether the flies used, are from independent experiments to exclude problems with food/diet. For the analysis, the mean values of flies should be calculated, as one fly can be considered a biological replicate, but not all individual cells. By increasing the number of flies investigated, statistical analysis will become more solid. In addition, the morphological assessment is rather preliminary, by only using a Pyd antibody. Duf or Sns should be visualized as well, also the investigation of the different transgenic fly strains studying the importance of JAK-STAT signaling in nephrocytes needs to include a morphological assessment. Moreover, the expected effect of feeding a high-fat diet on nephrocytes needs to be shown (e.g. by lipid droplet formation) and whether upd2 is actually increased here should also be assessed. The time points of assessment vary between 1, 3, and 7 days and should be consistent throughout the study or the authors should describe why they use different time points.

      We thank the reviewer for the comments and suggestions. HFD causes an enlarged crop (Liao et al, 2021, PMID: 33171202) and accumulation of lipid droplets in the intestine. To exclude problems with different batches of food/diet, we checked the crop and the intestine during sample preparation as indications of food consistency.

      We followed the suggestion to take the mean values of flies in the data analysis; in the revised version, each fly was considered one biological replicate. We added another slit diaphragm protein reporter, Sns-mRuby3, in which the mRuby3 fluorescent protein was tagged at the C-terminal of endogenous Sns. This reporter was used to show the effect of HFD on slit diaphragm protein, manipulation of the Jak/Stat pathway (ppl-Gal4>upd2 and dot-Gal4>UAS-Stat92E-RNAi), and drug treatment.
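
      The per-fly averaging described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the fly identifiers, the intensity values, and the function name `per_fly_means` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_fly_means(measurements):
    """Collapse per-cell measurements into one mean value per fly.

    `measurements` is an iterable of (fly_id, value) pairs, one pair per
    nephrocyte. Each fly then contributes a single biological replicate.
    """
    by_fly = defaultdict(list)
    for fly_id, value in measurements:
        by_fly[fly_id].append(value)
    return {fly_id: mean(values) for fly_id, values in by_fly.items()}


# Hypothetical tracer-uptake intensities (arbitrary units), several cells per fly.
cells = [
    ("fly1", 10.0), ("fly1", 14.0),
    ("fly2", 8.0), ("fly2", 12.0), ("fly2", 10.0),
    ("fly3", 9.0),
]

fly_means = per_fly_means(cells)
n_replicates = len(fly_means)  # number of flies, not number of cells
```

      Downstream statistics would then be run on `fly_means` (n = number of flies), so that many cells from one fly cannot inflate the apparent sample size.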

      Lubojemska et al 2021 (PMID: 33945525) showed that HFD leads to lipid droplet accumulation in larval nephrocytes. Following the reviewer’s suggestion, we stained the adult nephrocytes with Nile red and found lipid droplet formation caused by HFD, verifying the HFD effects on lipid droplet accumulation.

      Regarding the timepoints, the newly eclosed flies (1-day old) were treated for 7 days (transferred to fresh diet or shifted from 18 to 29 °C for 7 days to induce target gene expression). Thus, the flies were 7 days old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. The exception was Figure 4, panels G and H, where we used Day 3 for the UAS-hop.Tum overexpression in the flp-out clones, which is different from the HFD approach (Day 7). This is because Hop.Tum is a strong gain-of-function mutation. UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis via up-regulating a proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). Thus, we used Day 3 for this experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are relevant issues, that should be addressed:

      Major:

      - The analysis of JAK/STAT signaling in nephrocytes is limited to nephrocyte function, despite the nice slit diaphragm phenotype shown in Figure 2A. What happens to the slit diaphragm in the other genotypes, the rescue settings in particular? Immunofluorescence of Pyd should be explored for all conditions to evaluate proper phenocopy. Tracer endocytosis is much less specific.

      We thank the reviewer for the suggestion. We made a transgenic line Sns-mRuby3, in which mRuby3 was tagged to the endogenous Sns C-terminal. It has been used as a slit diaphragm reporter (PMID: 39195240 and PMID: 39431457). Apart from the tracer assays, we used Sns-mRuby3 reporter and/or Pyd staining to visualize the changes in slit-diaphragm structures.

      - The interventions are restricted to single RNAi lines and reporters, raising concerns about specificity/potential off-targets. Additional lines should be tested for verification.

      Different versions of RNAi lines are available for targeting fly genes. For UAS-Socs36E-RNAi, we chose the one that was generated with a short hairpin, which is known to restrict the off-target effects (Ni et al, 2011, PMID: 21460824). For UAS-Stat92E-RNAi, we added an independent RNAi line (Figure 6 - figure supplement 1 and 2).

      Minor:

      - In Figure 2C, the image of HFD shows a section that cuts through the surface at a shallower angle, making everything appear blurry. This image should be replaced.

      We replaced Figure 2C (the image of HFD) with another one.

      - What is the relevance (if any) of reduced electrodense vacuoles with a high-fat diet? An effect on endocytic trafficking/endosome architecture remains unexplored.

      Lubojemska et al (PMID: 33945525) studied the endocytic trafficking/endosome architecture of the larval nephrocytes and found that HFD impaired the endocytosis. We studied the adult pericardial nephrocytes. It is very likely that the endocytic trafficking/endosome architecture is affected by HFD in the adult nephrocytes.  

      - How do the findings presented in this manuscript correlate with a similar study by Lubojemska et al.? At least the discussion should provide more evaluation of this aspect.

      Lubojemska et al (PMID: 33945525) assayed the larval nephrocytes and found that a HFD leads to the ectopic accumulation of lipid droplets in the nephrocytes and decreased endocytosis. They further demonstrated that lipid droplet lipolysis and PGC1α counteract the harmful effects of a HFD. We performed Nile red staining and verified the accumulation of lipid droplets in the adult pericardial nephrocytes upon HFD feeding, which agrees with the findings of Lubojemska et al. We found that a HFD activates the Jak/Stat pathway, which mediates the nephrocyte functional defects. A previous study showed that Stat1 has an inhibitory effect on PGC1α transcription (PMID: 26689548). Further study is needed to investigate the interaction between the Jak/Stat pathway and PGC1α transcription. We added the information to the discussion.

      - Please check spelling and grammar.

      Reviewer #2 (Recommendations For The Authors):

      (1) Which cells are investigated? Please state.

      Pericardial nephrocytes were used in this study. The information was added to the Results section.

      (2) Rephrase 'chronic kidney disease model'. Feeding for 7 days and assessment after 7 days cannot be considered chronic as flies can live more than 60 days.

      Lubojemska et al (PMID: 33945525) fed the newly hatched larvae with a HFD and used the third instar larvae for the experiments. The term “chronic kidney disease” has been used in the reference PMID: 33945525. It takes about 4 days for fly larvae to develop from the first instar to the third instar. Thus, the animals were fed on the HFD for only 4 days. In this regard, feeding for seven days might be considered as chronic.

      (3) Line 89: Curran et al., 2014). with risk increasing risk as BMI increases (Hsu et al., 2006). Please correct this sentence.

      We thank the reviewer for finding the error. In the revised version, the sentence was changed to “with increasing risk as BMI increases (Hsu et al., 2006)”.

      (4) Figure 1: The authors should explain why they use FITC-Albumin and 10kDA dextran, what are the differences, and why are both used?

      The tracers differ in size (70 kDa FITC-Albumin and 10 kDa dextran). Both FITC-Albumin and 10 kDa dextran have been used in previous publications (Zhao et al 2024, PMID: 39431457 and Weavers et al 2009, PMID: 18971929) to show that the nephrocytes can efficiently take up tracers of different sizes.

      (5) Figure 3: The JAK-STAT sensor was used on Day 1 to confirm activation of JAK-STAT signaling, which means a very fast response towards the HFD after 24hrs. How is the activation after 7 days? The nephrocyte assessment in Figures 1 and 2 is done at the later time point, how about earlier time points in HFD? One would expect an earlier phenotype as well if JAK-STAT signaling is causative.

      In Figure 3C, newly eclosed flies (1-day old) were fed on a control diet or a HFD for 7 days. Thus, the legend should read “7-day-old females”. We apologize for the confusion. The caption was updated to “7-day-old females”.

      (6) Figure 4H: I don't understand how many cells or flies are depicted and analysed? Are the dots one nephrocyte from 4 flies? If yes, the numbers need to be increased.

      In Figure 4H, we quantified 5 UAS-hop.Tum clones and 5 neighbor cells. We only found 5 clones from 4 flies. We didn’t quantify all the nephrocytes, since we compared each clone with its neighbor cell. To make it easier to follow, we changed the description to “n= 5 clones and 5 neighbor cells”.

      (7) Figure 4: Why are flies investigated at different ages? Day 1 vs Day 3? This should be consistent with the HFD approach and day 7. Or investigate the HFD at earlier time points as well.

      In Figure 4, the newly eclosed flies (1-day old) were shifted from 18 to 29 °C for 7 days to induce target gene expression. Thus, the flies were 7-day old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. We used Day 3 for the UAS-hop.Tum overexpression in the flp-out clones, which is different from the HFD approach (Day 7). This is because Hop.Tum is a strong gain of function mutation. UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis via up-regulating a proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). Thus, we used Day 3 for this experiment.

      (8) Figure 5: Do the authors see upd2-GFP in the nephrocyte or at the nephrocyte? Is upd2 filtered to bind the JAK-STAT-receptor? They should show this, which is easy to do due to the GFP label.

      We thank the reviewer for the suggestion. We looked into the nephrocyte from ppl-Gal4>upd2-GFP flies and found Upd2-GFP in the nephrocytes. We further showed that ppl-Gal4 was not expressed in the nephrocytes, suggesting that Upd2-GFP is secreted from the fat body and transported to the nephrocytes. We stained the nephrocytes for Pyd and found compromised fingerprint pattern caused by Upd2-GFP expression in the fat body. The data was added to Figure 5 - figure supplement 1.

      (9) Figure 5: What are the upd2 levels after day 1 and compared to HFD at day 7? In the Rajan et al manuscript, upd2 levels have been assessed by qPCR, this can be done here as well. Although there is a mechanistic link shown here, I think it would be interesting to test the upd2 levels at the different time points assessed.

      In the Rajan et al manuscript, they showed that the expression of upd2 was up regulated by HFD. My previous work showed that HFD changes taste perception. We performed qPCR to determine the expression of upd2 and verified that upd2 was upregulated in HFD fed flies (Yunpo Zhao et al. 2023. PMID: 37934669). We included the reference in the revised version.

      (10) Figure 6: Does a Socs36E overexpression e.g. with the Bloomington strain 91352 also rescue the HFD phenotype, by blocking JAK-STAT signaling?

      We thank the reviewer for the suggestion. We tested the effect of Socs36E overexpression and observed that UAS-Socs36E can partially rescue the HFD-induced nephrocyte functional decline. The data was not included in the revised manuscript. Notably, apart from having an inhibitory effect on Jak/Stat, Socs36E represses the MAPK pathway (Amoyel et al, 2016, PMID: 26807580).

      (11) Figure 7: What is the control for the methotrexate treatment? What is the solvent?

      We used DMSO as the solvent for methotrexate and used it as the control for the methotrexate treatment. We added the following sentences to the Methods section: “Methotrexate (06563, Sigma-Aldrich, MO) was dissolved in DMSO to make a 10mM stock solution”, and “The samples incubated in Schneider’s Medium supplemented with DMSO vehicle were used as a control”.

      (12) Why did the authors use Dot-Gal4 for the Socs36E knockdown and Dot-Gal4ts for the Stat92E knockdown?

      We used Dot-Gal4ts and temperature shifting to restrict the Stat92E knockdown at adult stages.

      (13) Supplementary Figure 1: Please add the individual data to the figure as done for all other figures.

      We thank the reviewer for this comment. The individual data points were added to the figure as suggested.

    1. eLife Assessment

      Using microscopy experiments and theoretical modelling, the authors present convincing evidence of cellular coordination in the gliding filamentous cyanobacterium Fluctiforma draycotensis. The results are fundamental for the understanding of cyanobacterial motility and the underlying molecular and mechanical pathways of cellular coordination.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacterium Fluctiforma draycotensis. They find that filament motion consists of back and forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalize these findings.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has potential to shed light on the nature of mechanical forces in the filaments, e.g. through study of the length dependence of buckling.

      The comparison between motility on agar and on glass is interesting since it shows that filaments have both intrinsic propensity to reverse (that is seen on glass) and mechanically triggered reversal (that is seen on agar when the filament reaches the end of a track).

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However Fig 2C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, although the statistical degree of correlation is not given. In my view, Fig 2C should not be described as a uniform distribution since mathematically that means something very different than what is shown here. Instead the figure should be described quantitatively with the use of a measured correlation coefficient. This also applies to Fig. S3A.
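
      As an illustration of the quantitative description requested here, the relation between filament length and speed could be summarised by a correlation coefficient. The following is only a minimal sketch: the function name `pearson_r` is hypothetical, and the numbers are placeholder values chosen for illustration, not measurements from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for illustration only (not experimental data).
lengths_um = [20.0, 50.0, 80.0, 120.0, 200.0]
speeds_um_s = [0.50, 0.45, 0.55, 0.48, 0.52]

r = pearson_r(lengths_um, speeds_um_s)
print(f"Pearson r between length and speed: {r:.2f}")
```

      A value of r close to zero would support the wording "lack of correlation", whereas "uniform distribution" describes the shape of a single variable's distribution and is not what such a scatter plot tests.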

      The statement "since filament speed results from a balance between propulsive forces and drag, these observations of no or positive correlation between filament speed and length show that all (or a fixed proportion of) cells in a filament contribute to propulsive force generation" helps to clarify the important link between Fig 2C and the concept that all cells contribute, but I think this statement is not obvious for many readers, and could be made clearer, e.g. by the use of a simple mathematical model for a chain of bacteria that accounts for drag forces and propulsion forces for each bacterium.

      The authors have now clarified that the computational model is 1D and cannot explain the coupling between rotation, slime generation and motion. I find it encouraging and important that model predictions for the dwell time distributions (Fig S12 and S13) are similar to experimental measurements, but I think it would be better to put these results in the main text, together also with Fig S4. If these important results are in the supplement it is harder for the reader to assess the match between model and experiments.

      Filament buckling is not analysed in quantitative detail, but the authors have now clarified that this will be the topic of future work with a 2D or 3D computational model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forwards or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      The authors seek to understand how cells in a filament synchronize their motion to move in a concerted direction. This question connects to the evolution of multicellular life and so is important well beyond the specific field of cyanobacterial locomotion.

      Strengths:

      The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. This model provides a useful phenomenological framework in which to consider the roles of individual cells in the coordinated motion of the group. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The observation that filaments reverse at the ends of tracks is compelling, but difficult to clearly connect to any one microscopic model.

      The observations of helical motion of the filament are compelling.

      Weaknesses:

      The comparison of theory and observation is mainly qualitative. While the authors have done a good job fitting the observations to the theory, it is not possible to systematically vary parameters, independently estimate parameter values, or apply external forces. Consequently, more experiments are needed before the proposed model can be accepted or rejected. This manuscript provides a promising hypothesis but not a compelling justification for it.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalise these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed.

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We would revise the manuscript to make this point - and our arguments about assuming multiple / most cells in a filament contributing to motility - clear.
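
      The force-balance argument above can be illustrated with a toy calculation. This is a minimal sketch and not the model used in the manuscript: it assumes that every cell adds the same drag coefficient, that each active cell generates the same force, and that the steady-state speed is total force divided by total drag. The function name and the unit parameter values are hypothetical.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady-state speed from the balance n_active * f = n_cells * gamma * v.

    Assumes drag grows linearly with the number of cells (an illustrative
    assumption) and that all active cells push in the same direction.
    """
    if n_cells <= 0 or not 0 <= n_active <= n_cells:
        raise ValueError("need n_cells > 0 and 0 <= n_active <= n_cells")
    return n_active * force_per_cell / (n_cells * drag_per_cell)


cell_counts = [10, 50, 100, 500]

# Every cell contributes: speed does not depend on filament length.
all_active = [filament_speed(n, n) for n in cell_counts]

# A fixed proportion (here one half) contributes: speed is still constant.
half_active = [filament_speed(n, n // 2) for n in cell_counts]

# Only one cell contributes: speed falls off as 1 / n_cells.
one_active = [filament_speed(n, 1) for n in cell_counts]

for n, va, vh, v1 in zip(cell_counts, all_active, half_active, one_active):
    print(f"cells={n:4d}  all={va:.3f}  half={vh:.3f}  single={v1:.4f}")
```

      Under these assumptions, a speed that does not decrease with filament length is compatible with all cells, or a fixed proportion of cells, generating force, and incompatible with a fixed small number of force-generating cells.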

      The computational model misses perhaps the most interesting aspect of the experimental results, which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters is explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model’s results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which so far has not been brought up when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling - both experimentally and through modelling would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of a filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling. The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
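
      For readers unfamiliar with the kind of minimal spin description mentioned above, the following sketch shows a generic nearest-neighbour chain of ±1 "force directions" updated with standard Metropolis dynamics. It is an illustration only: the details of the model we explored are not reproduced here, and the energy function, the noise parameter, the parameter values, and all function names are assumptions made for this example.

```python
import math
import random


def chain_energy(spins, coupling):
    """Energy of an open chain: E = -J * sum_i s_i * s_(i+1)."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def alignment(spins):
    """Absolute mean spin: 1 if all cells push the same way, 0 if balanced."""
    return abs(sum(spins)) / len(spins)


def metropolis_sweeps(spins, coupling, noise, n_sweeps, rng):
    """Apply single-spin-flip Metropolis updates in place; return the chain."""
    n = len(spins)
    for _ in range(n_sweeps * n):
        i = rng.randrange(n)
        # Sum over existing nearest neighbours (open boundary conditions).
        neighbours = 0
        if i > 0:
            neighbours += spins[i - 1]
        if i < n - 1:
            neighbours += spins[i + 1]
        # Energy change if spin i is flipped.
        delta_e = 2.0 * coupling * spins[i] * neighbours
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / noise):
            spins[i] = -spins[i]
    return spins


rng = random.Random(0)
weak = metropolis_sweeps([1] * 50, coupling=0.1, noise=1.0, n_sweeps=200, rng=rng)
strong = metropolis_sweeps([1] * 50, coupling=50.0, noise=1.0, n_sweeps=200, rng=rng)
print("alignment, weak coupling:  ", alignment(weak))
print("alignment, strong coupling:", alignment(strong))
```

      In a chain of this kind, domains of opposite orientation are suppressed only when the coupling is large compared with the noise, and nothing in the update rule produces a coordinated reversal of the whole chain, which is in line with the two limitations described above.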

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1], and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

      Summary from the Reviewing Editor:

      The authors present a simple one-dimensional biophysical model to describe the gliding motion and the observed statistics of trajectory reversals. However, the model does not capture some important experimental findings, such as the buckling occurring in long filaments, and the coupling between rotation, slime generation, and motion. More effort is recommended to integrate the information gathered on these different aspects to provide a more unified understanding of filament motility. In particular, the referees suggest performing a more quantitative analysis of the buckling in long filaments. Finally, it is also recommended to discuss the results in the context of previous literature, in order to better explain their relevance. Please find below the detailed individual recommendations of the three reviewers.

      We thank the editor for this accurate summary of the presented work and for highlighting the key points raised by the reviewers. We have provided below point-by-point replies to these.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The relevance of the study organism Fluctiforma draycotensis is not clearly explained, and the results are not discussed in the context of previous literature. The motivation would be clearer if the manuscript explained why this model organism was chosen and how the results compare with those previously observed for this or other organisms.

      We have extended the introduction and discussion sections to make it clearer why we have worked with this organism and how the findings from this work relate to previous ones. In brief, Fluctiforma draycotensis is a useful organism to work with as it not only displays significant motility but also displays intriguing collective behaviour at different scales. Previous works on gliding motility in filamentous cyanobacteria have mostly focussed on the model organism Nostoc punctiforme, which only displays motility after differentiation into hormogonia [1]. There have also been studies in a range of different filamentous species, including those of the non-monophyletic genus, Phormidium, but these studies mostly looked at effects of genetic deletions on motility [2] or utilised electron microscopy to identify proteins (or surface features) involved in motility [3-5]. It must be noted that motility is also described and studied in non-filamentous cyanobacteria, but the dynamics of motion and molecular mechanisms there are different to filamentous cyanobacteria [6,7]. These previous studies are now cited / summarised in the revised introduction and discussion sections.

      The inferred tracks, probably associated with secreted slime, play a key role since it is supposed that the tracks provide the external force that keeps the filaments straight. Movie S3, in phase contrast, provides convincing evidence for the tracks, but they cannot be seen in the fluorescence images presented in the main text. Clearer evidence of them should be shown in the main text. An especially important aspect of the tracks is where they start and end since the computational model assumes that reversal happens due to forces generated by reaching the end of a track. Therefore it seems important to comment on what produces the tracks, to check whether reversals actually happen at the end of a track, etc. Perhaps tracks could be stained with Concanavalin-A?

      To confirm that reversals happen on track ends, we have now performed an analysis on agar, where we can see tracks on phase microscopy. This analysis confirms that, on agar, reversals indeed happen on track ends. We added this analysis, along with images showing tracks clearly as a new Fig in the main text (see new Fig. 1).

      Further confirming the reversal at track ends, we note that filaments on circular tracks do not reverse over durations longer than the ‘expected reversal interval’ of a filament on a straight track (see details in response to Reviewer 2).

      Regarding what produces the tracks on agar, we are still analysing this using different methods and these results will be part of a future study. Fluorescent staining can be used to visualise slime tubes using TIRF microscopy, as shown in Fig. S8, however, visualising tracks on agar using low magnification microscopy has been difficult due to background fluorescence from agar.

      We would also like to clarify that the model does not incorporate any assumptions regarding the track-filament interaction, other than that the track ends behave akin to a physical boundary for the filament. The observed reversal at track ends and “what” produces the track are distinct aspects of filament motion. We do not think that the model’s assumption of filament reversal at the end of the track requires understanding of the mechanism of slime production.

      Reviewer #3 (Recommendations for the authors):

      The manuscript combines three distinct topics: (1) the difference in locomotion on glass vs agar, (2) the development of a biophysical model, and (3) the helical motion of filament. It is not clear what insight one can gain from any one of these topics about the two others. The manuscript would be strengthened by more clearly connecting these three aspects of the work. A stronger comparison of theory to observation would be very useful. Some suggestions:

      (1) The observation that it is only the longest filaments that buckle is interesting. It should be possible to predict the critical length from the biophysical model. Doing so could allow fits of some model parameters.

      (2) What model parameters change between glass and agar? Can you explain these qualitative differences in motility by changing one model parameter?

      (3) Is it possible to exert a force on one end of a filament to see if it is really mechano-sensing that couples their motion?

      We thank the reviewer for this comment and agree with them that a better connection between model and experiment should be sought. We believe that the new analyses, presented below in response to the 2nd suggestion of the reviewer, provide such a connection in the context of reversal frequency. As stated below, we think that the 1st suggestion falls outside of the scope of the current work, but should form the basis of a future study.

      Regarding suggestion (1) - addressing buckling:

      We agree with the reviewer that using a model to predict a critical buckling length would be useful. We note, however, that the presented study focussed on cell-to-cell coupling / coordination during filament motility using a 1D, bead-chain model. The buckling observations served, in this context, as evidence of cellular de-coordination. Now that we have observed buckling (and plectoneme formation), these processes need to be analysed with further experiments and modelling. The appropriate model for studying buckling would have to be at least 2D (ideally 3D) and consider elastic forces and torques relating to filament bending, rotation, and twisting. Experimentally, we need to identify means of influencing filament length and motion and undertake further measurements of buckling frequency and position across different filament lengths. These investigations are ongoing and will be summarised in a separate, future publication.

      Regarding suggestion (2) - addressing differences in motility on agar vs. glass:

      We believe that the two key differences between agar and glass experiments are the occasional detachment of filaments from substrate on glass and the lack of confining tracks on glass. These differences might arise from the interactions between the filament, the slime, and the surface. As both slime and agar contain polysaccharides, the slime-agar interaction can be expected to be different from the slime-glass interaction. Additionally, in the agar experiments, the filaments are confined between the agar and a glass slide, while they are not confined on the glass, leaving them free to lift up from the glass surface. We expect these factors to alter reversal frequency between the two conditions. To explore this possibility, we have now extended the analysis of experimental data from glass and show that (see details below):

      (i) dwell times are similar between agar and glass, and

      (ii) reversal frequency distribution is different between glass and agar, and remains constant across filament length on glass.

      We were able to explore these experimental findings with new model simulations, by removing the assumption of an “external bounding frame”. We then analysed reversal frequency against model parameters, as detailed below.

      “The movement of the filaments on glass. We have extended our analysis of motility on glass resulting in the following noted features. Firstly, the median speed shows a weak positive correlation with filament length on glass (see original Fig S3B vs. updated Fig. S3A). This is slightly different to agar, where we do not observe any strong correlation in either direction (see original, Fig. 1 vs. updated Fig 2). Both the cases of positive, and no correlation, support our original hypothesis that the propulsion force is generated by multiple cells within the filament.

      Secondly, the filaments on glass display ‘stopping’ events that are not followed by a reversal, but are instead followed by a continuation in the original direction of motion, which we term ‘stop-go’ events, in contrast to the reversals. The dwell times associated with reversals and ‘stop-go’ events are similarly distributed (see original Fig S3A vs. updated Fig S3B). Furthermore, the dwell time distributions are similar between agar and glass (compare old Fig. 1C vs. new Fig 2C and new Fig. S3B). This suggests that the reversal process is the same on both agar and glass.

      Thirdly, we find that the frequencies of both reversal and stop-go events on glass are uncorrelated with the filament length (see new Fig. S4A) and there are approximately twice as many reversals as stop-go events. In contrast, the filaments on agar reverse with a frequency that is inversely proportional to the filament length (which is in turn proportional to the track length) (see original Fig. S1). The distribution of reversal frequencies on agar is broader and flatter than the distribution on glass (see new Fig. S4B). These findings are in line with the idea that tracks on agar (which are defined by filament length) dictate reversal frequency, resulting in the strong correlations we observe between reversal frequency, track length, and filament length. On glass, filament movement is not constrained by tracks, and we have a specific reversal frequency independent of filament length.”
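
      The inverse relation between reversal frequency and filament length on agar follows from simple kinematics if a filament shuttles at constant speed along a track whose length is proportional to its own. The sketch below illustrates this under stated assumptions: the track-to-filament length ratio, the speed, and the dwell time are hypothetical parameters, not fitted values, and the function name is introduced only for this example.

```python
def reversal_frequency(filament_length, speed, track_ratio=2.0, dwell_time=0.0):
    """Reversals per unit time for a filament shuttling along a straight track.

    Assumes track length = track_ratio * filament_length, so the filament
    travels (track_ratio - 1) * filament_length between successive reversals,
    and pauses for dwell_time at each reversal.
    """
    if filament_length <= 0 or speed <= 0:
        raise ValueError("filament_length and speed must be positive")
    if track_ratio <= 1.0:
        raise ValueError("track must be longer than the filament")
    travel_distance = (track_ratio - 1.0) * filament_length
    period = travel_distance / speed + dwell_time
    return 1.0 / period


# With no dwell time, doubling the filament length halves the frequency.
for length_um in (50.0, 100.0, 200.0):
    freq = reversal_frequency(length_um, speed=1.0)
    print(f"length={length_um:6.1f} um  frequency={freq:.4f} per s")
```

      With a non-zero dwell time the relation is no longer strictly inverse, because the pause adds a length-independent term to the period between reversals.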

      “Model can capture movement of filaments on glass and provides hypotheses regarding constancy of reversal frequency with length. We believe the model parameters controlling cellular memory (ω<sub>max</sub>) and strength of cellular coupling (K<sub>ω</sub>) describe the internal behaviour of a filament and therefore should not change depending on the substrate. Thus, we expect the model to be able to capture movement on glass just by removal of any ‘confining tracks’, i.e. external forces, from the simulations. Indeed, we find that the model displays both stop-go and reversal events when simulated without any external force and can capture the dwell time distribution under this condition (compare new Figs. S12,S13 with S3).

      In terms of reversal frequency, however, the model shows a reduction in reversal frequency with filament length (see new Fig. S15). This is in contrast to the experimental data. We find, however, that model results also show a reduction in reversal frequency with increasing ω<sub>max</sub> and K<sub>ω</sub> (see new Fig. S14 and S15). This effect is stronger with ω<sub>max</sub>, while it quickly saturates with K<sub>ω</sub> (see new Fig. S14). Therefore, one possibility of reconciling the model and experiment results in terms of constant reversal frequency with filament length would be to assume that ω<sub>max</sub> is decreasing with filament length (see new Fig. S16). Testing this hypothesis - or adding additional mechanisms into the model - will constitute the basis of future studies.”

      Regarding suggestion (3) - role of mechanosensing:

      We have tried several experiments to evaluate mechanosensing. First, we have used a micropipette or a thin wire placed on the agar, to create a physical barrier in the way of the filaments. The micropipette approach was not quite feasible in our current setup. The wire approach was possible to implement, but the wire caused a significant undulation / perturbation on agar. Possibly relating to this, filaments tended to continue moving alongside the wire barrier. Therefore, these experiments were inconclusive at this stage with regards to mechanosensing a physical barrier. As an alternative, we have attempted trapping gliding filaments using an optical trap with a far red laser that should not affect the physiology of the cells. This did not cause an immediate reversal in filament motion. However, this could be due to the optical trap strength being below the threshold value for mechanosensing. The force per unit length generated by filamentous cyanobacteria has been calculated via a model of self-buckling rods, giving a value of ≈1 nN/μm [8]. In comparison, the optical trap generates forces on the scale of pN. Thus, the trap force is several orders of magnitude lower than the propulsive force generated by a filament, given filament lengths in the range of ten to several hundred μm. We conclude that the lack of observed response may be due to the optical trap force being too weak.
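
      The order-of-magnitude comparison above can be checked with a short calculation. This is a sketch under stated assumptions: the force per unit length is the ≈1 nN/μm value quoted above, while the trap force of exactly 1 pN and the example filament lengths are illustrative choices, since only the scale (pN, tens to hundreds of μm) is given.

```python
import math

FORCE_PER_LENGTH_N_PER_UM = 1e-9  # ~1 nN per micrometre, as quoted in the text
TRAP_FORCE_N = 1e-12              # assumed 1 pN; the text only states "scale of pN"


def orders_of_magnitude_gap(filament_length_um, trap_force_n=TRAP_FORCE_N):
    """log10 of (total propulsive force / optical trap force)."""
    if filament_length_um <= 0 or trap_force_n <= 0:
        raise ValueError("length and trap force must be positive")
    propulsive_force_n = FORCE_PER_LENGTH_N_PER_UM * filament_length_um
    return math.log10(propulsive_force_n / trap_force_n)


# Example lengths spanning the stated range (illustrative values).
for length_um in (10, 100, 300):
    gap = orders_of_magnitude_gap(length_um)
    print(f"length={length_um:4d} um  gap={gap:.1f} orders of magnitude")
```

      Under these assumptions, even a 10 μm filament generates a total force about four orders of magnitude larger than the trap force, and the gap grows with filament length.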

      Thus, the experiments we can perform using our currently available methods and equipment cannot prove either the presence or the absence of mechanosensing in the filament. We plan to perform further experiments in this direction, involving new and/or improved experimental setups, such as the use of Atomic Force Microscopy.

      We would like to note that there is an additional observation that supports the idea of reversals being mediated by mechanosensing at the end of a track, instead of the locations of the track ends being caused by the intrinsic reversal frequency of the filament. In a few instances (N = 4), filaments on agar ended up on a circular track (see Movie S4 for an example). These filaments did not reverse over durations a few times longer than the ‘expected reversal interval’ of a filament on a straight track.

      Should $N$ following eq 7 and in eq 9 be $N_f$?

      We have corrected this typo.

      It would be useful to include references to what is known about mechanosensing in cyanobacteria.

      We agree with the reviewer, and we have now updated the discussion section to include this information. Mechanosensing has not yet been shown directly in any cyanobacteria, but several species harbor genes that are implicated (by homology) in mechanosensing. In particular, analysis of cyanobacterial genomes predicts the presence of a significant number of homologues of the Escherichia coli mechanosensory ion channels MscS and MscL [9]. We have also identified similar MscS protein sequences in F. draycotensis. These channels open when the membrane tension increases, allowing the cell to protect itself from swelling and rupturing when subject to extreme osmotic shock [10,11].

      We also note that F. draycotensis, as with other filamentous cyanobacteria, has genes associated with type IV pili, which may be involved in surface-based motility [1]. Type IV pili have been shown to be mechanosensitive. For example, in cells of Pseudomonas aeruginosa that ‘twitch’ on a surface using type IV pili, application of mechanical shear stress results in increased production of an intracellular signalling molecule involved in promoting biofilm production. The pilus retraction motor has been shown to be involved in this shear-sensing response [12]. Additionally, twitching P. aeruginosa cells often reverse in response to collisions with other cells. Reversal is also caused by collisions with inert glass microfibres, which suggests that the pili-based motility can be affected by a mechanical stimulus [13].

      References

      (1) D. D. Risser, Hormogonium Development and Motility in Filamentous Cyanobacteria. Appl Environ Microbiol 89, e0039223 (2023).

      (2) T. Lamparter et al., The involvement of type IV pili and the phytochrome CphA in gliding motility, lateral motility and photophobotaxis of the cyanobacterium Phormidium lacuna. PLoS One 17, e0249509 (2022).

      (3) E. Hoiczyk, Gliding motility in cyanobacteria: observations and possible explanations. Arch Microbiol 174, 11-17 (2000).

      (4) D. G. Adams, D. Ashworth, B. Nelmes, Fibrillar Array in the Cell Wall of a Gliding Filamentous Cyanobacterium. Journal of Bacteriology 181 (1999).

      (5) L. N. Halfen, R. W. Castenholz, Gliding in a blue-green alga: a possible mechanism. Nature 225, 1163-1165 (1970).

      (6) S. N. Menon, P. Varuni, F. Bunbury, D. Bhaya, G. I. Menon, Phototaxis in Cyanobacteria: From Mutants to Models of Collective Behavior. mBio 12, e0239821 (2021).

      (7) F. D. Conradi, C. W. Mullineaux, A. Wilde, The Role of the Cyanobacterial Type IV Pilus Machinery in Finding and Maintaining a Favourable Environment. Life (Basel) 10 (2020).

      (8) M. Kurjahn, A. Deka, A. Girot, L. Abbaspour, S. Klumpp, M. Lorenz, O. Bäumchen, S. Karpitschka, Quantifying gliding forces of filamentous cyanobacteria by self-buckling. eLife 12:RP87450 (2024).

      (9) S.C. Johnson, J. Veres, H. R. Malcolm, Exploring the diversity of mechanosensitive channels in bacterial genomes. Eur Biophys J 50, 25–36 (2021).

      (10) S.I. Sukharev, W.J. Sigurdson, C. Kung, F. Sachs, Energetic and spatial parameters for gating of the bacterial large conductance mechanosensitive channel, MscL. Journal of General Physiology, 113(4), 525-540 (1999).

      (11) N. Levina, S. Tötemeyer, N.R. Stokes, P. Louis, M.A. Jones, I.R. Booth, Protection of Escherichia coli cells against extreme turgor by activation of MscS and MscL mechanosensitive channels: identification of genes required for MscS activity. The EMBO Journal (1999).

      (12) V.D. Gordon, L. Wang, Bacterial mechanosensing: the force will be with you, always. Journal of cell science 132(7):jcs227694 (2019).

      (13) M.J. Kühn, L. Talà, Y.F. Inclan, R. Patino, X. Pierrat, I. Vos, Z. Al-Mayyah, H. Macmillan, J. Negrete Jr, J.N. Engel, A. Persat, Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proceedings of the National Academy of Sciences. 118(30):e2101759118 (2021).

    1. eLife Assessment

      This fundamental work by Yamamoto and colleagues advances our understanding of how positional information is coordinated between axes during limb outgrowth and patterning. They provide solid evidence that the dorsal-ventral axis feeds into anterior-posterior signaling, and identify the responsible molecules by combining transplantations with molecular manipulations. This work will be of broad interest to regeneration, tissue engineering, and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yamamoto et al. presents a model by which the four main axes of the limb are required for limb regeneration to occur in the axolotl. A longstanding question in regeneration biology is how existing positional information is used to regenerate the correct missing elements. The limb provides an accessible experimental system by which to study the involvement of the anteroposterior, dorsoventral, and proximodistal axes in the regenerating limb. Extensive experimentation has been performed in this area using grafting experiments. Yamamoto et al. use the accessory limb model and some molecular tools to address this question. There are some interesting observations in the study. In particular, one strength is the potent induction of accessory limbs in the dorsal axis with BMP2+Fgf2+Fgf8, which is very interesting.

      Strengths:

      The manuscript presents some novel phenotypes generated in axolotl limbs due to Wnt signaling. This is generally the first example in which Wnt signaling has provided a gain-of-function in the axolotl limb model. They also present a potent way of inducing limb patterning in the dorsal axis by the addition of just beads loaded with Bmp2+Fgf8+Fgf2.

      Weaknesses:

      Although interesting, the study makes bold claims about determining the molecular basis of DV positional cues, but the experimental evidence is not definitive and does not take into account the previous work on DV patterning in the amniote limb. Also, testing the hypothesis on blastemas after limb amputation would be needed to support the strong claims in the study. There are several examples of very strong claims, but the evidence lacks support for these claims.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how signals from all sides of a developing limb, front/back and top/bottom, work together to guide the regrowth of a fully patterned limb in axolotls, a type of salamander known for its impressive ability to regenerate limbs. Using a model called the Accessory Limb Model (ALM), the researchers created early limb regenerates (called blastemas) with cells from different sides of the limb. They discovered that successful limb regrowth only happens when the blastema contains cells from both the top (dorsal) and bottom (ventral) of the limb. They also found that a key gene involved in front/back limb patterning, called Shh (Sonic hedgehog), is only turned on when cells from both the dorsal and ventral sides come into contact. The study identified two important molecules, Wnt10B and FGF2, that help activate Shh when dorsal and ventral cells interact. Finally, the authors propose a new model that explains how cells from all four sides of a limb, dorsal, ventral, anterior (front), and posterior (back), contribute at both the cellular and molecular level to rebuilding a properly structured limb during regeneration.

      Strengths:

      The techniques used in this study, like delicate surgeries, tissue grafting, and implanting tiny beads soaked with growth factors, are extremely difficult, and only a few research groups in the world can do them successfully. These methods are essential for answering important questions about how animals like axolotls regenerate limbs with the correct structure and orientation. To understand how cells from different sides of the limb communicate during regeneration, the researchers used a technique called in situ hybridization, which lets them see where specific genes are active in the developing limb. They clearly showed that the gene Shh, which helps pattern the front and back of the limb, only turns on when cells from both the top (dorsal) and bottom (ventral) sides are present and interacting. The team also took a broad, unbiased approach to figure out which signaling molecules are unique to dorsal and ventral limb cells. They tested these molecules individually and discovered which could substitute for actual dorsal and ventral cells, providing the same necessary signals for proper limb development. Overall, this study makes a major contribution to our understanding of how complex signals guide limb regeneration, showing how different regions of the limb work together at both the cellular and molecular levels to rebuild a fully patterned structure.

      Weaknesses:

      Because the expressional analyses are performed on thin sections of regenerating tissue, they provide only a limited view of the gene expression patterns in their experiments, opening the possibility that they could be missing some expression in other regions of the blastema. Additionally, the quantification method of the expressional phenotypes in most of the experiments does not appear to be based on a rigorous methodology. Therefore, performing alternate expressional analysis, using RNA-seq or qRT-PCR (for example) on the entire blastema would help validate that the authors are not missing something.

      Overall, the number of replicates per sample group is quite low (sometimes as low as 3), which is especially risky with challenging techniques like the ones the authors employ. The authors don't appear to have performed a power analysis to determine the number of animals per experiment needed to detect statistical differences between groups. Increasing the sample sizes would substantially increase the rigor of their experiments.
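
      As a rough illustration of the kind of power analysis referred to above, the sketch below estimates the number of animals per group for a two-sided comparison of two group means using the standard normal approximation. It is not the authors' method and not tied to any dataset in the study: the function name, the default significance level (0.05), the default power (0.8), and the example effect sizes are all assumptions chosen for illustration. The normal approximation also tends to underestimate the required n when groups are very small, so an exact t-based calculation would give slightly larger values.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-group comparison of means.

    effect_size is the standardized difference (Cohen's d), an assumed input.
    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes, from very large (d = 2) to moderate (d = 0.5).
for d in (2.0, 1.0, 0.5):
    print(f"d = {d}: about {samples_per_group(d)} animals per group")
```

      Under these assumptions, a group size of 3 would only be adequate for very large effects (d well above 2), which is the concern the paragraph above raises.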

      Likewise, the authors use an AI-generated algorithm to quantify symmetry on the dorsal/ventral axis, and this approach doesn't appear to account for possible biases due to tissue sectioning angles. They also appear to arbitrarily pick locations in each sample group to compare symmetry measurements. There are other methods, which include using specific muscle groups and nerve bundles as dorsal/ventral landmarks, that would more clearly show differences in symmetry.

    4. Reviewer #3 (Public review):

      Summary:

      After salamander limb amputation, the cross-section of the stump has two major axes: anterior-posterior and dorsal-ventral. Cells from all axial positions (anterior, posterior, dorsal, ventral) are necessary for regeneration, yet the molecular basis for this requirement has remained unknown. To address this gap, Yamamoto et al. took advantage of the ALM assay, in which defined positional identities can be combined on demand and their effects assessed through the outgrowth of an ectopic limb. They propose a compelling model in which dorsal and ventral cells communicate by secreting Wnt10b and Fgf2 ligands, respectively, with this interaction inducing Shh expression in posterior cells. Shh was previously shown to induce limb outgrowth in collaboration with anterior Fgf8 (PMID: 27120163). Thus, this study completes a concept in which four secreted signals from four axial positions interact for limb patterning. Notably, this work firmly places dorsal-ventral interactions upstream of anterior-posterior, which is striking for a field that has been focussed on anterior-posterior communication. The ligands identified (Wnt10b, Fgf2) are different from those implicated in dorsal-ventral patterning in the non-regenerative mouse and chick models. The results in the context of ALM/ectopic limb engineering are impressive, but the authors do not extend their experiments to assay 'normal' regeneration after amputation.

      Strengths:

      (1) The ALM and use of GFP grafts for lineage tracing (Figures 1-3) take full advantage of the salamander model's unique ability to outgrow patterned limbs under defined conditions. As far as I am aware, the ALM has not been combined with precise grafts that assay 2 axial positions at once, as performed in Figure 3. The number of ALMs performed in this study deserves special mention, considering the challenging surgery involved.

      (2) The authors identify that posterior Shh is not expressed unless both dorsal and ventral cells are present. This echoes previous work in mouse limb development models (AER/ectoderm-mesoderm interaction), but this link between axes was not known in salamanders. The authors elegantly reconstitute dorsal-ventral communication by grafting, finding that this is sufficient to trigger Shh expression (Figure 3 - although see also the Weaknesses section.)

      (3) Impressively, the authors discovered two molecules sufficient to substitute dorsal or ventral cells through electroporation into dorsal- or ventral-depleted ALMs (Figure 5). These molecules did not change the positional identity of target cells. The same group previously identified the ventral factor (Fgf2) to be a nerve-derived factor essential for regeneration. In Figure 6, the authors demonstrate that nerve-derived factors, including Fgf2, are alone sufficient to grow out ectopic limbs from a dorsal wound. Limb induction with a 3-factor cocktail without supplementing with other cells is conceptually important for regenerative engineering.

      (4) The writing style and presentation of results are very clear.

      Weaknesses:

      (1) The expression data are the weakest part of this study.

      • Despite being a central message, I found the Shh in situs unconvincing (e.g. Figure 2I, 3C, 5C), especially without sense probe controls. An additional assay would be essential to make the Shh data convincing - perhaps like in Figure 5D (qPCR?), RNA-sequencing, or a downstream target gene.

      • It is not clear what the n numbers mean for the in situ data (slides analysed / number of biological samples / other?). This is crucial to understanding the reliability of the results.

      • The authors do not assay where and when Wnt10b and Fgf2 are expressed beyond the bulk RNA-sequencing (which presumably contains both epidermis and mesenchyme cells). This is a shame, as understanding which cell types express these molecules, and when, would be important for understanding the mechanism.

      (2) It is important to consider that the ALM is not 'regeneration', even if the authors have previously argued that ALM bumps and regenerating blastemas are equivalent (PMID: 17959163). The start- and end- points of ALM are different from regeneration, even though there are undoubtedly common principles involved. Thus, I find the word 'regeneration' in the title and last sentence of the abstract unsubstantiated unless evidence is provided that the same mechanisms (Wnt10b/Fgf2/Shh) function during normal limb regeneration.

      (3) Drawing the exact boundaries of the Ant/Pos/Dor/Ven BL and grafts in the cartoon in Figure 1 (with respect to anatomical landmarks) would help to better understand the experiments in Figures 3 and 4.

      (4) I find the 'positional cue' and 'positional value' terminology confusing, despite the authors' efforts. It is not clear if they refer to cell autonomous or secreted signals, and, as the authors mention, the definitions partially overlap. Lmx1b is defined as a positional value, even though it is necessary and sufficient for dorsal identity (so, isn't it positional information?). Much simpler would be to describe Wnt10b and Fgf2 as what they are: dorsally or ventrally expressed signals that substitute for dorsal or ventral tissue without inducing changes in positional information.

      Overall appraisal:

      This is a logical and well-executed study that creatively uses the axolotl model to advance an important framework for understanding limb patterning. The reliability of the Shh expression data is a weak point in this otherwise impressive study. The relevance of the mechanisms to normal limb regeneration is not substantiated.

    5. Author response:

      We sincerely thank the editor and all three reviewers for their constructive comments. We deeply appreciate the reviewers’ efforts in highlighting both the strengths and the weaknesses of our study. To enhance the quality and clarity of our work, we plan to address the concerns raised in the public reviews through the following actions:

      (1) Improving the tone and language of the manuscript

      We will revise the manuscript thoroughly, incorporating additional explanations and clarifications where necessary, and improving the tone and language to enhance readability and precision. In particular, we will pay careful attention to the terms “positional information,” “positional value,” and “positional cue,” and we plan to explain them in a historical context.

      (2) Extending analysis to regular blastemas

      To validate the applicability of our proposed model beyond the accessory limb model (ALM), we will examine the gene expression patterns of key signaling molecules in regular blastemas generated by limb amputation. This will allow us to test whether the mechanisms we describe are also active during normal limb regeneration.

      (3) Increasing sample sizes in critical experiments

      To ensure reproducibility and statistical reliability, we will increase the number of biological replicates in key experiments within the limits set by our animal ethics approval. Additionally, we will collect data that clearly define the dorsal/ventral axis within the structures, as far as possible. We will also revise the manuscript to pay closer attention to the anterior/posterior/dorsal/ventral axes in the existing data, ensuring that they are clearly described.

      (4) Adding quantitative gene expression data

      To support and reinforce our in situ hybridization results, we will include additional quantitative gene expression analyses (e.g., qRT-PCR), thereby strengthening the conclusions drawn from our expression data.

      We are grateful for the reviewers’ insights and are confident that these revisions will significantly strengthen our manuscript.

    1. eLife Assessment

      This valuable study investigates how stochastic and deterministic factors are integrated during cellular decision-making, particularly in situations where cells differentiate into distinct fates despite being exposed to the same environmental conditions. The authors present convincing evidence that gene expression variability contributes to the robustness of cell fate decisions in D. discoideum, which sheds light on the role of stochasticity during cell differentiation.

    2. Joint Public Review:

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using Dictyostelium discoideum as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability (a stochastic factor) enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors demonstrate that cell fate decisions in D. discoideum depend on a combination of deterministic and stochastic factors, i.e., cell cycle phase and gene expression variability, respectively. They then identify Set1 - a key regulator of gene expression variability - indicate the mechanism through which it modulates this variability, and link it to a phenotype in D. discoideum development. Finally, they confirm that gene expression variability contributes to the robustness of the cell's response to environmental disruptions that interfere with the cell cycle.

      Strengths:

      The authors are careful in the choice of their experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression.

      Weaknesses:

      However, in terms of mathematical modelling, it would be important to rule out sources of stochasticity (other than gene expression variability), and also to consider cases where stochastic factors are not necessarily completely independent of the deterministic ones.